Cython and memviews creation #17299

Open
@NicolasHug

Not an issue, just a Cython-related PSA that we need to keep in mind when reviewing PRs:

We shouldn't create a 1d view for each sample; this is slow:

cdef float[:, :] X = ...  # big 2d view
for i in range(n_samples):  # same with prange, same with or without the GIL
    f(X[i])  # creates a new 1d view on every iteration

Do this instead, or use pointers, at least for now:

for i in range(n_samples):
    f(X, i)  # and work on X[i, :] in f's code
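As a rough illustration of the two alternatives above, here is a minimal Cython sketch. The names `row_sum`, `row_sum_ptr` and `sum_all_rows` are hypothetical and do not come from scikit-learn. The pointer variant additionally assumes that `X` is C-contiguous, so that each row is a contiguous block of memory.

```cython
# Hypothetical helpers for illustration only; not scikit-learn code.

cdef float row_sum(float[:, :] X, Py_ssize_t i):
    # Index the 2d view directly: no 1d view X[i] is ever created.
    cdef Py_ssize_t j
    cdef float total = 0
    for j in range(X.shape[1]):
        total += X[i, j]
    return total


cdef float row_sum_ptr(const float* row, Py_ssize_t n_features):
    # Pointer variant: assumes the n_features values of the row are contiguous.
    cdef Py_ssize_t j
    cdef float total = 0
    for j in range(n_features):
        total += row[j]
    return total


def sum_all_rows(float[:, ::1] X):
    # X is declared C-contiguous so that &X[i, 0] points to a contiguous row.
    cdef Py_ssize_t i
    cdef Py_ssize_t n_samples = X.shape[0]
    cdef Py_ssize_t n_features = X.shape[1]
    cdef double total = 0
    for i in range(n_samples):
        total += row_sum(X, i)                      # pass the 2d view plus an index
        total += row_sum_ptr(&X[i, 0], n_features)  # or pass a raw pointer to row i
    return total
```

Both helpers are called in the same loop only to show the two call patterns side by side; real code would pick one. In neither case is a per-sample 1d view created inside the loop.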

This applies to any pattern that generates lots of views, so looping over features might not be a good idea either if we expect lots of features.
There might be a "fix" in cython/cython#2227 / cython/cython#3617

The reason is that there's a significant overhead when creating all these 1d views, which comes from Cython's internal ref-counting (details at cython/cython#2987). In the hist-GBDT prediction code, this overhead accounts for more than 30% of the runtime, so it's not negligible.

Note that:

  • Doing this with prange used to generate some additional Python interactions, but this was fixed in cython/cython@794d21d and backported to Cython 0.29.
  • Now that no Python interactions are generated, we need to be extra careful with this because we won't even see it in Cython annotated files.

CC @scikit-learn/core-devs
