Description
Not an issue, just a Cython-related PSA that we need to keep in mind when reviewing PRs:
We shouldn't create a 1d view for each sample; this is slow:
cdef float[:, :] X = ...  # big 2d view
for i in range(n_samples):  # same with prange, same with or without the GIL
    f(X[i])
Do this instead, or use pointers, at least for now:
for i in range(n_samples):
    f(X, i)  # and index X[i, j] directly inside f's code
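A minimal, self-contained .pyx sketch of the patterns above, assuming a C-contiguous float32 input. The row-sum task and the names _row_sum_view, _row_sum_index, _row_sum_ptr and row_sums are hypothetical, chosen only for illustration. The noexcept keyword assumes Cython 3 (Cython 0.29.31+ also accepts it, as a no-op).

```cython
# cython: language_level=3, boundscheck=False, wraparound=False
# Hypothetical illustration: sum each row of a 2d float32 array.
import numpy as np


cdef float _row_sum_view(const float[:] row) noexcept nogil:
    # Slow pattern: the caller has to create a new 1d view for every call.
    cdef Py_ssize_t j
    cdef float total = 0.
    for j in range(row.shape[0]):
        total += row[j]
    return total


cdef float _row_sum_index(const float[:, :] X, Py_ssize_t i) noexcept nogil:
    # Preferred pattern: receive the 2d view plus the row index and
    # index X[i, j] directly, so no new view is created.
    cdef Py_ssize_t j
    cdef float total = 0.
    for j in range(X.shape[1]):
        total += X[i, j]
    return total


cdef float _row_sum_ptr(const float* row, Py_ssize_t n_features) noexcept nogil:
    # Pointer pattern: receive a raw pointer to the start of the row.
    # Only valid because the row is contiguous in memory.
    cdef Py_ssize_t j
    cdef float total = 0.
    for j in range(n_features):
        total += row[j]
    return total


def row_sums(const float[:, ::1] X, str method="index"):
    """Return the sum of each row of a C-contiguous float32 array."""
    cdef Py_ssize_t i
    cdef Py_ssize_t n_samples = X.shape[0]
    cdef Py_ssize_t n_features = X.shape[1]
    cdef float[::1] out = np.empty(n_samples, dtype=np.float32)
    if method == "view":
        for i in range(n_samples):
            out[i] = _row_sum_view(X[i])  # slow: one new 1d view per sample
    elif method == "index":
        for i in range(n_samples):
            out[i] = _row_sum_index(X, i)
    elif method == "pointer":
        for i in range(n_samples):
            # X is C-contiguous, so &X[i, 0] points to a contiguous row.
            out[i] = _row_sum_ptr(&X[i, 0], n_features)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return np.asarray(out)
```

Once compiled (e.g. with cythonize), the three methods should return the same values for np.ascontiguousarray(X, dtype=np.float32), so timing them on a large X is a quick way to look at the view-creation overhead on a given machine and Cython version. A float64 or non-contiguous input is rejected by the const float[:, ::1] signature.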
This applies to any pattern that creates many views, so looping over features (creating X[:, j] for each feature j) might not be a good idea either if we expect many features.
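A minimal sketch of the same fix for a feature-wise loop, assuming float64 input; the names _column_sum and column_sums are hypothetical. Instead of creating X[:, j] for each feature, the helper receives the 2d view and the column index. For a C-contiguous X the columns are strided, so a plain pointer to the start of a column would not be enough; passing the index avoids handling strides by hand.

```cython
# cython: language_level=3, boundscheck=False, wraparound=False
# Hypothetical illustration: sum each column (feature) of a 2d float64 array.
import numpy as np


cdef double _column_sum(const double[:, :] X, Py_ssize_t j) noexcept nogil:
    # Work on column j through X[i, j] instead of receiving a view X[:, j].
    cdef Py_ssize_t i
    cdef double total = 0.
    for i in range(X.shape[0]):
        total += X[i, j]
    return total


def column_sums(const double[:, :] X):
    """Return the sum of each column of a 2d float64 array."""
    cdef Py_ssize_t j
    cdef Py_ssize_t n_features = X.shape[1]
    cdef double[::1] out = np.empty(n_features, dtype=np.float64)
    for j in range(n_features):
        # Avoid passing X[:, j] here: that would create one new view per feature.
        out[j] = _column_sum(X, j)
    return np.asarray(out)
```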
There might be a "fix" in cython/cython#2227 or cython/cython#3617.
The reason is that creating all these 1d views has a significant overhead, which comes from Cython's internal ref-counting of memoryviews (details in cython/cython#2987). In the hist-GBDT prediction code, this overhead accounts for more than 30% of the runtime, so it's not negligible.
Note that:
- Doing this with prange used to generate some additional Python interactions, but this was fixed in cython/cython@794d21d and backported to Cython 0.29.
- Now that no Python interactions are generated, we need to be extra careful with this, because the overhead won't even show up in Cython annotated files.
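A minimal sketch of the index-passing pattern inside prange, assuming the module is compiled with OpenMP support (e.g. -fopenmp with GCC); without it the loop simply runs serially. The names _row_sq_norm and row_sq_norms are hypothetical. Since a per-row slice inside this loop would not be highlighted as Python interaction in the output of cython -a, such slices have to be caught by reading the code.

```cython
# cython: language_level=3, boundscheck=False, wraparound=False
# Hypothetical illustration: squared L2 norm of each row, computed in parallel.
import numpy as np
from cython.parallel import prange


cdef double _row_sq_norm(const double[:, :] X, Py_ssize_t i) noexcept nogil:
    # Index X[i, j] directly; no 1d view is created for row i.
    cdef Py_ssize_t j
    cdef double total = 0.
    for j in range(X.shape[1]):
        total += X[i, j] * X[i, j]
    return total


def row_sq_norms(const double[:, :] X, int n_threads=1):
    """Return the squared L2 norm of each row of a 2d float64 array."""
    cdef Py_ssize_t i
    cdef Py_ssize_t n_samples = X.shape[0]
    cdef double[::1] out = np.empty(n_samples, dtype=np.float64)
    for i in prange(n_samples, nogil=True, num_threads=n_threads):
        # Avoid calling a helper on X[i] here: the per-row view creation
        # would not be flagged as Python interaction by `cython -a`.
        out[i] = _row_sq_norm(X, i)
    return np.asarray(out)
```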
CC @scikit-learn/core-devs