Cython and memviews creation

Not an issue, just a Cython-related PSA that we need to keep in mind when reviewing PRs:

We shouldn't create a 1d view for each sample, because this is slow:

```py
cdef float[:, :] X = ...  # big 2d view
for i in range(n_samples):  # same with prange, same with or without the GIL
    f(X[i])  # X[i] creates a new 1d view on every iteration
```

Do this instead, or use pointers, at least for now:

```py
for i in range(n_samples):
    f(X, i)  # and work on X[i, :] in f's code
```
This applies to any pattern that creates many views, so looping over features is not a good idea either if we expect many features.
There might be a "fix" in https://github.com/cython/cython/issues/2227 / cython/cython#3617.
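
As a rough illustration of the two recommended alternatives (passing the 2d view plus a row index, or passing a raw pointer to the row), here is a minimal sketch. The names `row_sum`, `row_sum_ptr`, `total` and `total_ptr` are hypothetical and not taken from the scikit-learn code base. The pointer variant assumes a C-contiguous array with at least one feature.

```py
# Hypothetical helpers, for illustration only.

cdef float row_sum(float[:, :] X, Py_ssize_t i) nogil:
    # Index X[i, j] directly: no 1d view of row i is created.
    cdef Py_ssize_t j
    cdef float acc = 0
    for j in range(X.shape[1]):
        acc += X[i, j]
    return acc


cdef float row_sum_ptr(float* row, Py_ssize_t n_features) nogil:
    # Same computation on a raw pointer to the first element of the row.
    cdef Py_ssize_t j
    cdef float acc = 0
    for j in range(n_features):
        acc += row[j]
    return acc


def total(float[:, :] X):
    cdef Py_ssize_t i
    cdef double out = 0
    for i in range(X.shape[0]):
        out += row_sum(X, i)  # pass the 2d view and the row index
    return out


def total_ptr(float[:, ::1] X):
    # C-contiguous layout, so each row is X.shape[1] consecutive floats.
    cdef Py_ssize_t i
    cdef double out = 0
    for i in range(X.shape[0]):
        out += row_sum_ptr(&X[i, 0], X.shape[1])
    return out
```

Both variants keep the per-sample work inside a `cdef` function without slicing `X` in the loop, which is the point of the advice above. The pointer form gives up the stride information carried by the view, hence the contiguity requirement.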

The reason is that creating all these 1d views carries a significant overhead, which comes from Cython's internal ref-counting (details at https://github.com/cython/cython/issues/2987). In the hist-GBDT prediction code, this overhead accounts for more than 30% of the runtime, so it's not negligible.

Note that:
- Doing this with `prange` used to generate some additional Python interactions, but this was fixed in https://github.com/cython/cython/commit/794d21d929a60c0ff9f1aa92fc79cc79c1d4753d and backported to Cython 0.29
- Now that no Python interactions are generated, we need to be extra careful with this because we won't even see it in Cython annotated files

CC @scikit-learn/core-devs 

Cython and memviews creation #17299
