MAINT: Adapt sklearn for NumPy default integer change #27041
215ca52
to
2e9b6d4
Compare
cc @Micky774 maybe?
Close/reopen to trigger CI. This is now becoming relevant since NumPy main has the change in place, which makes it possible to test against it. Since I guess we are not testing on Windows with the NumPy nightly, there is a chance that this is not complete! EDIT: To be clear, I would be very happy if anyone takes over and just uses this for inspiration. I don't have a Windows computer at hand to try this on!
Marking as ready for review. This might not be all there is to it; the main reason I had it as WIP was that it needs to be tested against a NumPy 2.0 nightly. Testing with a NumPy 2.0 nightly on Windows may well flush out additional issues.
For this it seems like a new CI entry for Windows pip-pre that installs from scientific-python-nightly-wheels would be the way to go (plus maybe a build with |
Right, let's see if I can just hack it into the Windows job here. If there is still worry after it succeeds (and the hack is reverted here again), one could probably also add a nightly run for Windows, I think. (Not sure how the infra for creating the issue works.)
12bd61f
to
d895739
Compare
I will leave the hacks in place, in case someone wants to have a look. But the failures are unrelated:
So, while it could be that there are other places where you now use |
This adapts the `_random.pyx` file to return whatever is the NumPy default integer, which, on NumPy 2.0 would fix. Since the Cython symbol wasn't used, I just removed it, as it clashes with the overloading. See numpy/numpy#24224 for the commit which would make this necessary. At the moment this is a bit hard to test since the SciPy nightlies are incompatible with that NumPy branch. But I thought I would put it out there for discussion. The alternative and simpler solution might be to just force 64-bit results on any 64-bit system and not worry about the NumPy version.
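As a rough illustration of why the default integer matters here: before NumPy 2.0 the default integer follows the C `long`, which is 32-bit on 64-bit Windows, while from NumPy 2.0 on it is pointer-sized (`intp`). The sketch below uses only `ctypes` to compare the two widths on the current platform. The helper names are hypothetical and are not part of scikit-learn or NumPy; the mapping from NumPy major version to width is stated as an assumption in the code.

```python
import ctypes


def c_long_bits() -> int:
    """Width in bits of the platform's C long."""
    return 8 * ctypes.sizeof(ctypes.c_long)


def pointer_sized_int_bits() -> int:
    """Width in bits of a pointer-sized signed integer (ssize_t)."""
    return 8 * ctypes.sizeof(ctypes.c_ssize_t)


def assumed_default_integer_bits(numpy_major: int) -> int:
    """Hypothetical helper: width of the NumPy default integer.

    Assumption: C long before NumPy 2, pointer-sized from 2.0 on.
    """
    if numpy_major < 2:
        return c_long_bits()
    return pointer_sized_int_bits()


if __name__ == "__main__":
    print("C long:", c_long_bits(), "bits")
    print("pointer-sized int:", pointer_sized_int_bits(), "bits")
```

On platforms where the two widths agree (for example 64-bit Linux) the change is invisible, which is why the discussion focuses on Windows. The "force 64-bit on any 64-bit system" alternative would amount to always using the pointer-sized width regardless of the NumPy version.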
OK, I just rebased away the CI changes that ran against the NumPy main branch on Windows. So this is OK to merge unless there is code that should be changed. As mentioned, both branches are exercised in CI, because on old NumPy versions Linux takes one branch and Windows the other (the difference is that Windows switches which branch is taken on new NumPy). Windows test results against NumPy main in details block: https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=60607&view=logs&jobId=0238e32a-2fbb-5be1-f782-cfff4ef2924e&j=0238e32a-2fbb-5be1-f782-cfff4ef2924e&t=7a1155ea-f171-542f-2ca6-f7f6ff076f10
Thank you for the PR!
@@ -16,10 +16,6 @@ cdef enum:
    # 32-bit signed integers (i.e. 2^31 - 1).
    RAND_R_MAX = 2147483647


cpdef sample_without_replacement(cnp.int_t n_population,
This would make `sample_without_replacement` not cimportable by other modules. Given that `_random` is a private module, I am okay with its removal.
sklearn/utils/_random.pyx
Outdated
# converting to long will allow conversion to integer (from float)
# via `int()`, but conversion from other objects does not.
A little bit of rewording from reading the code.
Suggested change:
- # converting to long will allow conversion to integer (from float)
- # via `int()`, but conversion from other objects does not.
+ # converting to long via `int()` will allow conversion to
+ # `intp`, but conversion from other objects does not.
Is this consistent with what you intend to say?
Tried to tweak this and the other comments, maybe it helps. The point is the second branch does `int()` implicitly and is the old behavior. Cython doesn't do it implicitly in the first, so we do it explicitly.
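The difference between an implicit and an explicit `int()` conversion can be sketched in pure Python. This is only an analogy: it assumes the strict path behaves like `operator.index()`, which rejects floats, whereas the exact Cython conversion rules are not shown in this thread. Both function names are hypothetical.

```python
import operator


def to_int_strict(value):
    """Accept only true integers (objects implementing __index__)."""
    return operator.index(value)


def to_int_lenient(value):
    """Mimic the old behavior: go through int() first, so floats pass."""
    return operator.index(int(value))


if __name__ == "__main__":
    print(to_int_lenient(3.0))  # float is converted via int()
    try:
        to_int_strict(3.0)
    except TypeError as exc:
        print("strict conversion rejected a float:", exc)
```

Calling `int()` explicitly before the strict conversion keeps float arguments working, which matches the intent described in the comment above.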
Turns out this was a lot more confusing than I thought, because the definitions changed in the Cython 3 definitions file, which was the actual cause of the change in how the conversion happened.
I am changing it back in NumPy 1.26.2 (and a slightly more thorough fix for 2.0). So this will not be strictly necessary in the future (as the comment says).
(I actually prefer the strict version, but that is a separate discussion/transition.)
In either case, I hope the comment is relatively clear now.
LGTM. Thanks @seberg!
Co-authored-by: Olivier Grisel <[email protected]>
LGTM
The interesting part of this PR will be Windows testing, but that is a bit held up on the SciPy nightlies upload. Although, if anyone does Windows development and would try this, that would be cool!
I.e. this is a draft, but I hope I can hack in Windows testing later.