API: Make 64bit default integer on 64bit windows #24224
Conversation
Some of the test failures are interesting :) Definitely needs a release note at some point.
Fixes welcome ;). Let's see, many are just direct adaptations to the change. Lots of failures in … EDIT: OK, failures are down to a very manageable amount (a lot was due to random tests taking the wrong branch).
Except legacy random: not changing the legacy results means that some internal int64s are cast to long. To allow default integer input, we cannot safe-cast user input to long.
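For illustration, a minimal Python sketch of the safe-cast problem (using explicit sized dtypes instead of the platform-dependent `long`):

```python
import numpy as np

# On 64-bit Windows, C long is 32-bit, so a default (int64) integer array
# is not *safely* castable to it; only a same-kind cast is possible.
print(np.can_cast(np.int64, np.int32))                       # False (safe casting)
print(np.can_cast(np.int64, np.int32, casting="same_kind"))  # True
```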
(force-pushed from 0503092 to df23b87)
I spent the last couple of hours trying to build SciPy from source on Windows by following the CI. Not being a Windows person, it's proving to be quite the challenge. I think this is due to my lack of experience on Windows, not due to your changes. Tomorrow I'll probably take another look and see if I can make some progress.
@bashtage can I snipe you to have a look at the (admittedly not polished) changes in random here? Dealing with the legacy random code seems like one of the hairier points potentially. Maybe it's all fine (just a bit rough), but I am not sure about how we would want to do it here.
@@ -3481,16 +3506,20 @@ cdef class RandomState:
         return randoms

         _dp = PyFloat_AsDouble(p)
-        _in = <long>n
+        _in = int(n)
         check_constraint(_dp, 'p', CONS_BOUNDED_0_1)
Is this a Python function call rather than a C cast? Maybe Cython 3 does the right thing here?
The problem was that this can be a Python float; somehow the `<long>` cast allows floats, while other to-integer casts don't. This does depend on the Cython version (it worked locally).

I suspect it's the opposite: it was necessary in Cython 3 to do something more than a cast.

Will try once more locally, but maybe the best path is to just extract the value from the array (which we have created here in either case).
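As a rough Python-level analogy (not the Cython code itself): `int()` accepts and truncates floats, while `operator.index()` rejects them, mirroring the difference between the float-tolerant `<long>` cast and the stricter integer conversions:

```python
import operator

print(int(2.7))          # 2: floats are accepted and truncated
try:
    operator.index(2.7)  # only true integers implement __index__
except TypeError as exc:
    print(exc)
```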
Could you just cast twice, `<long>(<int>n)`, to avoid Python?
Cython translates this to checking for an exact Python integer (converting if not, but that is always a given). It then calls its own conversion function to `ssize_t`. The only real difference is that additional `long` check; no "Python" involved, really.
-    ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_LONG, np.NPY_ALIGNED)
-    onbad = <np.ndarray>np.PyArray_FROM_OTF(nbad, np.NPY_LONG, np.NPY_ALIGNED)
-    onsample = <np.ndarray>np.PyArray_FROM_OTF(nsample, np.NPY_LONG, np.NPY_ALIGNED)
+    ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_INT64, np.NPY_ALIGNED)
This seems a bit surprising in terms of the behavior on 32-bit Windows, which I think should have a 32-bit integer, if I understand the rule correctly.
The internal machinery requires int64, and there was a cast to int64 below. The thing that this does relax is that previously an int64 input would raise, because it cannot be safely cast to long (int32) on Windows and 32-bit systems.
(I am happy to change it to intp though! It just seemed like we might as well go to int64 directly.)
But that was the point of using `NPY_LONG` here, according to the comment above it: to error out early rather than late.
@rkern right, I should remove the comment. Ensuring an error is why I added the explicit check below. This ensures that:
- We still get an error when the value is out of bounds (later, but not by much).
- The user can pass a default-integer array. This would otherwise simply fail because it is not a safe cast.

So, the relaxation and explicit check seemed unfortunately necessary, because I don't want to break default-integer array input. Or am I misunderstanding your comment?
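For context, a minimal sketch of the intended user-facing effect (the out-of-range error path is the explicit check discussed above, not the dtype conversion):

```python
import numpy as np

# With the relaxed int64 conversion, default-integer parameter arrays
# (int64 on 64-bit platforms) are accepted by the legacy API instead of
# failing the safe cast to long on 64-bit Windows.
ngood = np.array([10, 20], dtype=np.int64)
nbad = np.array([5, 5], dtype=np.int64)
nsample = np.array([5, 10], dtype=np.int64)
print(np.random.hypergeometric(ngood, nbad, nsample))
```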
Hello! We only test nightly numpy on Linux. Is this ready for downstream testing? Actually, this PR has no wheel, so I am not sure how easy it is to build this from scratch and test locally on Windows. But …
Yes, I think the NumPy changes should be mostly settled, so that would make me very happy! Just triggered the build; are the artifacts visible to you: https://github.com/numpy/numpy/actions/runs/5645884112#artifacts (windows-py10 and windows-py11, just in case)? I do not know how badly this will affect astropy or your dependencies; I would think some Cython code breaks because it uses …
Thanks!
(force-pushed from 22b2e72 to 19dba2a)
I have tried this on sklearn without modifying SciPy and, besides the fix above, things seem decent (there is one file that needs some work-arounds in sklearn). EDIT: Well, this isn't necessarily right, I didn't actually test on Windows :).
This adapts the `_random.pyx` file to return whatever the NumPy default integer is, which, on NumPy 2.0, would fix …. Since the Cython symbol wasn't used, I just removed it, as it bites with the overloading. See numpy/numpy#24224 for the commit which would make this necessary. At the moment this is a bit hard to test since the SciPy nightlies are incompatible with that NumPy branch. But I thought I would put it out there for discussion. The alternative and simpler solution might be to just force 64-bit results on any 64-bit system and not worry about the NumPy version.
(force-pushed from 19dba2a to 948d722)
@mtsokol, I had added … (Maybe rebasing should wait until the type alias refactor is done...)
(force-pushed from 948d722 to d28cd00)
Thanks for updating @mtsokol! There are two issues remaining here, if I am not missing something big: …
The typing side of things is fortunately not too difficult; typing patch:

diff --git a/numpy/__init__.pyi b/numpy/__init__.pyi
index 418bdf614..03b118fdf 100644
--- a/numpy/__init__.pyi
+++ b/numpy/__init__.pyi
@@ -2857,7 +2857,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
short = signedinteger[_NBitShort]
intc = signedinteger[_NBitIntC]
intp = signedinteger[_NBitIntP]
-int_ = signedinteger[_NBitInt]
+int_ = intp
+long = signedinteger[_NBitInt]
longlong = signedinteger[_NBitLongLong]
# TODO: `item`/`tolist` returns either `dt.timedelta` or `int`
@@ -2938,7 +2939,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
ushort = unsignedinteger[_NBitShort]
uintc = unsignedinteger[_NBitIntC]
uintp = unsignedinteger[_NBitIntP]
-uint = unsignedinteger[_NBitInt]
+uint = uintp
+ulong = unsignedinteger[_NBitInt]
ulonglong = unsignedinteger[_NBitLongLong]
class inexact(number[_NBit1]): # type: ignore
diff --git a/numpy/typing/mypy_plugin.py b/numpy/typing/mypy_plugin.py
index f4ad55341..78fea240f 100644
--- a/numpy/typing/mypy_plugin.py
+++ b/numpy/typing/mypy_plugin.py
@@ -59,7 +59,7 @@ def _get_precision_dict() -> dict[str, str]:
("_NBitShort", np.short),
("_NBitIntC", np.intc),
("_NBitIntP", np.intp),
- ("_NBitInt", np.int_),
+ ("_NBitInt", np.long),
("_NBitLongLong", np.longlong),
("_NBitHalf", np.half),
diff --git a/numpy/typing/tests/data/reveal/ctypeslib.pyi b/numpy/typing/tests/data/reveal/ctypeslib.pyi
index a9712c074..5c3b2138f 100644
--- a/numpy/typing/tests/data/reveal/ctypeslib.pyi
+++ b/numpy/typing/tests/data/reveal/ctypeslib.pyi
@@ -79,17 +79,9 @@
assert_type(np.ctypeslib.as_array(1), npt.NDArray[Any])
assert_type(np.ctypeslib.as_array(pointer), npt.NDArray[Any])
-if sys.platform == "win32":
- assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_int])
- assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_uint])
- assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_uint])
- assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_int])
- assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_uint)
- assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_int)
-else:
- assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
- assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
- assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
- assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
- assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
- assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
+assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
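For illustration, a minimal sketch of what the alias change means to a type checker (assuming the patched stubs above):

```python
import numpy as np

# Under the patched stubs, int_ is an alias of intp, so the two are
# interchangeable for the checker; long names the C-long-backed scalar.
x: np.intp = np.int_(0)  # OK: int_ = intp in the stubs
y: np.long = np.long(0)  # long carries the old _NBitInt precision
```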
Co-authored-by: Nathan Goldbaum <[email protected]>
(force-pushed from ee5e537 to b518738)
Simply sort the long codes before int, because it might mean we prefer them (and that may fix the tests). Also explicitly add mapping to ssize_t for int (this pre-empts changing the definition of `intp`, admittedly).
(force-pushed from b518738 to b3bf8c3)
Phew, typing works now... to a large degree just a full circle of changes in the end. The typing changes here are now almost only a few small fix-ups. I did a few greps over the typing stubs, but I cannot be 100% sure that there isn't a stray "default integer" around. Although, it seems to me that typing of things like array coercion is relatively limited, so use of …

So, I think this should be as ready as it gets, if the docs are fine.
The docs look good, just a couple typo fixes and comments. @BvB93 can you give the typing changes one more look?
are using the ``long`` or equivalent type on the C-side.
In this case, you may wish to use ``intp`` and cast user input or support
both ``long`` and ``intp`` (to better support NumPy 1.x as well).
When creating a new integer array in C or Cython, the new ``NPY_DEFAULT_INT``
Maybe we should define this in 1.26 too?
Might make it a bit easier to use for downstream, but they still will need to force `numpy>=1.26` at compile time. So I am tempted to say that, unfortunately, it is better to vendor it:

#ifndef NPY_DEFAULT_INT
#define NPY_DEFAULT_INT NPY_LONG
#endif

(unless you force `>2.0` at compile time)?

Happy to follow up with a backport; not sure it's helpful, but wouldn't hurt.
Co-authored-by: Nathan Goldbaum <[email protected]>
+    # Mainly on windows int is the same size as long but gets picked first:
+    assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_int])
+    assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_uint])
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_uint])
+    assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_int])
+    assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_int)
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_uint)
 else:
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_ulong)
LGTM here; nice to see you managed to get things working without commenting them out!
Let's pull this in. Thanks for pushing this forward @seberg! If you're looking at this PR because of new issues on Windows in CI or elsewhere, take a look at the NumPy 2.0 Migration Guide, which will be updated with specific information related to this change shortly after this PR is merged. In short, existing code that assumed the default integer type on Windows is 32-bit may behave differently or raise new errors. If that document doesn't answer your questions or if you are still confused, please feel free to open an issue describing your problem.
This is a draft to get the ball rolling. @Kai-Striega was hoping to probe this a bit more in downstream packages, but anyone excited is very much invited to join in and test downstream, start fixing tests (all green now), or look at the changes, especially the random ones.

User-facing changes:

This changes our default integer to `intp`, which really only changes behavior on 64-bit Windows. This also means:

- `np.int_` and `np.uint` are now effectively aliases to `intp` (maybe we want to simplify this in some follow-up).
- `np.long` and `np.ulong` now exist (`intp` is `Py_ssize_t`; it isn't relevant except on a single very niche platform. Also, I kinda want to just change it in C...).
- The legacy random API is untouched here (i.e. it would continue using the old `long`); many, many tests will fail due to this on Windows and there are probably still holes.
- Of course, some ufunc loops need adapting (use int64); this isn't user-facing though.
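A short sketch of the user-visible result (assuming a 64-bit Windows build of this branch):

```python
import numpy as np

a = np.arange(3)
print(a.dtype)                                 # int64, now also on 64-bit Windows
print(np.dtype(np.int_) == np.dtype(np.intp))  # True: the default integer is intp
print(np.dtype(np.long))                       # the C long: int32 here, int64 on 64-bit Linux
```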
API changes for C-API/Cython users
Cython users may get a small problem, but they are also the ones complaining the most :). We have to remove `np.int_t` because it cannot be defined for both 1.x and 2.x at the same time. Both in Cython and in C there is a new `NPY_DEFAULT_INT` type-code (although I suspect Cython users have easier paths). They may have to use a fused type (themselves) or simply add manual casts.

Other changes:

`NPY_INTP` now prefers mapping to long rather than int (if they are the same size). I think this makes a lot more sense (we always prefer long), but it is a small API change/incompatibility (the runtime type number of intp may not match the compile-time one on NumPy 1.x when compiling with/for NumPy 2.x). Mainly: this should be mentioned in a release note.
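To make the incompatibility concrete, a hedged illustration (the numbers assume a platform where int and long have the same size):

```python
import numpy as np

# NPY_INT is 5 and NPY_LONG is 7 in the C type-number enum; where the two
# are the same size, intp now carries the long type number, so compare
# dtypes rather than raw type numbers across NumPy versions.
print(np.dtype(np.intp).num)                  # 7 (NPY_LONG) rather than 5 (NPY_INT)
print(np.dtype(np.intp) == np.dtype("intp"))  # dtype equality stays the stable check
```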