
API: Make 64bit default integer on 64bit windows #24224


Merged — 41 commits merged into numpy:main from intp-default-int, Nov 2, 2023

Conversation

@seberg (Member) commented Jul 20, 2023

This is a draft to get the ball rolling. @Kai-Striega was hoping to probe this a bit more in downstream packages, but anyone excited is very much invited to join in: test downstream, start fixing tests (all green now), or look at the changes, especially the random ones.

User Facing changes:

This changes our default integer to intp, which really only changes behavior on 64-bit Windows. This also means:

  • np.int_ and np.uint are now effectively aliases of intp (maybe we want to simplify this in some follow-up).
  • np.long and np.ulong now exist.
  • (I accept that intp is not ideal and may itself be better as Py_ssize_t; it isn't relevant except on a single very niche platform. Also, I kinda want to just change it in C...)

The legacy random API is untouched here (i.e. it would continue using the old long); many, many tests will fail due to this on Windows, and there are probably still holes.

Of course, some ufunc loops need adapting (to use int64), but this isn't user-facing.

API changes for C-API/Cython users

Cython users may hit a small problem, but they are also the ones complaining the most :). We have to remove np.int_t because it cannot be defined correctly for both 1.x and 2.x at the same time.
Both in Cython and in C there is a new NPY_DEFAULT_INT type-code (although I suspect Cython users have easier paths). They may have to use a fused type (themselves) or simply add manual casts.
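For pure-Python code there is no need for the new type-code; the runtime default integer can be probed directly (an illustration, not a dedicated API):

```python
import numpy as np

# Works on both NumPy 1.x and 2.x: the dtype NumPy picks for a plain
# Python int is, by definition, the default integer.
# (int32 on 64-bit Windows + NumPy 1.x; int64 there on NumPy 2.x.)
default_int = np.array(0).dtype
print(default_int)
```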

Other changes:

  • NPY_INTP now prefers mapping to long rather than int (if they are the same size). I think this makes a lot more sense (we always prefer long), but it is a small API change/incompatibility (the runtime type number of intp may not match the compile-time one on NumPy 1.x when compiling with/for NumPy 2.x). Mainly: this should be mentioned in a release note.
  • There is a new constant added next to the API table to fetch the NumPy C-version from our macros. I think this is fine; NumPy 1.x would require a function call otherwise.

@charris (Member) commented Jul 20, 2023

Some of the test failures are interesting :) Definitely needs a release note at some point.

@seberg (Member, Author) commented Jul 21, 2023

Fixes welcome ;). Let's see, many are just direct adaptations to the change. Lots of failures in random.binomial, but hopefully that is fixed now (there will certainly be more in random).

EDIT: OK, failures are down to a very manageable amount (a lot was due to random tests taking the wrong branch).

@seberg (Member, Author) commented Jul 21, 2023

Except for legacy np.random, things should be settling (most things are fine!). Some of the random legacy paths are probably fixed a bit incorrectly right now; the random changes are really the trickiest part here!

Not changing the legacy results means that some internal int64s are cast to long. To allow default-integer input, we cannot safe-cast user input to long anymore. We will need to cast it to int64 and then check that the values fit into long.
(Otherwise, we force users to ensure inputs are long, which seems like too much potential churn.)
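The check described above could look roughly like this (a sketch with illustrative names, not the actual mtrand code):

```python
import numpy as np

def as_long_checked(values, long_bits=64):
    """Accept default-integer (or smaller) input via int64, then verify
    the values fit into a C long of the given width, rather than
    requiring a safe cast to long up front."""
    arr = np.asarray(values).astype(np.int64, casting="safe")
    lo, hi = -(1 << (long_bits - 1)), (1 << (long_bits - 1)) - 1
    if arr.size and (arr.min() < lo or arr.max() > hi):
        raise OverflowError("value out of range for C long")
    return arr
```

On a platform where long is 32-bit, this accepts a default-integer (int64) array but still raises for values that do not fit into the long used internally.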

EDIT: Hmmm, wasm (maybe more 32-bit platforms?) still needs some tweaks; not sure what the problem is. Wrong type code for the default integer? (Yes, that was the problem it seems. intp should be long if int and long have the same size!)

@seberg seberg force-pushed the intp-default-int branch 2 times, most recently from 0503092 to df23b87 Compare July 21, 2023 17:28
@Kai-Striega (Member) commented

I spent the last couple of hours trying to build SciPy from source on Windows by following the CI. Not being a Windows person, it's proving to be quite the challenge. I think this is due to my lack of experience on Windows, not due to your changes. Tomorrow I'll probably take another look and see if I can make some progress.

@seberg (Member, Author) commented Jul 24, 2023

@bashtage can I snipe you to have a look at the (admittedly not polished) changes in random here? Dealing with the legacy random code seems like one of the more hairy points potentially. Maybe it's all fine (just a bit rough), but I am not sure about how we would want to do it here.

@@ -3481,16 +3506,20 @@ cdef class RandomState:
             return randoms

         _dp = PyFloat_AsDouble(p)
-        _in = <long>n
+        _in = int(n)
         check_constraint(_dp, 'p', CONS_BOUNDED_0_1)
Contributor:

Is this a Python function call rather than a C cast? Maybe Cython 3 does the right thing here?

seberg (Member, Author):

The problem was that this can be a Python float; somehow the <long> cast uses a long conversion that allows floats, while other to-integer casts don't. This does depend on the Cython version (it worked locally).

I suspect it's the opposite: it was necessary in Cython 3 to do something more than a cast.

Will try once more locally, but maybe the best path is to just extract the value from the array (which we have created here in either case).
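The distinction can be seen from pure Python (an illustration of the general behavior, not of Cython's generated code): C-style long conversion truncates floats, while `__index__`-style conversion rejects them.

```python
import operator

n = 3.0
# int() accepts a Python float and truncates, much like the
# overly-permissive <long> cast discussed above.
print(int(n))

# operator.index() (the __index__ protocol) rejects non-integers,
# like the stricter to-integer conversions do.
try:
    operator.index(n)
except TypeError as exc:
    print("rejected:", exc)
```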

Contributor:

Could you just cast twice? <long>(<int>n) to avoid Python?

seberg (Member, Author):

Cython translates this to checking for an exact Python integer (converting if not, but that is always a given). It then calls its own conversion function to ssize_t.
The only real difference is the additional long check; no "Python" is really involved.

-ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_LONG, np.NPY_ALIGNED)
 onbad = <np.ndarray>np.PyArray_FROM_OTF(nbad, np.NPY_LONG, np.NPY_ALIGNED)
 onsample = <np.ndarray>np.PyArray_FROM_OTF(nsample, np.NPY_LONG, np.NPY_ALIGNED)
+ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_INT64, np.NPY_ALIGNED)
Contributor:

This seems a bit surprising in terms of the behavior on 32-bit Windows, which I think should have a 32-bit integer, if I understand the rule correctly.

seberg (Member, Author):

The internal machinery requires int64, and there was a cast to int64 below. What this does relax is that previously an int64 input would raise because it cannot be safely cast to long (int32) on Windows and 32-bit platforms.

(I am happy to change it to intp though! Just seemed like might as well go to int64 directly.)

rkern (Member):

But that was the point of using NPY_LONG here, according to the comment above it. To error out early rather than late.

seberg (Member, Author):

@rkern right, I should remove the comment. Ensuring an error is why I added the explicit check below. This ensures that:

  1. We still get an error when the value is out of bounds (later, but not much later).
  2. The user can pass a default-integer array. This would otherwise simply fail because it is not a safe cast.

So, the relaxation and explicit check seemed unfortunately necessary because I don't want to break default-integer array input. Or am I misunderstanding your comment?

@pllim (Contributor) commented Jul 24, 2023

Hello! We only test nightly NumPy on Linux. Is this ready for downstream testing? Actually, this PR has no wheel, so I am not sure how easy it is to build this from scratch and test locally on Windows. But astropy would be interested to test this at some point.

@seberg seberg added the 36 - Build Build related PR label Jul 24, 2023
@seberg (Member, Author) commented Jul 24, 2023

Is this ready for downstream testing?

Yes, I think the NumPy changes should be mostly settled so that would make me very happy!

I just triggered the build; are the artifacts visible to you: https://github.com/numpy/numpy/actions/runs/5645884112#artifacts? In case it helps, windows-py10 and windows-py11 are likely the relevant ones.

I do not know how badly this will affect astropy or your dependencies. I would think some Cython code breaks because it uses npc.int_[::1] or long[::1] memoryviews, expecting/knowing that default-integer input will come in. (Which is of course the exact opposite of what many small libraries fight with: they hard-code int64, presuming that everyone uses that, leaving win64 users unable to use their libraries.)
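One defensive pattern for such downstream libraries (illustrative only, with a hypothetical helper name) is to normalize at the Python boundary instead of assuming the default integer dtype:

```python
import numpy as np

def as_dtype(a, dtype):
    """Return `a` as a contiguous array of `dtype`, casting only when
    needed, so a `dtype[::1]` memoryview in the extension always
    matches regardless of the platform default integer."""
    a = np.asarray(a)
    if a.dtype != np.dtype(dtype):
        a = a.astype(dtype)
    return np.ascontiguousarray(a)

# A default-integer list now reliably matches an intp[::1] signature.
idx = as_dtype([1, 2, 3], np.intp)
print(idx.dtype)
```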

@pllim (Contributor) commented Jul 24, 2023

Thanks!

@seberg (Member, Author) commented Aug 9, 2023

I have tried this on sklearn without modifying SciPy, and besides the fix above things seem decent (there is one file that needs some workarounds in sklearn).

EDIT: Well, this isn't necessarily right, I didn't actually test on windows :).

seberg added a commit to seberg/scikit-learn that referenced this pull request Aug 9, 2023
This adapts the `_random.pyx` file to return whatever the NumPy
default integer is, which changes on NumPy 2.0.
Since the Cython symbol wasn't used, I just removed it, as it clashes
with the overloading.

See numpy/numpy#24224 for the commit which
would make this necessary.

At the time this is a bit hard to test since the SciPy nightlies are
incompatible with that NumPy branch.  But I thought I would put it out
there for discussion.

The alternative and simpler solution might be to just force 64bit
results on any 64bit system and not worry about the NumPy version.
@seberg seberg force-pushed the intp-default-int branch from 19dba2a to 948d722 Compare August 9, 2023 12:49
@seberg (Member, Author) commented Sep 7, 2023

@mtsokol, I had added long and ulong here, because we effectively used int_ for that before and that is a worse name. I can live with requiring np.dtype("long").type as well; mainly, there is quite a bit of merge conflict in the parts related to that addition, and I was wondering if things changed a bit w.r.t. adding them.

(Maybe rebasing should wait until the type alias refactor is done...)

@seberg (Member, Author) commented Oct 10, 2023

Thanks for updating @mtsokol! There are two issues remaining here, if I am not missing something big:

  1. It would be nice to explicitly vet the choices in mtrand.pyx. We can't change the default, but how exactly to handle dtype=int, etc. is not clear.
  2. We need to add typing for the new long (and fix a few typing tests). Maybe @BvB93 can have a look. We may look into splitting out adding np.long here for simplicity though. (Adding np.long itself is simple, but I am not sure how involved the typing is.)

@BvB93 (Member) commented Oct 10, 2023

We need to add typing for the new long (and fix a few typing tests). Maybe @BvB93 can have a look. We may look into splitting out adding np.long here for simplicity though. (Adding np.long itself is simple, but I am not sure how involved the typing is.)

The typing side of things is fortunately not too difficult: (u)int just becomes a (u)intp alias, and (u)long reuses the old (u)int type alias (plus a handful of np.ctypeslib-related test fixes). Got a small patch down below, though I can also directly push to your branch if you prefer.

typing patch
diff --git a/numpy/__init__.pyi b/numpy/__init__.pyi
index 418bdf614..03b118fdf 100644
--- a/numpy/__init__.pyi
+++ b/numpy/__init__.pyi
@@ -2857,7 +2857,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
 short = signedinteger[_NBitShort]
 intc = signedinteger[_NBitIntC]
 intp = signedinteger[_NBitIntP]
-int_ = signedinteger[_NBitInt]
+int_ = intp
+long = signedinteger[_NBitInt]
 longlong = signedinteger[_NBitLongLong]
 
 # TODO: `item`/`tolist` returns either `dt.timedelta` or `int`
@@ -2938,7 +2939,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
 ushort = unsignedinteger[_NBitShort]
 uintc = unsignedinteger[_NBitIntC]
 uintp = unsignedinteger[_NBitIntP]
-uint = unsignedinteger[_NBitInt]
+uint = uintp
+ulong = unsignedinteger[_NBitInt]
 ulonglong = unsignedinteger[_NBitLongLong]
 
 class inexact(number[_NBit1]):  # type: ignore
diff --git a/numpy/typing/mypy_plugin.py b/numpy/typing/mypy_plugin.py
index f4ad55341..78fea240f 100644
--- a/numpy/typing/mypy_plugin.py
+++ b/numpy/typing/mypy_plugin.py
@@ -59,7 +59,7 @@ def _get_precision_dict() -> dict[str, str]:
         ("_NBitShort", np.short),
         ("_NBitIntC", np.intc),
         ("_NBitIntP", np.intp),
-        ("_NBitInt", np.int_),
+        ("_NBitInt", np.long),
         ("_NBitLongLong", np.longlong),
 
         ("_NBitHalf", np.half),
diff --git a/numpy/typing/tests/data/reveal/ctypeslib.pyi b/numpy/typing/tests/data/reveal/ctypeslib.pyi
index a9712c074..5c3b2138f 100644
--- a/numpy/typing/tests/data/reveal/ctypeslib.pyi
+++ b/numpy/typing/tests/data/reveal/ctypeslib.pyi
@@ -79,17 +79,9 @@
 assert_type(np.ctypeslib.as_array(1), npt.NDArray[Any])
 assert_type(np.ctypeslib.as_array(pointer), npt.NDArray[Any])
 
-if sys.platform == "win32":
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_int])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_uint])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_uint])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_int])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_uint)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_int)
-else:
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
+assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)

@seberg seberg force-pushed the intp-default-int branch 7 times, most recently from ee5e537 to b518738 Compare November 1, 2023 17:06
Simply sort the long codes before int, because it might mean we
prefer them (and that may fix the tests).  Also explicitly add a
mapping to ssize_t for int (this pre-empts changing the definition
of `intp`, admittedly).
@seberg (Member, Author) commented Nov 1, 2023

Puh, typing works now... to a large degree it ended up being a full circle of changes. The typing changes here are now almost only a few small fix-ups. I did a few greps over the typing stubs, but I cannot be 100% sure that there isn't a stray "default integer" around. Although, it seems to me that typing of things like array coercion is relatively limited, so use of int or type[int] is also.

So, I think this should be as ready as it gets, if the docs are fine.

@ngoldbaum (Member) left a comment:

The docs look good, just a couple typo fixes and comments. @BvB93 can you give the typing changes one more look?

are using the ``long`` or equivalent type on the C-side.
In this case, you may wish to use ``intp`` and cast user input or support
both ``long`` and ``intp`` (to better support NumPy 1.x as well).
When creating a new integer array in C or Cython, the new ``NPY_DEFAULT_INT``
Member:

Maybe we should define this in 1.26 too?

seberg (Member, Author):

It might make it a bit easier to use for downstream, but they would still need to require numpy>=1.26 at compile time. So I am tempted to say that, unfortunately, it is better to vendor it:

#ifndef NPY_DEFAULT_INT
#define NPY_DEFAULT_INT NPY_LONG
#endif

(unless you require NumPy >= 2.0 at compile time)?

Happy to follow-up with a backport, not sure it's helpful, but wouldn't hurt.

Comment on lines +83 to +96

     # Mainly on windows int is the same size as long but gets picked first:
     assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_int])
     assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_uint])
     assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_uint])
     assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_int])
     assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_int)
     assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_uint)
 else:
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_ulong)
Member:

LGTM here; nice to see you managed to get things working without commenting them out!

@ngoldbaum (Member) commented

Let's pull this in. Thanks for pushing this forward @seberg!

If you're looking at this PR because of new issues on Windows in CI or elsewhere, take a look at the NumPy 2.0 Migration Guide, which will be updated with specific information related to this change shortly after this PR is merged. In short, existing code that assumed the default integer type on Windows is 32-bit may behave differently or raise new errors.

If that document doesn't answer your questions or if you are still confused, please feel free to open an issue describing your problem.

@ngoldbaum ngoldbaum merged commit 439762c into numpy:main Nov 2, 2023
@seberg seberg deleted the intp-default-int branch November 2, 2023 16:12
seberg added a commit to seberg/scikit-learn that referenced this pull request Nov 2, 2023
seberg added a commit to seberg/scikit-learn that referenced this pull request Nov 6, 2023
Labels: 30 - API, 36 - Build (Build related PR)
Projects: Status: Done
10 participants