
API: Make 64bit default integer on 64bit windows #24224


Merged — 41 commits merged into numpy:main from intp-default-int, Nov 2, 2023

Conversation

@seberg (Member) commented Jul 20, 2023

This is a draft to get the ball rolling. @Kai-Striega was hoping to probe this a bit more in downstream packages, but anyone excited is very much invited to join in: test downstream, start fixing tests (all green now), or look at the changes, especially the random ones.

User Facing changes:

This changes our default integer to intp, which really only changes behavior on 64-bit Windows. This also means:

  • np.int_ and np.uint are now effectively aliases of intp (maybe we want to simplify this in some follow-up).
  • np.long and np.ulong now exist.
  • (I accept that intp is not ideal and may itself be better as Py_ssize_t; it isn't relevant except on a single very niche platform. Also, I kinda want to just change it in C...)

The legacy random API is untouched here (i.e. it would continue using the old long); many, many tests will fail due to this on Windows, and there are probably still holes.

Of course, some ufunc loops need adapting (to use int64), but this isn't user-facing.

API changes for C-API/Cython users

Cython users may hit a small problem, but they are also the ones complaining the most :). We have to remove np.int_t because it cannot be defined correctly for both 1.x and 2.x at the same time.
Both in Cython and in C there is a new NPY_DEFAULT_INT type-code (although I suspect Cython users have easier paths). They may have to use a fused type (themselves) or simply add manual casts.
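For pure-Python code there is no need for the new type-code; the runtime default integer can be probed directly (an illustration, not a dedicated API):

```python
import numpy as np

# Works on both NumPy 1.x and 2.x: the dtype NumPy picks for a plain
# Python int is, by definition, the default integer.
# (int32 on 64-bit Windows + NumPy 1.x; int64 there on NumPy 2.x.)
default_int = np.array(0).dtype
print(default_int)
```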

Other changes:

  • NPY_INTP now prefers mapping to long rather than int (if they are the same size). I think this makes a lot more sense (we always prefer long), but it is a small API change/incompatibility (the runtime type number of intp may not match the compile-time one on NumPy 1.x when compiling with/for NumPy 2.x). Mainly: this should be mentioned in a release note.
  • There is a new constant added next to the API table to fetch the NumPy C-version from our macros. I think this is fine; NumPy 1.x would require a function call otherwise.

@charris (Member) commented Jul 20, 2023

Some of the test failures are interesting :) Definitely needs a release note at some point.

@seberg (Member, Author) commented Jul 21, 2023

Fixes welcome ;). Let's see, many are just direct adaptations to the change. Lots of failures in random.binomial, but hopefully that is fixed now (there will certainly be more in random).

EDIT: OK, failures are down to a very manageable amount (a lot was due to random tests taking the wrong branch).

@seberg (Member, Author) commented Jul 21, 2023

Except for legacy np.random, things should be settling (most things are fine!). Some of the random legacy paths are probably fixed a bit incorrectly right now; the random changes are really the trickiest part here!

Not changing the legacy results means that some internal int64s are cast to long. To allow default-integer input, we cannot safe-cast user input to long anymore. We will need to cast it to int64 and then check that the values fit into long.
(Otherwise, we force users to ensure inputs are long, which seems like too much potential churn.)
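The check described above could look roughly like this (a sketch with illustrative names, not the actual mtrand code):

```python
import numpy as np

def as_long_checked(values, long_bits=64):
    """Accept default-integer (or smaller) input via int64, then verify
    the values fit into a C long of the given width, rather than
    requiring a safe cast to long up front."""
    arr = np.asarray(values).astype(np.int64, casting="safe")
    lo, hi = -(1 << (long_bits - 1)), (1 << (long_bits - 1)) - 1
    if arr.size and (arr.min() < lo or arr.max() > hi):
        raise OverflowError("value out of range for C long")
    return arr
```

On a platform where long is 32-bit, this accepts a default-integer (int64) array but still raises for values that do not fit into the long used internally.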

EDIT: Hmmm, wasm (maybe more 32-bit platforms?) still needs some tweaks; not sure what the problem is. Wrong type code for the default integer? (Yes, that was the problem it seems. intp should be long if int and long have the same size!)

@seberg seberg force-pushed the intp-default-int branch 2 times, most recently from 0503092 to df23b87 Compare July 21, 2023 17:28
@Kai-Striega (Member) commented

I spent the last couple of hours trying to build SciPy from source on Windows by following the CI. Not being a Windows person, it's proving to be quite the challenge. I think this is due to my lack of experience on Windows, not due to your changes. Tomorrow I'll probably take another look and see if I can make some progress.

@seberg (Member, Author) commented Jul 24, 2023

@bashtage can I snipe you to have a look at the (admittedly not polished) changes in random here? Dealing with the legacy random code seems like one of the more hairy points potentially. Maybe it's all fine (just a bit rough), but I am not sure about how we would want to do it here.

@@ -3481,16 +3506,20 @@ cdef class RandomState:
             return randoms

         _dp = PyFloat_AsDouble(p)
-        _in = <long>n
+        _in = int(n)
         check_constraint(_dp, 'p', CONS_BOUNDED_0_1)
Contributor:

Is this a Python function call rather than a C cast? Maybe Cython 3 does the right thing here?

seberg (Member, Author):

The problem was that this can be a Python float; somehow the <long> cast uses a long conversion that allows floats, while other to-integer casts don't. This does depend on the Cython version (it worked locally).

I suspect it's the opposite: it was necessary in Cython 3 to do something more than a cast.

Will try once more locally, but maybe the best path is to just extract the value from the array (which we have created here in either case).
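The distinction can be seen from pure Python (an illustration of the general behavior, not of Cython's generated code): C-style long conversion truncates floats, while `__index__`-style conversion rejects them.

```python
import operator

n = 3.0
# int() accepts a Python float and truncates, much like the
# overly-permissive <long> cast discussed above.
print(int(n))

# operator.index() (the __index__ protocol) rejects non-integers,
# like the stricter to-integer conversions do.
try:
    operator.index(n)
except TypeError as exc:
    print("rejected:", exc)
```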

Contributor:

Could you just cast twice? <long>(<int>n) to avoid Python?

seberg (Member, Author):

Cython translates this to checking for an exact Python integer (converting if not, but that is always a given). It then calls its own conversion function to ssize_t.
The only real difference is the additional long check; no "Python" is really involved.

-ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_LONG, np.NPY_ALIGNED)
 onbad = <np.ndarray>np.PyArray_FROM_OTF(nbad, np.NPY_LONG, np.NPY_ALIGNED)
 onsample = <np.ndarray>np.PyArray_FROM_OTF(nsample, np.NPY_LONG, np.NPY_ALIGNED)
+ongood = <np.ndarray>np.PyArray_FROM_OTF(ngood, np.NPY_INT64, np.NPY_ALIGNED)
Contributor:

This seems a bit surprising in terms of the behavior on 32-bit Windows, which I think should have a 32-bit integer, if I understand the rule correctly.

seberg (Member, Author):

The internal machinery requires int64, and there was a cast to int64 below. What this does relax is that previously an int64 input would raise because it cannot be safely cast to long (int32) on Windows and 32-bit platforms.

(I am happy to change it to intp though! Just seemed like might as well go to int64 directly.)

rkern (Member):

But that was the point of using NPY_LONG here, according to the comment above it. To error out early rather than late.

seberg (Member, Author):

@rkern right, I should remove the comment. Ensuring an error is why I added the explicit check below. This ensures that:

  1. We still get an error when the value is out of bounds (later, but not much later).
  2. The user can pass a default-integer array. This would otherwise simply fail because it is not a safe cast.

So, the relaxation and explicit check seemed unfortunately necessary because I don't want to break default-integer array input. Or am I misunderstanding your comment?

@pllim (Contributor) commented Jul 24, 2023

Hello! We only test nightly NumPy on Linux. Is this ready for downstream testing? Actually, this PR has no wheel, so I am not sure how easy it is to build this from scratch and test locally on Windows. But astropy would be interested to test this at some point.

@seberg seberg added the 36 - Build Build related PR label Jul 24, 2023
@seberg (Member, Author) commented Jul 24, 2023

Is this ready for downstream testing?

Yes, I think the NumPy changes should be mostly settled so that would make me very happy!

I just triggered the build; are the artifacts visible to you: https://github.com/numpy/numpy/actions/runs/5645884112#artifacts? In case it helps, windows-py10 and windows-py11 are likely the relevant ones.

I do not know how badly this will affect astropy or your dependencies. I would think some Cython code breaks because it uses npc.int_[::1] or long[::1] memoryviews, expecting/knowing that default-integer input will come in. (Which is of course the exact opposite of what many small libraries fight with: they hard-code int64, presuming that everyone uses that, leaving win64 users unable to use their libraries.)
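One defensive pattern for such downstream libraries (illustrative only, with a hypothetical helper name) is to normalize at the Python boundary instead of assuming the default integer dtype:

```python
import numpy as np

def as_dtype(a, dtype):
    """Return `a` as a contiguous array of `dtype`, casting only when
    needed, so a `dtype[::1]` memoryview in the extension always
    matches regardless of the platform default integer."""
    a = np.asarray(a)
    if a.dtype != np.dtype(dtype):
        a = a.astype(dtype)
    return np.ascontiguousarray(a)

# A default-integer list now reliably matches an intp[::1] signature.
idx = as_dtype([1, 2, 3], np.intp)
print(idx.dtype)
```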

@pllim (Contributor) commented Jul 24, 2023

Thanks!

@seberg (Member, Author) commented Aug 9, 2023

I have tried this on sklearn without modifying SciPy, and besides the fix above things seem decent (there is one file that needs some workarounds in sklearn).

EDIT: Well, this isn't necessarily right, I didn't actually test on windows :).

seberg added a commit to seberg/scikit-learn that referenced this pull request Aug 9, 2023
This adapts the `_random.pyx` file to return whatever the NumPy
default integer is, which changes on NumPy 2.0.
Since the Cython symbol wasn't used, I just removed it, as it clashes
with the overloading.

See numpy/numpy#24224 for the commit which
would make this necessary.

At the time this is a bit hard to test since the SciPy nightlies are
incompatible with that NumPy branch.  But I thought I would put it out
there for discussion.

The alternative and simpler solution might be to just force 64bit
results on any 64bit system and not worry about the NumPy version.
@seberg seberg force-pushed the intp-default-int branch from 19dba2a to 948d722 Compare August 9, 2023 12:49
@seberg (Member, Author) commented Sep 7, 2023

@mtsokol, I had added long and ulong here, because we effectively used int_ for that before and that is a worse name. I can live with requiring np.dtype("long").type as well; mainly, there is quite a bit of merge conflict in the parts related to that addition, and I was wondering if things changed a bit w.r.t. adding them.

(Maybe rebasing should wait until the type alias refactor is done...)

@seberg (Member, Author) commented Oct 10, 2023

Thanks for updating @mtsokol! There are two issues remaining here, if I am not missing something big:

  1. It would be nice to explicitly vet the choices in mtrand.pyx. We can't change the default, but how exactly to handle dtype=int, etc. is not clear.
  2. We need to add typing for the new long (and fix a few typing tests). Maybe @BvB93 can have a look. We may look into splitting out adding np.long here for simplicity though. (Adding np.long itself is simple, but I am not sure how involved the typing is.)

@BvB93 (Member) commented Oct 10, 2023

We need to add typing for the new long (and fix a few typing tests). Maybe @BvB93 can have a look. We may look into splitting out adding np.long here for simplicity though. (Adding np.long itself is simple, but I am not sure how involved the typing is.)

The typing side of things is fortunately not too difficult: (u)int just becomes a (u)intp alias, and (u)long reuses the old (u)int type alias (plus a handful of np.ctypeslib-related test fixes). Got a small patch down below, though I can also directly push to your branch if you prefer.

typing patch
diff --git a/numpy/__init__.pyi b/numpy/__init__.pyi
index 418bdf614..03b118fdf 100644
--- a/numpy/__init__.pyi
+++ b/numpy/__init__.pyi
@@ -2857,7 +2857,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
 short = signedinteger[_NBitShort]
 intc = signedinteger[_NBitIntC]
 intp = signedinteger[_NBitIntP]
-int_ = signedinteger[_NBitInt]
+int_ = intp
+long = signedinteger[_NBitInt]
 longlong = signedinteger[_NBitLongLong]
 
 # TODO: `item`/`tolist` returns either `dt.timedelta` or `int`
@@ -2938,7 +2939,8 @@ def __init__(self, value: _IntValue = ..., /) -> None: ...
 ushort = unsignedinteger[_NBitShort]
 uintc = unsignedinteger[_NBitIntC]
 uintp = unsignedinteger[_NBitIntP]
-uint = unsignedinteger[_NBitInt]
+uint = uintp
+ulong = unsignedinteger[_NBitInt]
 ulonglong = unsignedinteger[_NBitLongLong]
 
 class inexact(number[_NBit1]):  # type: ignore
diff --git a/numpy/typing/mypy_plugin.py b/numpy/typing/mypy_plugin.py
index f4ad55341..78fea240f 100644
--- a/numpy/typing/mypy_plugin.py
+++ b/numpy/typing/mypy_plugin.py
@@ -59,7 +59,7 @@ def _get_precision_dict() -> dict[str, str]:
         ("_NBitShort", np.short),
         ("_NBitIntC", np.intc),
         ("_NBitIntP", np.intp),
-        ("_NBitInt", np.int_),
+        ("_NBitInt", np.long),
         ("_NBitLongLong", np.longlong),
 
         ("_NBitHalf", np.half),
diff --git a/numpy/typing/tests/data/reveal/ctypeslib.pyi b/numpy/typing/tests/data/reveal/ctypeslib.pyi
index a9712c074..5c3b2138f 100644
--- a/numpy/typing/tests/data/reveal/ctypeslib.pyi
+++ b/numpy/typing/tests/data/reveal/ctypeslib.pyi
@@ -79,17 +79,9 @@
 assert_type(np.ctypeslib.as_array(1), npt.NDArray[Any])
 assert_type(np.ctypeslib.as_array(pointer), npt.NDArray[Any])
 
-if sys.platform == "win32":
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_int])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_uint])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_uint])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_int])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_uint)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_int)
-else:
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
+assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
+assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
+assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)

@seberg seberg force-pushed the intp-default-int branch 7 times, most recently from ee5e537 to b518738 Compare November 1, 2023 17:06
Simply sort the long codes before int, because it might mean we
prefer them (and that may fix the tests).  Also explicitly add a
mapping to ssize_t for int (this pre-empts changing the definition
of `intp`, admittedly).
@seberg (Member, Author) commented Nov 1, 2023

Puh, typing works now... to a large degree it ended up being a full circle of changes. The typing changes here are now almost only a few small fix-ups. I did a few greps over the typing stubs, but I cannot be 100% sure that there isn't a stray "default integer" around. Although, it seems to me that typing of things like array coercion is relatively limited, so use of int or type[int] is also.

So, I think this should be as ready as it gets, if the docs are fine.

@ngoldbaum (Member) left a comment:

The docs look good, just a couple typo fixes and comments. @BvB93 can you give the typing changes one more look?

are using the ``long`` or equivalent type on the C-side.
In this case, you may wish to use ``intp`` and cast user input or support
both ``long`` and ``intp`` (to better support NumPy 1.x as well).
When creating a new integer array in C or Cython, the new ``NPY_DEFAULT_INT``
Member:

Maybe we should define this in 1.26 too?

seberg (Member, Author):

It might make it a bit easier to use for downstream, but they would still need to require numpy>=1.26 at compile time. So I am tempted to say that, unfortunately, it is better to vendor it:

#ifndef NPY_DEFAULT_INT
#define NPY_DEFAULT_INT NPY_LONG
#endif

(unless you require NumPy >= 2.0 at compile time)?

Happy to follow-up with a backport, not sure it's helpful, but wouldn't hurt.

Comment on lines +83 to +96

     # Mainly on windows int is the same size as long but gets picked first:
     assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_int])
     assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_uint])
     assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_uint])
     assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_int])
     assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_int)
     assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_uint)
 else:
-    assert_type(np.ctypeslib.as_ctypes_type(np.int_), type[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes_type(np.uint), type[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint), ct.Array[ct.c_ulong])
-    assert_type(np.ctypeslib.as_ctypes(AR_int), ct.Array[ct.c_long])
-    assert_type(np.ctypeslib.as_ctypes(AR_uint.take(0)), ct.c_ulong)
-    assert_type(np.ctypeslib.as_ctypes(AR_int.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes_type(np.long), type[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes_type(np.ulong), type[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong), ct.Array[ct.c_ulong])
+    assert_type(np.ctypeslib.as_ctypes(AR_long), ct.Array[ct.c_long])
+    assert_type(np.ctypeslib.as_ctypes(AR_long.take(0)), ct.c_long)
+    assert_type(np.ctypeslib.as_ctypes(AR_ulong.take(0)), ct.c_ulong)
Member:

LGTM here; nice to see you managed to get things working without commenting them out!

@ngoldbaum (Member) commented

Let's pull this in. Thanks for pushing this forward @seberg!

If you're looking at this PR because of new issues on Windows in CI or elsewhere, take a look at the NumPy 2.0 Migration Guide, which will be updated with specific information related to this change shortly after this PR is merged. In short, existing code that assumed the default integer type on Windows is 32-bit may behave differently or raise new errors.

If that document doesn't answer your questions or if you are still confused, please feel free to open an issue describing your problem.

@ngoldbaum ngoldbaum merged commit 439762c into numpy:main Nov 2, 2023
@seberg seberg deleted the intp-default-int branch November 2, 2023 16:12
seberg added a commit to seberg/scikit-learn that referenced this pull request Nov 2, 2023
seberg added a commit to seberg/scikit-learn that referenced this pull request Nov 6, 2023
Labels: 30 - API, 36 - Build (Build related PR)
Projects: Status: Done
10 participants