DEP: Deprecate setting the strides and dtype of a numpy array #28901

eendebakpt
Contributor

See #28800

@eendebakpt eendebakpt marked this pull request as draft May 4, 2025 21:37
@eendebakpt eendebakpt changed the title Draft: DEP: Deprecate setting the strides and dtype of a numpy array DEP: Deprecate setting the strides and dtype of a numpy array May 4, 2025
Member

@seberg seberg left a comment

The error paths are both incorrect and untested, so fixing and testing them is definitely required.

Other than that, this should have a release note as well. But I guess you had it as draft for a reason :).

@@ -124,6 +124,11 @@ array_strides_set(PyArrayObject *self, PyObject *obj, void *NPY_UNUSED(ignored))
npy_intp upper_offset = 0;
Py_buffer view;

/* DEPRECATED 2025-05-04, NumPy 2.3 */
PyErr_WarnEx(PyExc_DeprecationWarning,
"Setting the strides on a Numpy array has been deprecated in Numpy 2.3.\n",
Member

We could point to sliding_window_view and stride_tricks.as_strided. Although, I am not sure it is needed for this one.
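As a rough sketch of what such a pointer could lead users to, the snippet below builds an overlapping-window view once with sliding_window_view and once with as_strided, in both cases without assigning to arr.strides. The array size and window length are arbitrary choices for illustration and are not taken from the PR.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

arr = np.arange(6, dtype=np.int64)

# Overlapping windows of length 3; returns a read-only view by default.
windows = sliding_window_view(arr, window_shape=3)

# The same layout built by hand: both axes advance by one element.
# writeable=False guards against accidental writes through aliased memory.
manual = as_strided(
    arr,
    shape=(4, 3),
    strides=(arr.itemsize, arr.itemsize),
    writeable=False,
)
```

Both results are new array objects that share memory with arr; the strides of arr itself are never modified.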

msg = "Setting the strides on a Numpy array has been deprecated"
arr = np.arange(48)
with pytest.warns(DeprecationWarning, match=msg):
    arr.strides = arr.strides
Member

Please move the tests to test_deprecations.py. They should test the error path as well (which the test_deprecations machinery does for you, although you can also do it explicitly).
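To illustrate what "testing the error path" means, here is a minimal standard-library sketch. It does not use NumPy's test_deprecations machinery; Box and check_both_paths are hypothetical names invented for this example. The idea is to exercise the deprecated operation twice: once checking that the warning is emitted, and once with the warning escalated to an exception.

```python
import warnings


class Box:
    """Hypothetical stand-in for an object with a deprecated setter."""

    def __init__(self):
        self._strides = (8,)

    @property
    def strides(self):
        return self._strides

    @strides.setter
    def strides(self, value):
        warnings.warn(
            "Setting the strides has been deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        self._strides = tuple(value)


def check_both_paths(func):
    """Return (warned, raised) for a callable that triggers a deprecation."""
    # Warning path: the call succeeds and a DeprecationWarning is recorded.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    warned = any(issubclass(w.category, DeprecationWarning) for w in caught)

    # Error path: the warning is turned into an exception and must propagate.
    raised = False
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            raised = True
    return warned, raised


box = Box()
warned, raised = check_both_paths(lambda: setattr(box, "strides", (4,)))
```

In C code the second path corresponds to the warning call reporting failure, at which point the setter has to return an error instead of continuing.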

@jorenham

This comment was marked as resolved.

@eendebakpt
Contributor Author

eendebakpt commented May 5, 2025

The .view() method sets the dtype and this is a bit hard to handle. In PyArray_View (see convert.c) first a new array is constructed with PyArray_NewFromDescr_int and then the dtype is updated with PyObject_SetAttrString. I replaced the call to PyObject_SetAttrString with a direct call to the code updating the dtype, bypassing the deprecation warning. This works fine for exact arrays, but gives issues for subclasses, in particular the masked array.

We could use PyObject_SetAttrString inside PyArray_View and catch the generated deprecation warning. This would hide the dtype change in PyArray_View (which is OK, since there is only a single reference to the array), while users changing the dtype from Python would still get the deprecation warning.

Any other ideas on how to approach this?

Update: refactored the PR to skip the warning when the array has a unique reference. This excludes the calls from inside PyArray_View.

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from df42838 to 2fa5086 Compare May 5, 2025 21:22
Member

this submodule update seems accidental

@seberg
Member

seberg commented May 6, 2025

> Any other ideas on how to approach this?
> Update: refactored the PR to skip the warning when the array has a unique reference. This excludes the calls from inside PyArray_View.

The unique reference check seems OK to me, also just to be nice for a bit. Overall, I think there is no reason not to pass the dtype already while creating the initial view (which currently has the same dtype), but I am not quite sure how that pans out.
My first guess would be that the only reason for this organization is that arr.dtype = includes the necessary checks for whether a view is OK.

As for some newer code changes here, let's avoid any warning filtering (yes, I guess on new Python it's thread-safe at least, but...). Either this still works fine now that you did the refcount check, or we should just use an arr.view(), which could be guarded by a check that the new dtype differs from the current one.

(I am not certain the refcount checks will work on PyPy as is, so there might be a problem with using them to avoid the view() change.)
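A Python-level sketch of the guarded arr.view() idea might look like the following. The helper name view_with_dtype is hypothetical and not part of NumPy or this PR; the actual change under discussion lives in C.

```python
import numpy as np


def view_with_dtype(arr, dtype):
    """Hypothetical helper: reinterpret arr without assigning to arr.dtype."""
    dtype = np.dtype(dtype)
    if arr.dtype != dtype:
        # view() returns a new array object sharing memory with arr and
        # raises if the requested dtype is incompatible with the layout.
        return arr.view(dtype)
    # Nothing to change, so skip creating a view altogether.
    return arr


arr = np.arange(4, dtype=np.int32)
same = view_with_dtype(arr, np.int32)
as_bytes = view_with_dtype(arr, np.uint8)
```

The guard keeps the common case, where the dtype is already the requested one, free of any extra array creation.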

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from af05298 to c27f61a Compare May 6, 2025 09:46
@eendebakpt
Contributor Author

> As for some newer code changes here, let's avoid any warning filtering (yes, I guess on new Python it's thread-safe at least, but...). Either this still works fine now that you did the refcount check, or we should just use an arr.view(), which could be guarded by a check that the new dtype differs from the current one.

The warning filtering is indeed not nice. But using a view won't work directly. For example, for the recarray the dtype is updated in the array finalizer:

def __array_finalize__(self, obj):
    if self.dtype.type is not record and self.dtype.names is not None:
        # if self.dtype is not np.record, invoke __setattr__ which will
        # convert it to a record if it is a void dtype.
        with warnings.catch_warnings():
            # gh-28901
            warnings.filterwarnings("ignore", category=DeprecationWarning)
            self.dtype = self.dtype

I can create a view with the new dtype (maybe this will create an infinite recursion, I will have to try), but I cannot replace self with the new view.
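The limitation can be seen in a small standalone sketch (the function name retype is hypothetical, chosen only for this illustration): view() always hands back a new array object, and rebinding the name self inside a function has no effect on the caller's array.

```python
import numpy as np

arr = np.zeros(4, dtype=np.int32)

# view() creates a second array object on top of the same memory.
viewed = arr.view(np.float32)


def retype(self):
    # Rebinding the local name does not alter the object the caller holds.
    self = self.view(np.float32)
    return self


result = retype(arr)
```

After both calls arr still has its original dtype, which is why a finalizer that needs to change the dtype of self cannot simply switch to view().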

@seberg
Member

seberg commented May 6, 2025

> but I cannot replace self with the new view.

Right, this was maybe the reason why it wasn't deprecated before. I think we need a solution for it, though, as much as I dislike this record array stuff (and maybe we should discourage its use).

My first thought is one or both of:

  • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.
  • Just add a bypass, with something like a _dtype = setter, or a (semi?)private function that forces it. (I slightly fear that _dtype might be used, so maybe something more awkward.)

Unless the "unique reference" path can fix this (but I guess it didn't).

The point is, if our code needs a warning context manager, then we are just kicking the real solution down the road, since eventually this should become an error and the warning context manager will stop working anyway.

@eendebakpt eendebakpt force-pushed the deprecate_array_stride_set branch from b933d58 to 11f6569 Compare May 6, 2025 16:41
@eendebakpt
Contributor Author

> but I cannot replace self with the new view.
>
> Right, this was maybe the reason why it wasn't deprecated before. I think we need a solution for it, though, as much as I dislike this record array stuff (and maybe we should discourage its use).
>
> My first thought is one or both of:
>
>   • Allow this specific change for record-arrays, where the new dtype is known to be fully equivalent and behave exactly the same.

Do you mean checking inside the dtype setter whether the object is a subclass of masked array or recarray? That is possible (though a bit inconvenient, as the masked array and recarray are defined in Python).

>   • Just add a bypass, with something like a _dtype = setter, or a (semi?)private function that forces it. (I slightly fear that _dtype might be used, so maybe something more awkward.)

This seems a reasonable approach. Any user making use of _dtype would be aware this is an internal method and best avoided.

It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all. If we are not able to do this, then perhaps there are users with their own subclasses who will face the same issues. For masked arrays, most of the relevant changes were introduced in #10314 and #5943. @ahaldane @mhvk do you know whether avoiding setting the dtype in the finalizer would be possible?

> Unless the "unique reference" path can fix this (but I guess it didn't).

The unique reference path does not help here: at the point where the recarray or masked array sets the dtype, we already have 5 references to the object.

> The point is, if our code needs a warning context manager, then we are just kicking the real solution down the road, since eventually this should become an error and the warning context manager will stop working anyway.

Kicking the solution down the road might be OK at first. This PR is to signal to users that we are deprecating setting the dtype and strides. We can adapt our own code in a follow-up PR.

@seberg
Member

seberg commented May 7, 2025

> Do you mean checking inside the dtype setter whether the object is a subclass of masked array or recarray?

I was thinking for record dtypes, you could check that the only difference in the dtype is indeed that it's a "record" one. (I don't quite recall what that meant, it's an annoying concept that exists solely to return different scalars, IIRC.)

It would be nice to just not need any of this and a great improvement to try and do something about it. But it might be a rather deep rabbit hole...
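One way such a check could be sketched in Python is shown below. This is only an illustration of the idea, under the assumption that "differs only by being a record dtype" means identical field names, offsets, and item size with a scalar type of either np.void or np.record. The function differs_only_by_record is a hypothetical name, and a real check would live in the C dtype setter.

```python
import numpy as np


def differs_only_by_record(old, new):
    """Hypothetical check: same structured layout, void vs. record scalars."""
    old, new = np.dtype(old), np.dtype(new)
    if old.names is None or new.names is None:
        # Only structured dtypes can be record dtypes.
        return False
    same_layout = (
        old.names == new.names
        and old.itemsize == new.itemsize
        and dict(old.fields) == dict(new.fields)
    )
    # The scalar type may only switch between np.void and np.record.
    return same_layout and {old.type, new.type} <= {np.void, np.record}


plain = np.dtype([("x", "i4"), ("y", "f4")])
rec = np.dtype((np.record, plain))
other = np.dtype([("x", "i4"), ("z", "f4")])

ok = differs_only_by_record(plain, rec)
not_ok = differs_only_by_record(plain, other)
```

Nested structured fields are not treated specially here; handling them is part of what could make this a deeper problem than the sketch suggests.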

> Kicking the solution down the road might be OK at first. This PR is to signal to users that we are deprecating setting the dtype and strides. We can adapt our own code in a follow-up PR.

Yeah, it may be fine. It would be nice not to defer it if that isn't hard, for three reasons (none of which are quite blocking, maybe):

  1. Doing it now gives us full confidence we can go through with it.
  2. Whatever workaround we add may be useful or needed downstream, which makes life easier for them (in the worst case they need one version-dependent set of code, and do not have to update it again).
  3. Warning context managers are not very fast and not thread-safe on older Python versions.

> It would be much nicer if we could rewrite the code for masked array and recarray to not update the dtype at all.

Yes, but especially for record arrays it may be a rather deep rabbit hole as mentioned above.
