-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
API: Enforce one copy for __array__
when copy=True
#26215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
592ae0a
to
4f629bf
Compare
f7ff1b7
to
d654d93
Compare
d654d93
to
8c9f073
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two comments for now. But yeah, we may need the copy indicator unfortunately (maybe call it was_copied
, though? It has to be passed by reference anyway, so it is obvious for the caller that it is an output).
It would be good to have a test that we don't pass copy=True
if someone calls np.array([array_like])
(i.e. there is a list).
numpy/_core/tests/test_multiarray.py
Outdated
else: | ||
return base_arr | ||
|
||
process = psutil.Process() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just do this without psutils
: Just store copy you return, then compare do an identity/base check. Easier, and no need for psutil magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right - I removed psutils
and stored copies/base arrays instead.
@@ -970,7 +972,7 @@ PyArray_DiscoverDTypeAndShape_Recursive( | |||
npy_intp out_shape[NPY_MAXDIMS], | |||
coercion_cache_obj ***coercion_cache_tail_ptr, | |||
PyArray_DTypeMeta *fixed_DType, enum _dtype_discovery_flags *flags, | |||
int copy) | |||
int copy, int *copy_indicator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already has the _dtype_discovery_flags
I think it makes sense to use those "internally" to this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I merged copy_indicator
into _dtype_discovery_flags
.
8c9f073
to
973ce28
Compare
@seberg Sure - renamed!
Sure! I changed implementation so that sequences don't pass |
c1a3d15
to
854340f
Compare
854340f
to
553c473
Compare
I did one more rebase to solve conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One request for clarification inline and one more general suggestion.
I think was_copied
and COPY_WAS_CREATED
should be renamed was_copied_by__array__
and COPY_WAS_CREATED_BY__ARRAY__
to emphasize that this is specifically to handle subtleties around __array__
implementations. Alternative names or just comments around where they are used explaining that it has something to do with __array__
would help readers trying to understand why we have both an ensure copy flag and a flag tracking if a copy already happened.
Does this need a backport? |
No this change is targeted for 2.1 |
These names are clearer and more specific - done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I suppose we could use the argument in PyArray_DiscoverDTypeAndShape
for other things in principle, although I am not sure there is anything concrete, so let's use the name, it is much nicer everywhere else (and all other places will go away when the deprecation goes away).
The ENSURE_COPY
flag setting logic looks incorrect, though (I admit, I didn't try it).
/* TODO: As of NumPy 2.0 this path is only reachable by C-API. */ | ||
Py_SETREF(new, PyArray_NewCopy((PyArrayObject *)new, NPY_KEEPORDER)); | ||
if (was_copied_by__array__ != NULL && copy == 1 && | ||
must_copy_but_copy_kwarg_unimplemented == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a comment to remove the was_copied_by__array__
argument again here, but it is probably clear enough (when it happens).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/src/multiarray/ctors.c
Outdated
@@ -1615,6 +1622,10 @@ PyArray_FromAny_int(PyObject *op, PyArray_Descr *in_descr, | |||
assert(cache->converted_obj == op); | |||
arr = (PyArrayObject *)(cache->arr_or_sequence); | |||
/* we may need to cast or assert flags (e.g. copy) */ | |||
if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { | |
if (was_copied_by__array__ == 1) { |
You don't need to test this, it is OK to unset the flag if it isn't set. (to me that is clearer, but maybe it's me)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
numpy/_core/src/multiarray/ctors.c
Outdated
@@ -1615,6 +1622,10 @@ PyArray_FromAny_int(PyObject *op, PyArray_Descr *in_descr, | |||
assert(cache->converted_obj == op); | |||
arr = (PyArrayObject *)(cache->arr_or_sequence); | |||
/* we may need to cast or assert flags (e.g. copy) */ | |||
if (was_copied_by__array__ == 1 && flags & NPY_ARRAY_ENSURECOPY) { | |||
flags = flags & ~NPY_ARRAY_ENSURECOPY; | |||
flags = flags | NPY_ARRAY_ENSURENOCOPY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong, so please add a test:
np.asarray(array_like, copy=True, order="F")
should just make two copies. You probably want to make sure that users honors the dtype
that was passed. That might be nice, but if it is not very clean to test for, I am not sure we should worry much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, a copy still needs to be made to convert to order="F"
. Then I shouldn't add ENSURENOCOPY
flag but only remove ENSURECOPY
one.
About honoring dtype
: I can check if arr
in cache has the same dtype
as requested dtype
variable. I wonder if it's sufficient for comparing dtypes in C-API:
PyArray_TYPE(arr) == dtype->type_num
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, unfortunately checking that is not really trivial (the only way to do it, is the way that the function uses later to see if it has to make a copy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized, it is even more complicated, for subarray dtypes (maybe those should just never be passed on). But that is probably not really new, although more pronounced by nudging downstream to actually implement dtype
.
Maybe the solution is to just unset the dtype when a copy was made, although unfortunately that doesn't help us in the path when no copy was made and dtype
is used by the __array__
implementor.
(Eplenation: if you write dtype=(4,)i
you get back an array with an additional dimension and dtype=i
. But if __array__
does that, then you end up doing another cast!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's maybe skip checking dtype
after __array__
call?
What we currently have is that:
np.array(array_like, dtype=new_dtype, copy=False)
errors when a dtype
wasn't honored in __array__
implementation (which is nice I think) and
np.arrray(array_like, dtype=new_dtype)
get's another copy if dtype
wasn't honored, same as order="F"
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's maybe skip checking dtype after array call?
For here, that is perfectly fine to me, it is just a mild performance nudge for downstream after all.
However, maybe we should make a new issue. I am slightly worried that downstream until now always just ignored dtype=
, and now we will get new bugs with subarray dtypes.
OTOH, I will also say that, I don't expect anyone to explicitly run into it, so in a sense, it is an unrelated bug fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated flags logic here and added a test with:
np.asarray(array_like, copy=True, order="F")
e6a3ce5
to
3549902
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK to merge now.
I don't think this is going to cause much additional churn over adding the copy
keyword in the first place, hopefully downstream did the right thing.
__array__
when copy=True
__array__
when copy=True
Ok, merging it now. |
Addresses: #26208
Hi @seberg @ngoldbaum,
This PR enables passing
copy=True
todef __array__(...)
and prevents NumPy from making another copy once it can assume thatcopy=True
was successfully passed and the copy was made by the implementor.Following #26208 (comment) I implemented it with a
copy_indicator
that is set to1
, which later is used to decide whetherNPY_ARRAY_ENSURECOPY
flag can be removed.I added a test
test___array__copy_once
that checks if only one copy is made withnp.array(my_arr, copy=True)
. Withoutcopy_indicator
the test fails asnp.array(my_arr, copy=True)
doubles delta RSS (I usedpsutil
package and added it as another test dependency).I assumed that we want to prevent making implicit copies with
__array__
. Therefore the user is also responsible for properly implementingdtype
argument.