-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: umath: ensure ufuncs are well-defined with memory overlapping inputs #8043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5537a37
to
48253f2
Compare
PyArray_ITEMSIZE(arr))) | ||
|
||
static NPY_INLINE int | ||
PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK(PyArrayObject *arr1, PyArrayObject *arr2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems rather large for inline ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, I don't really want to put it in a .c file since the logic is tightly coupled to the macros here (and I suspect the multiarray/umath split adds to the mess). Removing the inline makes it emit compiler warnings. I could write this as a macro, but that's messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ beckons ;)
numpy/core/tests/test_mem_overlap.py
Outdated
@@ -4,9 +4,11 @@ | |||
import itertools | |||
|
|||
import numpy as np | |||
from numpy.testing import run_module_suite, assert_, assert_raises, assert_equal | |||
from numpy.testing import (run_module_suite, assert_, assert_raises, assert_equal, assert_array_equal, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line too long ?
@@ -344,6 +344,8 @@ | |||
# End 1.9 API | |||
'PyArray_CheckAnyScalarExact': (300, NonNull(1)), | |||
# End 1.10 API | |||
# End 1.11 API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should update this now that 1.12 is branched. The cversion and C-API number also need updating, see documentation in doc/HOWTO_RELEASE.rst.txt
.
Just starting to look at this. |
numpy/core/src/umath/ufunc_object.c
Outdated
NpyIter_Deallocate(iter); | ||
return -1; | ||
} | ||
Py_DECREF(op_tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I look at this again, I guess op_tmp
should be placed back into op
instead of being discarded.
|
||
In the memory overlap checks done when ``NPY_ITER_COPY_IF_OVERLAP`` | ||
is specified, consider this array as overlapping even if it is | ||
exactly the same as another array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?. What other array? I think the operation here could be best explained with the help of some diagrams showing the data access and writebacks. The operation can be performed, and is valid, even if there is no overlap, just that it isn't necessary if no overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be help clarify behavior when the indexing is many to one in either array.
operation is the same as if all operands were copied. | ||
In cases where copies would need to be made, **the result of the | ||
computation may be undefined without this flag!** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for two operands? For inplace operations, both inputs are read, but only one is written too. Or does this apply for three operands, two inputs and one output. Or is it more general than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It applies to any number of operands.
* @returns 1 if memory overlap found, 0 if not. | ||
*/ | ||
NPY_NO_EXPORT int | ||
index_has_memory_overlap(PyArrayObject *self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so you are actually checking indexes rather than the resulting view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used to checking the overlap between the array operated on, and the index array used in ufunc.at
, for example
x = np.arange(100)
np.invert.at(x[::-1], x) # <- crazy stuff if no x.copy() of the second argument
It doesn't look at the content of the indices, so it's a somewhat conservative check.
Just a first read through. This will need a release note eventually. |
Probably should bring in @seberg on this also. |
Test ufunc call memory overlap handling | ||
""" | ||
|
||
def check_unary_fuzz(self, operation, get_out_axis_size, dtype=np.int16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to check again if these tests cover all places where the copyifoverlap behavior is in, and/or whether some of the tests are unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, checked --- removing any of NPY_COPY_IF_OVERLAP
et al. causes some test to fail
@@ -3635,6 +3654,7 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind, | |||
/* In case COPY or UPDATEIFCOPY occurred */ | |||
op[0] = NpyIter_GetOperandArray(iter)[0]; | |||
op[1] = NpyIter_GetOperandArray(iter)[1]; | |||
op[2] = NpyIter_GetOperandArray(iter)[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this line added? a bug in the old code or due to the other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a noop. Not necessary to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I think it's useful to have here to ensure correctness, if we later on change NPY_ITER_COPY_IF_OVERLAP to make copies of also the input arrays.
Add a new NPY_ITER_COPY_IF_OVERLAP iterator flag to NpyIter, which instructs it to check if read operands overlap with write operands in memory, and make temporary copies to eliminate detected overlap. Thanks to Sebastian Berg.
These can differ if the iterator decides to make temporary copies of output arrays.
…ABLE loops These loops iterate over whole arrays in "trivial" order, so that it is possible to reason about the data dependency.
After this commit, all code paths in PyUFunc_GeneralizedFunction and PyUFunc_GenericFunction perform input/output operand memory overlap detection, and make copies to eliminate it.
…c_loops Because the wheremask loop does not write to all elements of the array, it needs to consider output arrays as READWRITE, in case COPY_IF_OVERLAP makes temporary copies. As the iterator may make temporary copies of arrays, make sure to pass those to __array_prepare__ instead of the originals.
This adds a new method PyArray_MapIterArrayCopyIfOverlap to the API.
Oh, right, now I understand! ...this rings a bell, too. @seberg, weren't you telling me at some point about how there's some code inside numpy that relies on implementing reducations via zero-strided calls to the non-reduce loops? Would that be affected by this? (My guess is that whatever code is using this trick is "below" the overlap checking in the stack so unaffected by it, and I'm totally happy with this not being something we guarantee to end-users. Also hopefully no-one is writing horrible |
The copies are triggered only if the iterator is given
NPY_ITER_COPY_IF_OVERLAP flag, which I was careful to set only if I
understand the code --- so it is not set for the reduction iterators,
and the overlap issue is handled before the iterator.
|
@pv Do you feel that this is about ready? |
@charris: I've read this several times through now, and only finding forgotten decref on error handling. I'm fairly confident now that it's what it says on the tin, and doesn't do anything more. It's a bit ugly that this adds |
One additional thing is that this probably won't work on PyPy due to the use of updateifcopy, but I guess that's part of a bigger pypy issue that needs separate fixes (probably sprinkling |
Well, I don't really understand the UPDATEIFCOPY issue, but since @mattip is on holiday, I'll comment anyway. All the new tests appear to pass on both pypy and pypy3. I'm guessing that's because the copy cannot be seen by the RPython GC. So I think this is OK for us. |
Yes, the updateifcopy arrays here are never passed back to Python. If
cpyext is effectively refcounted in this case, then there should be no
problem. I guess things would break also in other places if this wasn't
the case, as the iterator uses updateifcopy temporaries also for other
purposes.
|
numpy/core/src/multiarray/mapping.c
Outdated
if (index_type & (HAS_FANCY | HAS_BOOL)) { | ||
for (i = 0; i < num; ++i) { | ||
if (indices[i].object != NULL && PyArray_Check(indices[i].object) && | ||
solve_may_share_memory(self, (PyArrayObject *)indices[i].object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have extra indent here to separate the condition from the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit tricky with the long line, but it doesn't read well as is.
numpy/core/src/multiarray/mapping.c
Outdated
} | ||
|
||
if (extra_op != NULL && PyArray_Check(extra_op) && | ||
solve_may_share_memory(self, (PyArrayObject *)extra_op, 1) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
numpy/core/src/multiarray/mapping.c
Outdated
@@ -3197,6 +3259,21 @@ PyArray_MapIterArray(PyArrayObject * a, PyObject * index) | |||
} | |||
|
|||
|
|||
/*NUMPY_API | |||
* | |||
* Use advanced indexing to iterate an array. Please note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be documented somewhere public. It seems dangerous to expose an API that may change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How could someone detect the change to this particular function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This the is old comment for PyArray_MapIterArray
, not added by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, removing it, since it's unlikely we will change this...
@@ -2708,6 +2710,80 @@ npyiter_allocate_arrays(NpyIter *iter, | |||
bufferdata = NIT_BUFFERDATA(iter); | |||
} | |||
|
|||
if (flags & NPY_ITER_COPY_IF_OVERLAP) { | |||
/* Perform operand memory overlap checks if requested */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add more detail on what is being checked here, the notes below are a bit too local.
* make sense to go further. | ||
*/ | ||
may_share_memory = solve_may_share_memory( | ||
op[iop], op[iother], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extra indent here, or even break the args and align.
return 1; | ||
} | ||
|
||
/* Arrays overlapping in memory may be equivalently iterable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiline comment not correctly formatted ;) Also below.
} | ||
else if (stride1 < 0) { | ||
arr1_ahead = (stride1 <= stride2 && | ||
(npy_uintp)PyArray_BYTES(arr1) <= (npy_uintp)PyArray_BYTES(arr2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Out of curiousity, did the compiler complain without the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The casts are not needed (probably leftovers from earlier versions).
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -1730,12 +1764,17 @@ execute_fancy_ufunc_loop(PyUFuncObject *ufunc, | |||
} | |||
} | |||
for (i = nin; i < nop; ++i) { | |||
/* | |||
* Because we don't write to all elements, the output arrays | |||
* must be considered READWRITE by the iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updateifcopy copies all elements back, not only those for which wheremask is true, so the old content of the output array must be copied there.
Not done reviewing yet. Determining when a copy isn't needed for overlapping arrays seems tricky except in single operand the linear case. ISTR that the iterator may coalesce axes. |
It's unlikely we will break backward compatibility, so better not claim we will do it in the code.
I'm going to put this in so it will start getting more testing. I don't claim to have understood all the details, but I doubt I will discover any errors by inspection. The write up in the release notes is very nice, you should consider copying that over to the official documentation. There are still some style nits that I may attempt to clean up later, I would gently urge you to pay more attention to spaces around operators ('+') and line length, even though the latter may not always be as readable. Thanks for this work, Pauli. |
Wow, cool. Assuming nothing goes haywire, this is the next huge addition :)! |
Make ufuncs and ufunc operations to make temporary copies if input and output operands overlap in memory. Currently (cf. gh-6272, gh-1683) output from such operations is undefined. With this PR, the result is the same as if the operands were non-overlapping.
Fixes gh-1683
Adds new flags to NpyIter, and one new API function is added to deal with MapIter.
This PR does not fix up overlap handling in array assignment, that needs to be done later (and is slightly more hairy --- some common overlapping cases are in fact already handled). Unlike adding overlap-if-copy flag to NpyIter, it appears a similar flag would not be as useful in MapIter, as there are many special-cased code paths, and MapIter is not always made aware of all arrays operated on.