BUG: Fix bug in AVX-512F np.maximum and np.minimum #15612
numpy/core/src/umath/simd.inc.src
Outdated
- (abs(steps[2]) < MAX_STEP_SIZE))
+ (abs(steps[2]) < MAX_STEP_SIZE) && \
+ (nomemoverlap(args[0], steps[0]*dimensions[0], args[2], steps[2]*dimensions[0])) && \
+ (nomemoverlap(args[1], steps[1]*dimensions[0], args[2], steps[2]*dimensions[0])))
This currently ensures that there is no memory overlap across the entire arrays. For large steps and large dimensions, steps[0]*dimensions[0] can overflow. Is this check overkill? I don't fully understand under what circumstances the input and output arrays overlap in memory.
For steps[0]*dimensions[0] to overflow, dimensions[0] has to be at least 4398046511104, i.e. 2**42 (since the max step size is 2097152 = 2**21 and npy_intp has a max value of 2**63 - 1). The memory required to hold such a large array with that step size is 2**63 bytes, or 8192 pebibytes, which I assume is unrealistic.
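The arithmetic in the comment above can be checked with a small sketch. It assumes npy_intp is a signed 64-bit integer (true on typical 64-bit builds, but an assumption here); the helper names `min_overflowing_count` and `span_in_pebibytes` are hypothetical and do not appear in NumPy.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: npy_intp behaves like int64_t, so its maximum is INT64_MAX. */

/* Smallest element count n for which step * n no longer fits in int64_t. */
static int64_t
min_overflowing_count(int64_t step)
{
    assert(step > 0);
    return INT64_MAX / step + 1;
}

/* Bytes spanned by `count` elements spaced `step` bytes apart, in PiB. */
static uint64_t
span_in_pebibytes(int64_t count, int64_t step)
{
    /* Computed as unsigned so that a product of 2**63 still fits. */
    uint64_t bytes = (uint64_t)count * (uint64_t)step;
    return bytes >> 50;  /* 1 PiB == 2**50 bytes */
}
```

With the step size of 2097152 quoted above, the first helper gives 4398046511104 and the second gives 8192, matching the figures in the comment.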
code formatting: steps[0]*dimensions[0] -> steps[0] * dimensions[0]
done.
I am a bit worried that our tests did not notice this. Could we see if we can add a slow test that checks this type of error for all reduction ufuncs? Or does that even exist, but does not get run on a machine that would notice the problem?
Yeah, I was surprised to see no test coverage for this case. I did add two simple tests to cover
numpy/core/src/umath/simd.inc.src
Outdated
@@ -55,6 +55,40 @@ abs_ptrdiff(char *a, char *b)
    return (a > b) ? (a - b) : (b - a);
}


/*
 * nomemoverlap - Do two strided arrays overlap?
 * Need to accomodate negative strides.
Did you accommodate negative strides? The question is a bit tricky, because it used to be the case that they could overlap; however, that should not happen anymore (but this is ensured in a completely different part of the code).
Unless my coffee has not kicked in, reductions also have an exact overlap. So basically, it should be sufficient to check for exact overlap. But we should document somewhere that some inner loops rely on that, and I am not sure where.
Oh, two strided arrays definitely can overlap, but I think only identically so.
My bad, the function does already accommodate negative strides, I will re-write the comment :)
The loops can be called with overlapping arrays if the data dependence is simple, assuming elementwise atomic operations. This I think currently means in practice that overlap can occur in the reduction/accumulation usage, and when the output pointer never catches the inputs when iterated in order (I remember there is a special case like this, but I don't recall where), or is always the same as the input pointer (the exact overlap case).
I think for other overlap cases, the output is undefined, and can be anything.
Vectorized loops have stricter requirements than elementwise ops, so I think they probably should indeed do their own extra checks. Or maybe buffering (I'm not very familiar with how that part of numpy iterators works).
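To illustrate why a vectorized loop is stricter than an elementwise one, here is a minimal sketch. It assumes an accumulate can be modelled as a binary loop whose first input trails the output by one element; the function names and the block width of 4 are hypothetical, and the blocked loop only imitates a SIMD kernel by loading a whole block before storing anything.

```c
#include <stddef.h>

#define BLOCK 4  /* hypothetical vector width, in elements */

static double
max2(double a, double b)
{
    return a > b ? a : b;
}

/* Element at a time: the store to op[i] is visible to the load of ip1[i + 1]. */
static void
maximum_scalar(const double *ip1, const double *ip2, double *op, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        op[i] = max2(ip1[i], ip2[i]);
    }
}

/* Block at a time: all loads of a block happen before any of its stores. */
static void
maximum_blocked(const double *ip1, const double *ip2, double *op, size_t n)
{
    size_t i = 0;
    for (; i + BLOCK <= n; i += BLOCK) {
        double a[BLOCK], b[BLOCK];
        for (size_t j = 0; j < BLOCK; j++) {
            a[j] = ip1[i + j];
            b[j] = ip2[i + j];
        }
        for (size_t j = 0; j < BLOCK; j++) {
            op[i + j] = max2(a[j], b[j]);
        }
    }
    /* Remainder handled one element at a time. */
    for (; i < n; i++) {
        op[i] = max2(ip1[i], ip2[i]);
    }
}
```

Calling either function with `ip1 = out`, `ip2 = in + 1` and `op = out + 1` makes the first input overlap the output with a one-element lag. The scalar loop then produces a running maximum, while the blocked loop reads stale values of `out` and does not.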
The function nomemoverlap returns false if the two arrays overlap. It accommodates negative strides too. The vectorized loops rely on this function to ensure there is absolutely no overlap between the input and output arrays, so in that sense it is being overly cautious.
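A conservative extent check of the kind described above could look like the following sketch. It is not NumPy's nomemoverlap: the name `extents_disjoint` is hypothetical, the size arguments are assumed to be stride times element count (possibly negative), and addresses are compared as integers to sidestep out-of-bounds pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when the byte ranges swept by two strided
 * arrays are disjoint, 0 otherwise. A negative size means the array is
 * walked towards lower addresses, so its start pointer is the high end.
 */
static int
extents_disjoint(const char *ip, ptrdiff_t ip_size,
                 const char *op, ptrdiff_t op_size)
{
    uintptr_t ip_a = (uintptr_t)ip, ip_b = ip_a + (uintptr_t)ip_size;
    uintptr_t op_a = (uintptr_t)op, op_b = op_a + (uintptr_t)op_size;

    /* Order each pair of endpoints so that lo <= hi regardless of sign. */
    uintptr_t ip_lo = ip_size < 0 ? ip_b : ip_a;
    uintptr_t ip_hi = ip_size < 0 ? ip_a : ip_b;
    uintptr_t op_lo = op_size < 0 ? op_b : op_a;
    uintptr_t op_hi = op_size < 0 ? op_a : op_b;

    /* Touching endpoints count as overlap, which errs on the safe side. */
    return (ip_hi < op_lo) || (op_hi < ip_lo);
}
```

Because whole extents are compared, two interleaved arrays that never touch the same element are still reported as overlapping, which is the overly cautious behaviour mentioned in the comment.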
numpy/core/src/umath/simd.inc.src
Outdated
{
    char *ip_start, *ip_end, *op_start, *op_end;
    if (ip_size < 0)
    {
Just nitpicking, but if you touch the code, we usually put the opening brace (except for function defs) on the previous line.
Ah, damn. I knew this and yet... I will fix it.
2bf2c5c to 95f1177
Victim to my own bug :) Hoping my other PR #15615 fixes the failing CI tests.
95f1177 to 944c921
two minor nits, otherwise LGTM.
448486f to 9ffdbd3
…m and np.minimum
Fixes bug in np.maximum.accumulate and np.minimum.accumulate. See numpy#15597
abs_ptrdiff(args[1], args[0]) >= (vsize) does not accommodate strides, especially when the strides are negative.
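A small counterexample shows why comparing only the start pointers is not enough. In this sketch `abs_ptrdiff` follows the diff shown earlier (its return type is assumed to be ptrdiff_t here), while `starts_far_apart` and `visits_same_address` are hypothetical helpers written only for this illustration.

```c
#include <stddef.h>

/* As in the diff above; the return type is an assumption. */
static ptrdiff_t
abs_ptrdiff(char *a, char *b)
{
    return (a > b) ? (a - b) : (b - a);
}

/* Hypothetical stand-in for a guard that looks only at the start pointers. */
static int
starts_far_apart(char *ip, char *op, ptrdiff_t vsize)
{
    return abs_ptrdiff(ip, op) >= vsize;
}

/*
 * Brute force: does any of the n input elements start at the same address
 * as any of the n output elements? Partial byte overlap is not detected.
 */
static int
visits_same_address(char *ip, ptrdiff_t ip_step,
                    char *op, ptrdiff_t op_step, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        for (ptrdiff_t j = 0; j < n; j++) {
            if (ip + i * ip_step == op + j * op_step) {
                return 1;
            }
        }
    }
    return 0;
}
```

For example, with a 128-byte buffer, an input starting at offset 120 with a step of -8 and an output starting at offset 0 with a step of 8 have start pointers 120 bytes apart, more than a 64-byte AVX-512 register, yet over 16 elements both walk across the same addresses.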
9ffdbd3 to 629f980
Thanks @r-devulap
Fixes #15597
np.maximum.accumulate results in memory overlap between the input and output arrays, in which case the vectorized implementation produces incorrect results. This patch adds a pre-check condition to avoid running the AVX-512F code when there is a memory overlap.