BUG: Fix bug in AVX-512F np.maximum and np.minimum #15612

Merged 4 commits on Feb 23, 2020

r-devulap
Member

Fixes #15597

np.maximum.accumulate calls the inner loop with input and output arrays that overlap in memory, in which case the vectorized implementation produces incorrect results. This patch adds a pre-check so that the AVX-512F code path is skipped when there is a memory overlap.
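To illustrate the failure mode described above, here is a minimal sketch in plain C rather than AVX-512 intrinsics. It assumes that accumulate drives the binary loop with the previous output element as the first input, so that the first input and the output are the same buffer shifted by one element. `BLOCK`, `maximum_scalar`, `maximum_blocked` and the two `accumulate_*` helpers are hypothetical names for this illustration, not NumPy code.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4 /* stand-in for the SIMD vector width, in elements */

static float max_f(float a, float b) { return a > b ? a : b; }

/* Scalar loop: op[i] is stored before it is read back as ip1[i + 1]. */
static void
maximum_scalar(const float *ip1, const float *ip2, float *op, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        op[i] = max_f(ip1[i], ip2[i]);
    }
}

/* Blocked loop: loads a whole block of inputs before storing any output,
 * which is how a vector load/store pair behaves. */
static void
maximum_blocked(const float *ip1, const float *ip2, float *op, size_t n)
{
    size_t i = 0;
    for (; i + BLOCK <= n; i += BLOCK) {
        float a[BLOCK], b[BLOCK];
        for (size_t j = 0; j < BLOCK; j++) {
            a[j] = ip1[i + j];
            b[j] = ip2[i + j];
        }
        for (size_t j = 0; j < BLOCK; j++) {
            op[i + j] = max_f(a[j], b[j]);
        }
    }
    for (; i < n; i++) {
        op[i] = max_f(ip1[i], ip2[i]);
    }
}

/* Assumed calling pattern: out[i] = max(out[i - 1], in[i]) for i >= 1. */
static void
accumulate_scalar(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 0.0f;
    }
    out[0] = in[0];
    maximum_scalar(out, in + 1, out + 1, n - 1);
}

static void
accumulate_blocked(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = 0.0f;
    }
    out[0] = in[0];
    maximum_blocked(out, in + 1, out + 1, n - 1);
}
```

With the input {5, 1, 1, 1, 1}, the scalar version yields a running maximum of 5 everywhere, while the blocked version reads stale output values and leaves the tail at 1. That is the kind of wrong result an overlap pre-check is meant to prevent.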

-    (abs(steps[2]) < MAX_STEP_SIZE))
+    (abs(steps[2]) < MAX_STEP_SIZE) && \
+    (nomemoverlap(args[0], steps[0]*dimensions[0], args[2], steps[2]*dimensions[0])) && \
+    (nomemoverlap(args[1], steps[1]*dimensions[0], args[2], steps[2]*dimensions[0])))
Member Author

@r-devulap r-devulap Feb 19, 2020

This currently ensures that there is no memory overlap across the entire arrays. For large steps and large dimensions, steps[0]*dimensions[0] can overflow. Is this check overkill? I don't fully understand under what circumstances the input and output arrays overlap in memory.

Member Author

For steps[0]*dimensions[0] to overflow, dimensions[0] has to be at least 4398046511104 (since the max step size is 2097152 and npy_intp has a max value of 2**63 - 1). The memory required to hold such a large array with that step size is 8192 petabytes, which I assume is unrealistic.
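The arithmetic behind those figures can be checked with a short sketch. It assumes npy_intp is a 64-bit signed integer (modeled here as int64_t) and treats a petabyte as 2**50 bytes. `MAX_STEP_SIZE_SKETCH` and both helper functions are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: npy_intp is 64-bit signed, modeled as int64_t. */
#define MAX_STEP_SIZE_SKETCH INT64_C(2097152) /* 2**21, as quoted above */

/* Smallest dimensions[0] for which MAX_STEP_SIZE * dimensions[0]
 * exceeds INT64_MAX (2**63 - 1). */
static int64_t
min_overflowing_dimension(void)
{
    return INT64_MAX / MAX_STEP_SIZE_SKETCH + 1;
}

/* Bytes spanned by such an array, in units of 2**50 bytes. The product
 * is formed in unsigned arithmetic so the sketch itself cannot overflow. */
static uint64_t
span_in_pebibytes(int64_t dimension)
{
    uint64_t bytes = (uint64_t)dimension * (uint64_t)MAX_STEP_SIZE_SKETCH;
    return bytes >> 50;
}
```

Since the pre-check requires abs(step) to be strictly less than MAX_STEP_SIZE, the real threshold is marginally higher than this bound, which does not change the conclusion.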

Member

code formatting: steps[0]*dimensions[0] -> steps[0] * dimensions[0]

Member Author

done.

@seberg
Member

seberg commented Feb 19, 2020

I am a bit worried that our tests did not notice this. Could we see if we can add a slow test that checks this type of error for all reduction ufuncs? Or does that even exist, but does not get run on a machine that would notice the problem?

@r-devulap
Member Author

Yeah, I was surprised to see no test coverage for this case. I did add two simple tests to cover np.maximum.accumulate and np.minimum.accumulate. But it would be a good idea to look a bit deeper.

@@ -55,6 +55,40 @@ abs_ptrdiff(char *a, char *b)
return (a > b) ? (a - b) : (b - a);
}

/*
* nomemoverlap - Do two strided arrays overlap?
* Need to accommodate negative strides.
Member

Did you accommodate negative strides? The question is a bit tricky, because it used to be the case that they could overlap; however, that should not happen anymore (but this is ensured in a completely different part of the code).

Unless my coffee has not kicked in, reductions also have an exact overlap. So basically, it should be sufficient to ensure exact overlapping. But we should document somewhere that some inner loops rely on that, and I am not sure where.

Member

Oh, two strided arrays definitely can overlap, but I think only identically so.

Member Author

My bad, the function does already accommodate negative strides, I will re-write the comment :)

Member

@pv pv Feb 20, 2020

The loops can be called with overlapping arrays if the data dependence is simple, assuming elementwise atomic operations. This I think currently means in practice that overlap can occur in the reduction/accumulation usage, and when the output pointer never catches the inputs when iterated in order (I remember there is a special case like this, but I don't recall where), or is always the same as the input pointer (the exact overlap case).

I think for other overlap cases, the output is undefined, and can be anything.

Vectorized loops have stricter requirements than elementwise ops, so I think they probably should indeed do their own extra checks. Or maybe buffering (I'm not very familiar with how that part of numpy iterators works).

Member Author

The function nomemoverlap returns false if the two arrays overlap. It accommodates negative strides too. The vectorized loops rely on this function to ensure there is absolutely no overlap between the input and output arrays, so in that sense it is being overly cautious.
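For readers following along, a check of this kind could look roughly like the sketch below. It is reconstructed from the description and the visible diff fragment, not copied from the merged code. `nomemoverlap_sketch` is a hypothetical name, and ptrdiff_t stands in for npy_intp. Each size is step * count in bytes and is negative when the stride is negative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 when the byte ranges covered by the two strided arrays are
 * disjoint, 0 otherwise. ip/op point at the first element visited.
 */
static int
nomemoverlap_sketch(char *ip, ptrdiff_t ip_size, char *op, ptrdiff_t op_size)
{
    char *ip_start, *ip_end, *op_start, *op_end;

    /* Normalize each range so that start <= end, whatever the stride sign. */
    if (ip_size < 0) {
        ip_start = ip + ip_size;
        ip_end = ip;
    }
    else {
        ip_start = ip;
        ip_end = ip + ip_size;
    }
    if (op_size < 0) {
        op_start = op + op_size;
        op_end = op;
    }
    else {
        op_start = op;
        op_end = op + op_size;
    }

    /* Disjoint only if one range lies strictly beyond the other, so
     * ranges that merely touch are conservatively reported as overlapping. */
    return (ip_start > op_end) || (op_start > ip_end);
}
```

Because it compares whole extents, this sketch also rejects the exact-overlap case that elementwise loops tolerate, which matches the "overly cautious" behaviour described above.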

{
char *ip_start, *ip_end, *op_start, *op_end;
if (ip_size < 0)
{
Member

Just nitpicking, but if you touch the code, we usually put the opening brace (except for function defs) on the previous line.

Member Author

Ah, damn. I knew this and yet .. I will fix it.

@r-devulap
Member Author

Victim to my own bug :) Hoping my other PR #15615 fixes the failing CI tests.

Member

@mattip mattip left a comment

two minor nits, otherwise LGTM.

…m and np.minimum

Fixes bug in np.maximum.accumulate and np.minimum.accumulate
See numpy#15597
abs_ptrdiff(args[1], args[0]) >= (vsize) does not accommodate strides,
especially when the strides are negative.
@mattip mattip merged commit 670ac4f into numpy:master Feb 23, 2020
@mattip
Member

mattip commented Feb 23, 2020

Thanks @r-devulap

@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
Labels
00 - Bug component: numpy._core component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Development

Successfully merging this pull request may close these issues.

BUG: AVX-512 path for ufunc.accumulate is broken
6 participants