BUG: Fix bug in AVX-512F np.maximum and np.minimum #15612

r-devulap · 2020-02-19T22:33:13Z

np.maximum.accumulate calls the inner loop with input and output arrays that overlap in memory, in which case the vectorized implementation produces incorrect results. This patch adds a pre-check condition to avoid running the AVX-512F code when there is a memory overlap.
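To make the failure mode concrete, here is a minimal pure-Python sketch of why a blocked loop goes wrong when the output overlaps an input. It is a model, not the NumPy code: the function names, the block width of 4, and the zero-filled output buffer are all assumptions made for illustration. The accumulate pattern is modelled as `out[i] = max(out[i - 1], data[i])`, so one "input" is the output buffer shifted by one element.

```python
def accumulate_scalar(data):
    """Element-by-element running maximum: each step sees the previous store."""
    out = [data[0]] + [0] * (len(data) - 1)  # hypothetical zero-filled output
    for i in range(1, len(data)):
        out[i] = max(out[i - 1], data[i])
    return out


def accumulate_blocked(data, width=4):
    """Same loop, but loading `width` elements at once before storing any.

    This mimics a SIMD loop: the block of out[i - 1 ...] is read before
    the stores of the current block happen, so it contains stale values.
    """
    out = [data[0]] + [0] * (len(data) - 1)
    i = 1
    while i < len(data):
        n = min(width, len(data) - i)
        prev = out[i - 1:i - 1 + n]  # overlapping input, loaded up front
        cur = data[i:i + n]
        out[i:i + n] = [max(p, c) for p, c in zip(prev, cur)]
        i += n
    return out


data = [1, 5, 2, 3, 0, 7, 4, 4]
scalar = accumulate_scalar(data)
blocked = accumulate_blocked(data)
print(scalar)
print(blocked)
```

The scalar version propagates the running maximum, while the blocked version compares each element against stale output values and loses it. That is the kind of mismatch the pre-check is meant to rule out.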

r-devulap · 2020-02-19T22:37:29Z

numpy/core/src/umath/simd.inc.src

-     (abs(steps[2]) < MAX_STEP_SIZE))
+     (abs(steps[2]) < MAX_STEP_SIZE)  && \
+     (nomemoverlap(args[0], steps[0]*dimensions[0], args[2], steps[2]*dimensions[0])) && \
+     (nomemoverlap(args[1], steps[1]*dimensions[0], args[2], steps[2]*dimensions[0])))


This currently ensures that there is no memory overlap across the entire arrays. For large steps and large dimensions, steps[0]*dimensions[0] can overflow. Is this check overkill? I don't fully understand under what circumstances the input and output arrays overlap in memory.

For steps[0]*dimensions[0] to overflow, dimensions[0] has to be at least 4398046511104 (2**42), since the max step size is 2097152 (2**21) and npy_intp has a max value of 2**63 - 1. An array that long with that step size would span 8192 pebibytes (8 EiB) of memory, which I assume is unrealistic.
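The arithmetic above can be checked with a few lines of Python. This is only a sanity check under the stated assumptions: a 64-bit signed npy_intp, and the step bound taken as exactly 2097152 (the macro compares with `<`, so the real bound is one less). `INTP_MAX` is a name chosen here for illustration.

```python
MAX_STEP_SIZE = 2097152   # 2**21, the step limit quoted in the comment
INTP_MAX = 2**63 - 1      # assumed maximum of a 64-bit signed npy_intp

# Smallest element count for which step * count no longer fits in npy_intp.
min_overflow_count = INTP_MAX // MAX_STEP_SIZE + 1

# Address range such an array would have to span, in bytes and in PiB.
span_bytes = MAX_STEP_SIZE * min_overflow_count
span_pib = span_bytes // 2**50

print(min_overflow_count, span_bytes, span_pib)
```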

code formatting: steps[0]*dimensions[0] -> steps[0] * dimensions[0]

seberg · 2020-02-19T22:48:41Z

I am a bit worried that our tests did not notice this. Could we see if we can add a slow test that checks this type of error for all reduction ufuncs? Or does that even exist, but does not get run on a machine that would notice the problem?

r-devulap · 2020-02-19T22:50:31Z

Yeah, I was surprised to see no test coverage for this case. I did add two simple tests to cover np.maximum.accumulate and np.minimum.accumulate. But it would be a good idea to look a bit deeper.

seberg · 2020-02-20T17:45:38Z

numpy/core/src/umath/simd.inc.src

@@ -55,6 +55,40 @@ abs_ptrdiff(char *a, char *b)
    return (a > b) ? (a - b) : (b - a);
 }

+/*
+ * nomemoverlap - Do two strided arrays overlap?
+ * Need to accomodate negative strides.


Did you accommodate negative strides? The question is a bit tricky, because it used to be the case that they can overlap, however that should not happen anymore (but this is ensured in a completely different part of the code).

Unless my coffee has not kicked in, reductions also have an exact overlap. So basically, it should be sufficient to ensure exact overlapping. But we should document somewhere that some inner loops rely on that, and I am not sure where.

Oh, two strided arrays definitely can overlap, but I think only identically so.

My bad, the function does already accommodate negative strides, I will re-write the comment :)

The loops can be called with overlapping arrays if the data dependence is simple, assuming elementwise atomic operations. This I think currently means in practice that overlap can occur in the reduction/accumulation usage, and when the output pointer never catches the inputs when iterated in order (I remember there is a special case like this, but I don't recall where), or is always the same as the input pointer (the exact overlap case).

I think for other overlap cases, the output is undefined, and can be anything.

Vectorized loops have stricter requirements than elementwise ops, so I think they probably should indeed do their own extra checks. Or maybe buffering (I'm not very familiar with how that part of numpy iterators works).

The function nomemoverlap returns false if the two arrays overlap. It accommodates negative strides too. The vectorized loops rely on this function to ensure there is absolutely no overlap between the input and output arrays, so in that sense it is being overly cautious.
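As a rough illustration of the check described here, the following Python sketch models addresses as plain integers. It is not the C implementation: only the first lines of nomemoverlap are visible in this thread, so the handling of the range end points (strict comparison) and the helper names `byte_range`, `nomemoverlap_model` and `pointer_distance_ok` are assumptions. A signed size stands for `steps[i] * dimensions[0]`, so a negative size means the array is walked toward lower addresses.

```python
def byte_range(start, size):
    """Return (low, high) addresses covered when walking `size` bytes from `start`."""
    # A negative size means the walk goes downward, so swap the end points.
    return (start + size, start) if size < 0 else (start, start + size)


def nomemoverlap_model(ip, ip_size, op, op_size):
    """True when the two address ranges are disjoint (conservative model)."""
    ip_lo, ip_hi = byte_range(ip, ip_size)
    op_lo, op_hi = byte_range(op, op_size)
    return ip_lo > op_hi or op_lo > ip_hi


def pointer_distance_ok(ip, op, vsize):
    """Model of a start-pointer-only check such as abs_ptrdiff(a, b) >= vsize."""
    return abs(ip - op) >= vsize


# Input walked backwards from address 1024 (16 elements, step -8),
# output walked forwards from address 900 (16 elements, step 8).
ip, ip_size = 1024, -8 * 16
op, op_size = 900, 8 * 16

print(pointer_distance_ok(ip, op, vsize=64))      # start pointers look far apart
print(nomemoverlap_model(ip, ip_size, op, op_size))  # but the ranges do overlap
```

In this example the start pointers are 124 bytes apart, so a distance-only check with a 64-byte vector passes, while the range-based check reports the overlap. This is the situation with negative strides that a start-pointer comparison alone cannot see.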

seberg · 2020-02-20T17:47:02Z

numpy/core/src/umath/simd.inc.src

+{
+    char *ip_start, *ip_end, *op_start, *op_end;
+    if (ip_size < 0)
+    {


Just nitpicking, but if you touch the code, we usually put the opening brace (except for function defs) on the previous line.

Ah, damn. I knew this and yet .. I will fix it.

r-devulap · 2020-02-20T21:50:43Z

Victim to my own bug :) Hoping my other PR #15615 fixes the failing CI tests.

mattip

two minor nits, otherwise LGTM.


mattip · 2020-02-22T20:32:20Z

numpy/core/src/umath/simd.inc.src

-     (abs(steps[2]) < MAX_STEP_SIZE))
+     (abs(steps[2]) < MAX_STEP_SIZE)  && \
+     (nomemoverlap(args[0], steps[0]*dimensions[0], args[2], steps[2]*dimensions[0])) && \
+     (nomemoverlap(args[1], steps[1]*dimensions[0], args[2], steps[2]*dimensions[0])))


code formatting: steps[0]*dimensions[0] -> steps[0] * dimensions[0]


mattip · 2020-02-23T18:08:07Z

Thanks @r-devulap

r-devulap mentioned this pull request Feb 19, 2020

BUG: AVX-512 path for ufunc.accumulate is broken #15597

Closed


charris added 00 - Bug component: numpy._core labels Feb 20, 2020

seberg reviewed Feb 20, 2020


r-devulap force-pushed the maximum-bug-avx branch from 2bf2c5c to 95f1177 Compare February 20, 2020 21:24

r-devulap force-pushed the maximum-bug-avx branch from 95f1177 to 944c921 Compare February 21, 2020 19:47

r-devulap mentioned this pull request Feb 21, 2020

update submodules to latest versions MacPython/numpy-wheels#73

Merged

mattip reviewed Feb 22, 2020


r-devulap force-pushed the maximum-bug-avx branch from 448486f to 9ffdbd3 Compare February 22, 2020 21:07

mattip reviewed Feb 23, 2020


r-devulap added 4 commits February 23, 2020 07:37

0164256 TST: Adding test to validate np.maximum.accumulate and np.minimum.accumulate

085cdbe BUG: Check for memory overlap in AVX-512F implementation of np.maximum and np.minimum
Fixes bug in np.maximum.accumulate and np.minimum.accumulate. See numpy#15597

735fb99 BUG: Update IS_OUTPUT_BLOCKABLE_UNARY to use the nomemoverlap check
abs_ptrdiff(args[1], args[0]) >= (vsize) does not accommodate strides, especially when the strides are negative.

629f980 MAINT: Improve formatting and update comments

r-devulap force-pushed the maximum-bug-avx branch from 9ffdbd3 to 629f980 Compare February 23, 2020 15:38

mattip merged commit 670ac4f into numpy:master Feb 23, 2020

r-devulap mentioned this pull request Oct 21, 2020

ENH:Umath Replace raw SIMD of unary float point(32-64) with NPYV - g0 #16247

Merged


rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
