min/max and base math vectorization #3419
Conversation
Improves performance by ~1.5x/3.0x for float/double.
Improves performance by ~1.5x/3.0x for float/double for in-place or CPU-cached operations.
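A rough sketch of how a claim like this could be checked; the array size, iteration count and use of `timeit` are illustrative only and not taken from this PR:

```python
import timeit
import numpy as np

# Illustrative benchmark: in-place add on arrays small enough to stay in
# the CPU cache, which is where the vectorized loops are most visible.
for dtype in (np.float32, np.float64):
    a = np.ones(10000, dtype=dtype)
    b = np.ones(10000, dtype=dtype)
    t = timeit.timeit(lambda: np.add(a, b, out=a), number=10000)
    print(dtype.__name__, t)
```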
For what it's worth, sum and prod have never guaranteed operation order.
@@ -1488,6 +1494,11 @@ NPY_NO_EXPORT void
NPY_NO_EXPORT void
@TYPE@_square(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(data))
{
    char * margs[] = {args[0], args[0], args[1]};
I'm wondering if this is portable? Some compilers (SUN) would only allow initialization of structs with constants. SUN is history, but I'm not sure it's ancient history. Is it possible to just pass args and steps?
It also looks like this pattern would be a candidate for a macro, maybe something like SIMD_UNARY_LOOP?
Do we really need to support pre-C89 compilers?
It's no problem to do this in three steps, but at some point you have to draw the line on what you want to support.
Currently it's only used twice and I don't see the need to do it more often. It's just so that square and reciprocal are not slower than their explicit counterparts, which do the same thing when the input pointers are equal.
The functions are obsolete on amd64 now.
We could just give it a try and wait for complaints, if any. As you say, it isn't worth supporting obsolete stuff, and C89 isn't that new ;)
There are variations in floating point results anyway, especially on 32-bit Intel, depending on whether the compiler uses SSE or x87 extended precision registers and stores intermediate results in FPU registers or memory. So I'm not sure it is worth worrying about small changes in results; that's just floating point.
I agree we should probably not worry about it much, but one scipy test fails when run with reduction-vectorized numpy, as it expects smaller errors.
Avoids "declared but not defined" warnings.
Someone should update the Cython part of the nditer tutorial.
Performance improvements to base math, `sqrt`, `abs` and `min/max`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The base math (add, subtract, divide, multiply) and `sqrt`, `abs`, `min/max`
operations have been improved to make use of SSE2 CPU SIMD instructions.
That should be maximum/minimum.
Why? The functions are named min/max in the Python API.
The ufuncs that are modified are maximum/minimum, which are binary functions. The max/min equivalent methods are implemented in `numpy/core/_methods.py` using `maximum.reduce`/`minimum.reduce` and are accessed through `amin`/`amax` in `numpy/core/fromnumeric.py`. The Python max/min are different: they treat the array as an iterator.
It is a small point, agreed.
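A quick NumPy-level illustration of the distinction described above (not code from this PR):

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])

np.maximum(a, 2.0)     # binary ufunc modified here -> array([3., 2., 2.])
np.maximum.reduce(a)   # reduction over the array -> 3.0
np.amax(a)             # fromnumeric wrapper around the reduction -> 3.0
a.max()                # ndarray method, implemented via maximum.reduce
max(a)                 # Python builtin: iterates over the array instead
```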
Then `abs` should be `absolute` too?
Good point. `abs` is a type method defined in `number.c` and calls the `absolute` ufunc. It's a bit of a tangle.
fixed
@matthew-brett If you don't use gcc, could you check this PR for compiler errors on SPARC?
Sorry Chuck - it's one of Yarick Halchenko's Debian boxes - no Sun cc, only gcc 4.4.
Let's give it a shot. Thanks.
min/max and base math vectorization
This pull request includes the rest of the non-result-changing float vectorization.
min/max is a little ugly as it needs to propagate NaN efficiently. Are there platforms where FPU flag propagation is supported but NO_FLOATING_POINT_SUPPORT is set?
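For reference, the NaN propagation requirement seen from the Python level; this is illustrative only and not part of the patch:

```python
import numpy as np

np.maximum(1.0, np.nan)   # nan: maximum/minimum must propagate NaN
np.fmax(1.0, np.nan)      # 1.0: fmax/fmin ignore NaN instead
# A plain SIMD max/min comparison does not give the NaN-propagating
# behaviour for free, hence the extra handling mentioned above.
```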
The base math is lengthy but simple; all the special cases are there to achieve optimal performance for these very common operations.
Base math reductions are not vectorized as they change the results slightly (float add and multiply are not associative).
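A small example of the non-associativity that rules out reordering the reductions; the values are chosen only to make the rounding visible:

```python
a, b, c = 1e16, 1.0, 1.0

print((a + b) + c)   # 1e16: each 1.0 is lost to rounding separately
print(a + (b + c))   # 1.0000000000000002e16

# A vectorized reduction effectively regroups the sum into partial sums,
# so its result could differ slightly from a strictly sequential loop,
# which is why reductions are left unvectorized here.
```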