min/max and base math vectorization #3419

Merged 5 commits on Jun 11, 2013

juliantaylor
Contributor

This pull request includes the rest of the float vectorization that does not change results.
min/max is a little ugly because it needs to propagate NaN efficiently. Are there platforms where FPU flag propagation is supported but NO_FLOATING_POINT_SUPPORT is set?
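
To illustrate the NaN requirement: a maximum written as a single comparison loses a NaN that sits in the first operand, so an extra check is needed. The following is a minimal scalar sketch, not the code from this pull request; the name `max_propagate_nan` is made up for illustration.

```c
#include <math.h>

/* Hypothetical helper: maximum of two floats that returns NaN if either
 * input is NaN. */
static float max_propagate_nan(float a, float b)
{
    /* Any ordered comparison with NaN is false, so `a >= b` alone would
     * return b when a is NaN and the NaN would be lost. The isnan(a) test
     * covers that case; when only b is NaN the comparison fails and b
     * (the NaN) is returned anyway. */
    return (a >= b || isnan(a)) ? a : b;
}
```

A vectorized loop has to give the same answer in every lane. As far as I know, the SSE `maxps` instruction returns its second operand whenever either input is NaN, so it does not do this on its own, which is presumably where the extra work comes from.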

The base math is lengthy but simple; all the special cases are there to achieve optimal performance for these very common operations.

Base math reductions are not vectorized because that would change the results slightly (float add and multiply are not associative).
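
The associativity point can be seen with a small sketch. It compares a left-to-right sum with a sum that keeps four independent partial sums, which is roughly how a 4-wide SSE register would accumulate floats. Both function names are hypothetical, the input is contrived to make the difference visible, and the sketch assumes float arithmetic is carried out in single precision (as with SSE on x86-64).

```c
/* Sum left to right, as a plain scalar loop would. */
static float sum_sequential(const float *x, int n)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Sum with four independent accumulators, mimicking the lanes of a
 * 4-wide SIMD register. Assumes n is a multiple of 4. */
static float sum_four_lanes(const float *x, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i, j;
    for (i = 0; i < n; i += 4) {
        for (j = 0; j < 4; j++) {
            lane[j] += x[i + j];
        }
    }
    /* Horizontal add of the four partial sums. */
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` the sequential sum gives 1 while the four-lane sum gives 0 under the stated assumption, because each `1.0f` is rounded away when it is added to a value of magnitude 1e8.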

Improves performance by ~1.5/3.0 for float/double.
Improves performance by ~1.5/3.0 for float/double for in-place or CPU-cached operations.
@njsmith
Member

njsmith commented Jun 9, 2013

For what it's worth, sum and prod have never guaranteed operation order,
and can already vary slightly depending on things like memory order.
On 9 Jun 2013 20:48, "Julian Taylor" wrote:

You can merge this Pull Request by running

git pull https://github.com/juliantaylor/numpy vectorize-rest

Or view, comment on, or merge it at:

#3419
Commit Summary

  • ENH: Vectorize float min/max operation with sse2
  • ENH: vectorize base math with SSE2

@@ -1488,6 +1494,11 @@ NPY_NO_EXPORT void
NPY_NO_EXPORT void
@TYPE@_square(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(data))
{
    char * margs[] = {args[0], args[0], args[1]};
Member

I'm wondering if this is portable? Some compilers (SUN) would only allow initialization of struct with constants. SUN is history, but I'm not sure it's ancient history. Is it possible to just pass args and steps?

It also looks like this pattern would be a candidate for a macro, maybe something like SIMD_UNARY_LOOP?

Contributor Author

Do we really need to support pre-C89 compilers?
It's no problem to do this in three steps, but at some point you have to draw the line on what you want to support.

Currently it's only used twice and I don't see the need to do it more often. It's just so that square and reciprocal are not slower than their explicit counterparts, which do the same if the input pointers are equal.
The functions are obsolete on amd64 now.
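
For reference, the "three steps" variant could look roughly like the sketch below. It is not the code from this pull request: `float_multiply` and `float_square` are hypothetical stand-ins with a simplified inner-loop signature, and `ptrdiff_t` is used in place of `npy_intp`. The point is only that the forwarding array is filled by assignment rather than by a brace initializer with non-constant expressions, which strict C89 does not allow for aggregates and C99 does.

```c
#include <stddef.h>

/* Hypothetical binary inner loop: args = {in1, in2, out}, steps in bytes. */
static void float_multiply(char **args, ptrdiff_t n, const ptrdiff_t *steps)
{
    char *ip1 = args[0], *ip2 = args[1], *op = args[2];
    ptrdiff_t i;
    for (i = 0; i < n; i++, ip1 += steps[0], ip2 += steps[1], op += steps[2]) {
        *(float *)op = *(float *)ip1 * *(float *)ip2;
    }
}

/* Hypothetical unary square: args = {in, out}. It forwards to the multiply
 * loop with both inputs pointing at the same data, so any fast path the
 * multiply loop has is reused. */
static void float_square(char **args, ptrdiff_t n, const ptrdiff_t *steps)
{
    char *margs[3];
    ptrdiff_t msteps[3];

    /* Filled in by assignment instead of a non-constant initializer list. */
    margs[0] = args[0];
    margs[1] = args[0];
    margs[2] = args[1];
    msteps[0] = steps[0];
    msteps[1] = steps[0];
    msteps[2] = steps[1];

    float_multiply(margs, n, msteps);
}
```

Calling `float_square` on a contiguous float array with both steps set to `sizeof(float)` writes the element-wise squares to the output array.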

Member

We could just give it a try and wait for complaints, if any. As you say, it isn't worth supporting obsolete stuff and C89 isn't that new ;)

@charris
Member

charris commented Jun 9, 2013

There are variations in floating point results anyway, especially on 32 bit Intel, depending on whether the compiler uses SSE or x87 extended precision registers and whether it stores intermediate results in FPU registers or memory. So I'm not sure it is worth worrying about small changes in results, that's just floating point.
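
Which evaluation mode a given build uses can be checked with the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below only reports the mode; the helper name `float_eval_mode` is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: describe how the compiler evaluates floating point
 * expressions, based on the C99 FLT_EVAL_METHOD macro. */
static const char *float_eval_mode(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "operations evaluated in the precision of their type";
    case 1:
        return "float and double evaluated as double";
    case 2:
        return "all operations evaluated as long double";
    default:
        return "indeterminable or implementation-defined";
    }
}
```

A value of 0 is what one would typically expect from an SSE build, and 2 from a build that uses the x87 registers, though the exact value depends on the compiler and its flags.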

@juliantaylor
Contributor Author

I agree we should probably not worry about it much, but one scipy test fails when run against a numpy with vectorized reductions, as it expects smaller errors.
I may file a PR with those vectorizations later.

avoids declared but not defined warnings
@juliantaylor
Contributor Author

Someone should update the Cython part of the nditer tutorial:
numpy now beats the cythonized sum of squares performance (with wraparound(False)),
and that even without the reduction vectorization :D

improved to make use of SSE2 CPU SIMD instructions.
Performance improvements to base math, `sqrt`, `abs` and `min/max`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The base math (add, subtract, divide, multiply) and `sqrt`, `abs`, `min/max`
Member

That should be maximum/minimum.

Contributor Author

why? the functions are named min/max in the python api

Member

The ufuncs that are modified are maximum/minimum, which are binary functions. The max/min equivalent methods are implemented in numpy/core/_methods.py using maximum.reduce/minimum.reduce and are accessed through amin/amax in numpy/core/fromnumeric.py. The python max/min are different, they treat the array as an iterator.

It is a small point, agreed.

Contributor Author

then abs should be absolute too?

Member

Good point. abs is a type method defined in number.c and calls the absolute ufunc. It's a bit of a tangle.

Contributor Author

fixed

@charris
Member

charris commented Jun 10, 2013

@matthew-brett If you don't use gcc, could you check this PR for compiler errors on SPARC?

@matthew-brett
Contributor

Sorry Chuck - it's one of Yarick Halchenko's Debian boxes - no Sun cc, only gcc 4.4

@charris
Member

charris commented Jun 11, 2013

Let's give it a shot. Thanks.

charris added a commit that referenced this pull request Jun 11, 2013
min/max and base math vectorization
charris merged commit d0f5050 into numpy:master Jun 11, 2013