MAINT: Add a fast path to var for complex input #15696
Conversation
Related: #13177
numpy/core/_methods.py (outdated)
return um.add(
    _var(arr.real, axis, dtype, out, ddof, keepdims),
    _var(arr.imag, axis, dtype, out, ddof, keepdims)
)
Reusing `out` here doesn't seem safe.
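To make the hazard concrete, a minimal sketch (using the public `np.var` rather than the private `_var`, with made-up data) of why sharing one `out` buffer between the two nested calls goes wrong:

```python
import numpy as np

a = np.array([1 + 2j, 2 + 4j, 3 + 6j])
out = np.empty(())

real_var = np.var(a.real, out=out)   # writes var of the real part into `out`
imag_var = np.var(a.imag, out=out)   # overwrites it with var of the imag part

# real_var and imag_var are the *same* zero-d array, so this computes
# 2 * var(imag) = 5.33... instead of var(real) + var(imag) = 3.33...
print(np.add(real_var, imag_var), a.var())
```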
Good point - will look into it and think about adding new test(s) for this extra path.
In your opinion, is the change even worth it? I'm not sure how I feel about adding extra complexity for ~20% speedup (which is very system-dependent).
Can you get the same speedup by adding the fast path right where `conjugate` is called?
I had originally been attempting a solution along those lines - if there were a `conjmultiply` ufunc, for instance, that would do the trick, but I don't know of any such function.
How about
um.multiply(x.real, x.real, out=x.real)
um.multiply(x.imag, x.imag, out=x.imag)
x = np.add(x.real, x.imag) # could use out=x.real, but may be slower
I will try something along those lines - thanks for the suggestion
um.add(um.multiply(x.real, x.real), um.multiply(x.imag, x.imag))
This results in the speedup, but unfortunately still implicitly depends on the dtype of the input being complex - it fails the test suite for object arrays with complex numbers in them.
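A small illustration of that failure mode (hypothetical snippet, not from the PR; it relies on `ndarray.real` returning the array itself for non-complex dtypes, object included):

```python
import numpy as np

obj = np.array([1 + 2j, 3 - 4j], dtype=object)

# For object dtype, .real is the array itself, so "real * real" is
# actually the full complex square rather than the squared real part:
print(obj.real is obj)                   # True
print(np.multiply(obj.real, obj.real))   # [(-3+4j) (-7-24j)], not [1.0 9.0]
```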
I think special casing complex dtypes is ok, as long as you do it in the same place that the other types are specialized.
Thanks for the feedback - the current approach (c049074) keeps all of the special cases based on dtypes in the same area of the code where all the other type-determination occurs.
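For reference, a simplified sketch of what that placement looks like (assumed names and structure, not the verbatim diff; `um` is `numpy.core.umath` and `nt` is `numpy.core.numerictypes`, as in `_methods.py`):

```python
import numpy.core.umath as um
import numpy.core.numerictypes as nt

def _square_deviations(x):
    """Sketch of the dtype dispatch; x holds the deviations arr - arrmean."""
    if issubclass(x.dtype.type, nt.floating):
        return um.multiply(x, x, out=x)
    if issubclass(x.dtype.type, nt.complexfloating):
        # fast path: |x|**2 == re**2 + im**2, skipping conjugate() entirely
        return um.add(um.multiply(x.real, x.real),
                      um.multiply(x.imag, x.imag))
    # object arrays etc. keep the conjugate formulation
    return um.multiply(x, um.conjugate(x)).real
```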
Force-pushed from da841f2 to c049074.
I've updated the PR to incorporate @eric-wieser 's feedback about keeping the conditions based on dtypes all contained in the same part of the code. The most recent version adds fast paths for complex64 and complex128. I also added two new tests, and, since this PR is about performance, a benchmark. Thanks to @seberg for the feedback. Here are the current results of the benchmark on my system:
Original implementation:
Manually computing var from real & imag components:
numpy/core/_methods.py (outdated)
xv = x.view(dtype=(nt.float64, (2,)))
um.multiply(xv, xv, out=xv)
x = um.add(xv[..., 0], xv[..., 1], out=x.real).real
elif arr.dtype.type is nt.complex64:
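For context, the view trick works because a complex number is stored as two adjacent floats, so the reinterpretation costs nothing; an illustrative snippet:

```python
import numpy as np

x = np.array([1 + 2j, 3 + 4j])           # complex128, native byte order
xv = x.view(dtype=(np.float64, (2,)))    # shape (2, 2): [[re, im], ...]
print(xv)                                # [[1. 2.] [3. 4.]]
print(xv.base is x)                      # True: no copy, same memory
```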
Can you add `clongdouble` too? Perhaps add a dictionary to map the types.
Agreed, one thing to take care of (and that should have a test) is that non-native byte order does not use the view, or only uses it with the correct byte order. (The identity check does not catch byte order differences; an equality check, even if a bit strange maybe, does.) Not sure off the top of my head how best to write it nicely.
After my PR goes in, we could even think about adding a `.real_dtype()` method ;).
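An illustration of the byte-order point (hypothetical snippet): the scalar-type identity check passes for a byteswapped array, dtype equality catches it, and the raw view would read the swapped bytes as garbage:

```python
import numpy as np

x = np.array([1 + 2j], dtype=np.complex128)
xs = x.astype(x.dtype.newbyteorder())        # same values, swapped bytes

print(xs.dtype.type is np.complex128)        # True: identity misses byte order
print(xs.dtype == np.dtype(np.complex128))   # False: equality catches it
print(xs.view((np.float64, (2,))))           # nonsense values, not [[1. 2.]]
```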
Thanks @eric-wieser & @seberg for the suggestions! I've made the following changes/updates:
- Added a fast path for `complex256`.
- Re-formulated the complex type checking to use a dict (eliminates some LOCs and doesn't appear to hurt performance).
- Added an additional check for the byte order - any array with non-native byte order is funnelled to the original "slow" path so that it never hits the views.
- Updated the tests:
  - Modified a test to check `complex{64,128,256}` input types.
  - Added a test to verify `var()` gives the same result for complex arrays with both native and non-native byte order.
Let me know what you think of this iteration, and thanks again for the feedback!
Force-pushed from c049074 to 7635ea6.
Generally looks good to me, small fixes. Will probably look it over briefly and merge soon if nobody has another opinion.
Can you add the benchmark as the first commit, and then squash the rest into a single commit?
Force-pushed from e066eba to 26a6ba2.
Thanks for all the feedback @seberg - it should be down to two commits now: one with the updated code/tests, and a separate one for the accompanying benchmark.
numpy/core/_methods.py (outdated)
_complex_to_float = {
    nt.complex64 : nt.float32,
    nt.complex128 : nt.float64,
    nt.clongdouble: nt.longdouble
If you make the keys `np.dtype` instances, and crucially reverse the order, then you can avoid the `isnative` check below.
Reversing is necessary so that when double and longdouble are equivalent, the later entry, double, wins.
I don't understand the comment on ordering, particularly the second comment. Also, I thought that Python dictionaries didn't make any guarantees about ordering. Would this need to be an OrderedDict?
Dictionary literals are evaluated sequentially. double and long double are considered equal on Windows, you want double to take precedence.
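A sketch of that point (only observable on platforms such as MSVC builds, where longdouble is the same as double): dict literals insert keys left to right, and keys that compare equal collapse, so the later entry's value wins:

```python
import numpy as np

# On Windows, np.dtype(np.clongdouble) == np.dtype(np.complex128), so the
# two keys collide; listing complex128 second makes float64 the survivor.
mapping = {
    np.dtype(np.clongdouble): np.dtype(np.longdouble),
    np.dtype(np.complex128):  np.dtype(np.float64),
    np.dtype(np.complex64):   np.dtype(np.float32),
}
# On such platforms len(mapping) == 2 and the clongdouble key maps to float64.
```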
Apologies for the lack of explanation, will expand when not on mobile.
I think @seberg meant that viewing a big-endian complex double as a little-endian double was unsafe. As long as you match the endianness before and after the view, everything will be fine.
Thanks for the clarification 👍
One issue with the concrete dtype instances is that attempting to import `dtype` from `numpy.core` or the full namespace (`import numpy as np`) results in a circular import in `numpy/core/_methods.py`. Is there a way to get concrete datatype instances using only `numpy.core.numerictypes`?
`np.core.numerictypes.dtype`? Or use `np.core.multiarray.dtype`, which is where it actually comes from.
Thanks, I don't know why I had such a hard time chasing that down 😑
numpy/core/_methods.py (outdated)
@@ -189,8 +189,25 @@ def _var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
    # Note that x may not be inexact and that we need it to be an array,
    # not a scalar.
    x = asanyarray(arr - arrmean)

    # Complex types to -> (2,)float view for fast-path
    _complex_to_float = {
May as well make this global, no need to construct it every time.
I had considered that as well - the overhead from building the dictionary seemed negligible so I had left it in, but I can just as well move it outside the function 👍
Let's move it out; if nothing else it makes `var` shorter and easier to read.
numpy/core/_methods.py (outdated)
elif x.dtype.type in _complex_to_float and arr.dtype.isnative:
    xv = x.view(dtype=(_complex_to_float[x.dtype.type], (2,)))
With my dict suggestion, this becomes
elif x.dtype in _complex_to_float:
    xv = x.view(dtype=(_complex_to_float[x.dtype], (2,)))
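Putting the suggestions together, the resulting fast path looks roughly like this self-contained sketch (the name `_abs2` and the standalone-function framing are mine for illustration, not the PR's):

```python
import numpy as np

# dtype-instance keys are byte-order aware; clongdouble is listed first so
# that complex128 wins where the two compare equal (e.g. MSVC builds).
_complex_to_float = {
    np.dtype(np.clongdouble): np.dtype(np.longdouble),
    np.dtype(np.complex128):  np.dtype(np.float64),
    np.dtype(np.complex64):   np.dtype(np.float32),
}

def _abs2(x):
    """Elementwise |x|**2, via the (2,)-float view when it is safe."""
    if x.dtype in _complex_to_float:          # native-order complex only
        xv = x.view(dtype=(_complex_to_float[x.dtype], (2,)))
        np.multiply(xv, xv, out=xv)           # square re and im in place
        return np.add(xv[..., 0], xv[..., 1], out=x.real).real
    # fallback: object arrays, byteswapped input, non-complex dtypes
    return np.multiply(x, np.conjugate(x)).real

print(_abs2(np.array([3 + 4j, 1 - 2j])))      # [25.  5.]
```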
var currently has a conditional that results in conjugate being called for the variance calculation of complex inputs. This leg of the computation is slow. This PR avoids this computational leg for complex inputs via a type check. Closes numpy#15684
Force-pushed from f9f5f2c to ffe1f46.
Thanks for the additional feedback @eric-wieser - the latest commit incorporates your suggestions.
Everything should still be formulated so that the code/tests are in one commit (c898ff3) and the benchmark is in another (ffe1f46). Let me know if I've missed anything.
Thanks Ross and Eric, I am going to put this in. I doubt the byteswapped paths are really worth it speed-wise, but they don't add bad complexity either and keep things symmetric. EDIT: Arg, missed that the benchmark commit was not first and called …
I'm starting to think they're unreachable, since the subtraction output is probably always native...
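A quick way to see why (hypothetical check): the deviations come out of a ufunc, whose output is always in native byte order, so by the time the dtype check runs the swapped branch can't be taken:

```python
import numpy as np

swapped = np.dtype(np.complex128).newbyteorder()
a = np.arange(4).astype(swapped)          # byteswapped complex input
print(a.dtype.isnative)                   # False

x = a - a.mean()                          # subtraction output is native
print(x.dtype.isnative)                   # True: the non-native path is dead
```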
Hmmpf, true, should have noticed... I guess Ross will probably fix it up.
Removes unnecessary code introduced in numpy#15696. Non-native byte orders were explicitly added to the fast-path check in _var for complex numbers. However, the non-native path is unreachable due to coercion in upstream ufuncs.
`var` currently has a conditional that results in `conjugate` being called for the variance calculation of complex inputs. This leg of the computation is slower than it could be if `conjugate` were avoided altogether. This PR avoids this computational leg for complex inputs via a type check. Closes #15684.
Here are the results of a benchmark on my system:
On 6894bbc (i.e. pre-change):
This PR (with fast path):
I'm not sure the additional complexity is necessarily worth the speed-up, especially since the change results in a slow-down for the non-computation-constrained case (i.e. smaller arrays) - please let me know what you think.
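The measured numbers did not survive above; as a rough stand-in, a hedged sketch of how one might reproduce the comparison locally (the sizes and timing harness are illustrative, not the PR's benchmark):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
for n in (10**3, 10**6):                  # small (overhead-bound) vs. large
    a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    per_call = timeit.timeit(a.var, number=100) / 100
    print(f"n={n:>8}: {per_call * 1e6:9.1f} us per var()")
```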