MAINT: Add a fast path to var for complex input #15696
Conversation
Related: #13177
numpy/core/_methods.py (outdated)
return um.add(
    _var(arr.real, axis, dtype, out, ddof, keepdims),
    _var(arr.imag, axis, dtype, out, ddof, keepdims)
)
Reusing `out` here doesn't seem safe.
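To make the hazard concrete, a minimal sketch (using the public `np.var` rather than the private `_var`, with made-up data) of why sharing one `out` buffer between the two nested calls goes wrong:

```python
import numpy as np

a = np.array([1 + 2j, 2 + 4j, 3 + 6j])
out = np.empty(())

real_var = np.var(a.real, out=out)   # writes var of the real part into `out`
imag_var = np.var(a.imag, out=out)   # overwrites it with var of the imag part

# real_var and imag_var are the *same* zero-d array, so this computes
# 2 * var(imag) = 5.33... instead of var(real) + var(imag) = 3.33...
print(np.add(real_var, imag_var), a.var())
```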
Good point - will look into it and think about adding new test(s) for this extra path.
In your opinion, is the change even worth it? I'm not sure how I feel about adding extra complexity for ~20% speedup (which is very system-dependent).
Can you get the same speedup by adding the fast path right where `conjugate` is called?
I had originally been attempting a solution along those lines - if there were a `conjmultiply` ufunc, for instance, that would do the trick, but I don't know of any such function.
How about
um.multiply(x.real, x.real, out=x.real)
um.multiply(x.imag, x.imag, out=x.imag)
x = np.add(x.real, x.imag) # could use out=x.real, but may be slower
I will try something along those lines - thanks for the suggestion
um.add(um.multiply(x.real, x.real), um.multiply(x.imag, x.imag))
This results in the speedup, but unfortunately still implicitly depends on the dtype of the input being complex - it fails the test suite for object arrays with complex numbers in them.
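A small illustration of that failure mode (hypothetical snippet, not from the PR; it relies on `ndarray.real` returning the array itself for non-complex dtypes, object included):

```python
import numpy as np

obj = np.array([1 + 2j, 3 - 4j], dtype=object)

# For object dtype, .real is the array itself, so "real * real" is
# actually the full complex square rather than the squared real part:
print(obj.real is obj)                   # True
print(np.multiply(obj.real, obj.real))   # [(-3+4j) (-7-24j)], not [1.0 9.0]
```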
I think special casing complex dtypes is ok, as long as you do it in the same place that the other types are specialized.
Thanks for the feedback - the current approach (c049074) keeps all of the special cases based on dtypes in the same area of the code where all the other type-determination occurs.
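For reference, a simplified sketch of what that placement looks like (assumed names and structure, not the verbatim diff; `um` is `numpy.core.umath` and `nt` is `numpy.core.numerictypes`, as in `_methods.py`):

```python
import numpy.core.umath as um
import numpy.core.numerictypes as nt

def _square_deviations(x):
    """Sketch of the dtype dispatch; x holds the deviations arr - arrmean."""
    if issubclass(x.dtype.type, nt.floating):
        return um.multiply(x, x, out=x)
    if issubclass(x.dtype.type, nt.complexfloating):
        # fast path: |x|**2 == re**2 + im**2, skipping conjugate() entirely
        return um.add(um.multiply(x.real, x.real),
                      um.multiply(x.imag, x.imag))
    # object arrays etc. keep the conjugate formulation
    return um.multiply(x, um.conjugate(x)).real
```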
Force-pushed from da841f2 to c049074.
I've updated the PR to incorporate @eric-wieser 's feedback about keeping the conditions based on dtypes all contained in the same part of the code. The most recent version adds fast paths for complex64 and complex128. I also added two new tests, and, since this PR is about performance, a benchmark. Thanks to @seberg for the feedback. Here are the current results of the benchmark on my system:
Original implementation:
Manually computing var from real & imag components:
numpy/core/_methods.py (outdated)
xv = x.view(dtype=(nt.float64, (2,)))
um.multiply(xv, xv, out=xv)
x = um.add(xv[..., 0], xv[..., 1], out=x.real).real
elif arr.dtype.type is nt.complex64:
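For context, the view trick works because a complex number is stored as two adjacent floats, so the reinterpretation costs nothing; an illustrative snippet:

```python
import numpy as np

x = np.array([1 + 2j, 3 + 4j])           # complex128, native byte order
xv = x.view(dtype=(np.float64, (2,)))    # shape (2, 2): [[re, im], ...]
print(xv)                                # [[1. 2.] [3. 4.]]
print(xv.base is x)                      # True: no copy, same memory
```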
Can you add `clongdouble` too? Perhaps add a dictionary to map the types.
Agreed, one thing to take care of (and that should have a test) is that non-native byte order does not use the view, or only uses it with the correct byte order. (The identity check does not catch byte order differences; an equality check, even if a bit strange maybe, does.) Not sure off the top of my head how best to write it nicely.
After my PR goes in, we could even think about adding a `.real_dtype()` method ;).
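An illustration of the byte-order point (hypothetical snippet): the scalar-type identity check passes for a byteswapped array, dtype equality catches it, and the raw view would read the swapped bytes as garbage:

```python
import numpy as np

x = np.array([1 + 2j], dtype=np.complex128)
xs = x.astype(x.dtype.newbyteorder())        # same values, swapped bytes

print(xs.dtype.type is np.complex128)        # True: identity misses byte order
print(xs.dtype == np.dtype(np.complex128))   # False: equality catches it
print(xs.view((np.float64, (2,))))           # nonsense values, not [[1. 2.]]
```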
Thanks @eric-wieser & @seberg for the suggestions! I've made the following changes/updates:
- Added a fast path for `complex256`.
- Re-formulated the complex type checking to use a dict (eliminates some LOCs and doesn't appear to hurt performance).
- Added an additional check for the byte order - any array with non-native byte order is funnelled to the original "slow" path so that it never hits the views.
- Updated the tests:
  - Modified a test to check `complex{64,128,256}` input types.
  - Added a test to verify `var()` gives the same result for complex arrays with both native and non-native byte order.
Let me know what you think of this iteration, and thanks again for the feedback!
Force-pushed from c049074 to 7635ea6.
Generally looks good to me, small fixes. Will probably look it over briefly and merge soon if nobody has another opinion.
Can you add the benchmark as the first commit, and then squash the rest into a single commit?
Force-pushed from e066eba to 26a6ba2.
Thanks for all the feedback @seberg - it should be down to two commits now: one with the updated code/tests, and a separate one for the accompanying benchmark.
numpy/core/_methods.py (outdated)
_complex_to_float = {
    nt.complex64 : nt.float32,
    nt.complex128 : nt.float64,
    nt.clongdouble: nt.longdouble
If you make the keys `np.dtype` instances, and crucially reverse the order, then you can avoid the `isnative` check below.
Reversing is necessary so that when double and longdouble are equivalent, the later entry, double, wins.
I don't understand the comment on ordering, particularly the second comment. Also, I thought that Python dictionaries didn't make any guarantees about ordering. Would this need to be an OrderedDict?
Dictionary literals are evaluated sequentially. double and long double are considered equal on Windows, you want double to take precedence.
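A sketch of that point (only observable on platforms such as MSVC builds, where longdouble is the same as double): dict literals insert keys left to right, and keys that compare equal collapse, so the later entry's value wins:

```python
import numpy as np

# On Windows, np.dtype(np.clongdouble) == np.dtype(np.complex128), so the
# two keys collide; listing complex128 second makes float64 the survivor.
mapping = {
    np.dtype(np.clongdouble): np.dtype(np.longdouble),
    np.dtype(np.complex128):  np.dtype(np.float64),
    np.dtype(np.complex64):   np.dtype(np.float32),
}
# On such platforms len(mapping) == 2 and the clongdouble key maps to float64.
```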
Apologies for the lack of explanation, will expand when not on mobile.
I think @seberg meant that viewing a big-endian complex double as a little-endian double was unsafe. As long as you match the endianness before and after the view, everything will be fine.
Thanks for the clarification 👍
One issue with the concrete dtype instances is that attempting to import `dtype` from `numpy.core` or the full namespace (`import numpy as np`) results in a circular import in `numpy/core/_methods.py`. Is there a way to get concrete datatype instances using only `numpy.core.numerictypes`?
`np.core.numerictypes.dtype`? Or use `np.core.multiarray.dtype`, which is where it actually comes from.
Thanks, I don't know why I had such a hard time chasing that down 😑
numpy/core/_methods.py (outdated)
@@ -189,8 +189,25 @@ def _var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
    # Note that x may not be inexact and that we need it to be an array,
    # not a scalar.
    x = asanyarray(arr - arrmean)

    # Complex types to -> (2,)float view for fast-path
    _complex_to_float = {
May as well make this global, no need to construct it every time.
I had considered that as well - the overhead from building the dictionary seemed negligible so I had left it in, but I can just as well move it outside the function 👍
Let's move it out; if nothing else it makes `var` shorter and easier to read.
numpy/core/_methods.py (outdated)
elif x.dtype.type in _complex_to_float and arr.dtype.isnative:
    xv = x.view(dtype=(_complex_to_float[x.dtype.type], (2,)))
With my dict suggestion, this becomes
elif x.dtype in _complex_to_float:
    xv = x.view(dtype=(_complex_to_float[x.dtype], (2,)))
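Putting the suggestions together, the resulting fast path looks roughly like this self-contained sketch (the name `_abs2` and the standalone-function framing are mine for illustration, not the PR's):

```python
import numpy as np

# dtype-instance keys are byte-order aware; clongdouble is listed first so
# that complex128 wins where the two compare equal (e.g. MSVC builds).
_complex_to_float = {
    np.dtype(np.clongdouble): np.dtype(np.longdouble),
    np.dtype(np.complex128):  np.dtype(np.float64),
    np.dtype(np.complex64):   np.dtype(np.float32),
}

def _abs2(x):
    """Elementwise |x|**2, via the (2,)-float view when it is safe."""
    if x.dtype in _complex_to_float:          # native-order complex only
        xv = x.view(dtype=(_complex_to_float[x.dtype], (2,)))
        np.multiply(xv, xv, out=xv)           # square re and im in place
        return np.add(xv[..., 0], xv[..., 1], out=x.real).real
    # fallback: object arrays, byteswapped input, non-complex dtypes
    return np.multiply(x, np.conjugate(x)).real

print(_abs2(np.array([3 + 4j, 1 - 2j])))      # [25.  5.]
```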
var currently has a conditional that results in conjugate being called for the variance calculation of complex inputs. This leg of the computation is slow. This PR avoids this computational leg for complex inputs via a type check. Closes numpy#15684
Force-pushed from f9f5f2c to ffe1f46.
Thanks for the additional feedback @eric-wieser - the latest commit incorporates your suggestions.
Everything should still be formulated so that the code/tests are in one commit (c898ff3) and the benchmark is in another (ffe1f46). Let me know if I've missed anything.
Thanks Ross and Eric, I am going to put this in. I doubt the byteswapped paths are really worth it speed-wise, but they don't add bad complexity either and keep things symmetric. EDIT: Arg, missed that the benchmark commit was not first and called …
I'm starting to think they're unreachable, since the subtraction output is probably always native...
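A quick way to see why (hypothetical check): the deviations come out of a ufunc, whose output is always in native byte order, so by the time the dtype check runs the swapped branch can't be taken:

```python
import numpy as np

swapped = np.dtype(np.complex128).newbyteorder()
a = np.arange(4).astype(swapped)          # byteswapped complex input
print(a.dtype.isnative)                   # False

x = a - a.mean()                          # subtraction output is native
print(x.dtype.isnative)                   # True: the non-native path is dead
```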
Hmmpf, true, should have noticed... I guess Ross will probably fix it up.
Removes unnecessary code introduced in numpy#15696. Non-native byte orders were explicitly added to the fast-path check in _var for complex numbers. However, the non-native path is unreachable due to coercion in upstream ufuncs.
`var` currently has a conditional that results in `conjugate` being called for the variance calculation of complex inputs. This leg of the computation is slower than it could be if `conjugate` were avoided altogether. This PR avoids this computational leg for complex inputs via a type check. Closes #15684.
Here are the results of a benchmark on my system:
On 6894bbc (i.e. pre-change):
This PR (with fast path):
I'm not sure the additional complexity is necessarily worth the speed-up, especially since the change results in a slow-down for the non-computation-constrained case (i.e. smaller arrays) - please let me know what you think.
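The measured numbers did not survive above; as a rough stand-in, a hedged sketch of how one might reproduce the comparison locally (the sizes and timing harness are illustrative, not the PR's benchmark):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
for n in (10**3, 10**6):                  # small (overhead-bound) vs. large
    a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    per_call = timeit.timeit(a.var, number=100) / 100
    print(f"n={n:>8}: {per_call * 1e6:9.1f} us per var()")
```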