
MAINT: Add a fast path to var for complex input #15696

Merged: 2 commits, Mar 10, 2020

@rossbar (Contributor) commented Mar 4, 2020

var currently has a conditional that results in conjugate being called for the variance calculation of complex inputs.

This leg of the computation is slower than it could be if conjugate were avoided altogether. This PR avoids this computational leg for complex inputs via a type check.
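As a point of reference, here is a minimal sketch in plain NumPy (not the `_var` source itself) of the two computations being compared: the conjugate route described above, and squaring the real and imaginary parts directly. Both should agree with `np.var` up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)

# Deviations from the mean (still complex).
x = a - a.mean()

# Conjugate route: x * conj(x) is real in exact arithmetic, but it is computed
# as a full complex product and only then reduced to its real part.
var_conj = np.multiply(x, np.conjugate(x)).real.mean()

# Real/imag route: |x|**2 = Re(x)**2 + Im(x)**2, with no conjugate at all.
var_parts = (x.real * x.real + x.imag * x.imag).mean()

print(np.isclose(var_conj, np.var(a)), np.isclose(var_parts, np.var(a)))
```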

Closes #15684.

Here are the results of a benchmark on my system:

On 6894bbc (i.e. pre-change):

    =========== =============
       arr size          time
    ----------- -------------
             10      35.3±1μs
            100   35.3±0.05μs
           1000      41.1±2μs
          10000      72.9±5μs
         100000     725±300μs
        1000000    8.21±0.9ms
       10000000      94.0±3ms
      100000000      898±10ms
    =========== =============

This PR (with fast path):

    =========== =============
       arr size          time
    ----------- -------------
             10    77.7±0.3μs
            100   77.6±0.08μs
           1000      85.7±4μs
          10000       111±7μs
         100000       339±2μs
        1000000      6.03±1ms
       10000000      75.4±6ms
      100000000      756±30ms
    =========== =============

I'm not sure the added complexity is worth the speed-up, especially since the change slows down the non-computation-constrained case (i.e. smaller arrays: up to 10,000 elements in the benchmark above). Please let me know what you think.

@rossbar (Contributor Author) commented Mar 4, 2020

Related: #13177

Comment on lines 172 to 175
return um.add(
    _var(arr.real, axis, dtype, out, ddof, keepdims),
    _var(arr.imag, axis, dtype, out, ddof, keepdims)
)

Member:

Reusing out here doesn't seem safe
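To illustrate the hazard, here is a small sketch that uses the public `np.var` in place of the private `_var` (an assumption for demonstration purposes; `np.var` returns the `out` array it is given). When both calls share one buffer, the second call overwrites the first result before the addition happens.

```python
import numpy as np

a = np.array([[1 + 4j, 2 + 0j, 3 - 2j],
              [5 + 1j, 0 + 3j, 1 + 1j]])

buf = np.empty(3)
# Both calls write into, and return, the same buffer.
re_var = np.var(a.real, axis=0, out=buf)
im_var = np.var(a.imag, axis=0, out=buf)
print(re_var is im_var)  # both names refer to buf

# So the "sum" is 2 * var(imag), not var(real) + var(imag).
wrong = np.add(re_var, im_var)

# Separate outputs give the expected result.
right = np.var(a.real, axis=0) + np.var(a.imag, axis=0)
print(np.allclose(right, np.var(a, axis=0)))
```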

Contributor Author:

Good point - will look into it and think about adding new test(s) for this extra path.

In your opinion, is the change even worth it? I'm not sure how I feel about adding extra complexity for ~20% speedup (which is very system-dependent).

Member:

Can you get the same speedup by adding the fast path right where conjugate is called?

Contributor Author:

I had originally been attempting a solution along those lines. If there were a conjmultiply ufunc, for instance, that would do the trick, but I don't know of any such function.

Member:

How about

um.multiply(x.real, x.real, out=x.real)
um.multiply(x.imag, x.imag, out=x.imag)
x = np.add(x.real, x.imag)  # could use out=x.real, but may be slower
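A runnable version of this suggestion, assuming `np` can stand in for `um` (inside `_methods.py`, `um` is NumPy's internal `umath` module, which exposes the same ufuncs). Note that it overwrites `x` in place, which is only acceptable because `x` is a freshly allocated temporary inside `_var`.

```python
import numpy as np

x = np.array([3 + 4j, 1 - 1j, 0 + 2j])
expected = np.abs(x) ** 2  # reference |x|**2, computed before x is modified

# x.real and x.imag are writable float views into x, so these calls square
# each component in place without allocating complex temporaries.
np.multiply(x.real, x.real, out=x.real)
np.multiply(x.imag, x.imag, out=x.imag)

# Sum the squared components into a new float array: Re(x)**2 + Im(x)**2.
sq = np.add(x.real, x.imag)
print(sq)
```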

Contributor Author (@rossbar, Mar 4, 2020):

I will try something along those lines - thanks for the suggestion

Contributor Author:

um.add(um.multiply(x.real, x.real), um.multiply(x.imag, x.imag))

This gives the speedup, but unfortunately it still implicitly depends on the input dtype being complex: it fails the test suite for object arrays containing complex numbers.

Member:

I think special casing complex dtypes is ok, as long as you do it in the same place that the other types are specialized.
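A sketch of what such special-casing might look like, using a hypothetical helper `abs2` (not a NumPy function). Native complex dtypes take the real/imag route; everything else, including object arrays holding Python complex numbers, falls back to the conjugate route, which applies `conjugate()` element by element.

```python
import numpy as np


def abs2(x):
    """Hypothetical helper: elementwise |x|**2 for an array of deviations."""
    if x.dtype.kind == "c":
        # Complex dtype: square the real and imaginary parts, no conjugate.
        return x.real * x.real + x.imag * x.imag
    if x.dtype.kind in "fiu":
        return x * x
    # Generic fallback (e.g. object dtype): conjugate is applied per element.
    return np.multiply(x, np.conjugate(x)).real


c = np.array([3 + 4j, 1 - 1j])
o = np.array([3 + 4j, 1 - 1j], dtype=object)
print(c.dtype.kind, o.dtype.kind)  # 'c' for complex128, 'O' for object
print(abs2(c), abs2(o))
```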

Contributor Author:

Thanks for the feedback - the current approach (c049074) keeps all of the special cases based on dtypes in the same area of the code where all the other type-determination occurs.

@charris changed the title from "ENH: Adds a fast path to var for complex input" to "MAINT: Adds a fast path to var for complex input" on Mar 4, 2020
@charris changed the title from "MAINT: Adds a fast path to var for complex input" to "MAINT: Add a fast path to var for complex input" on Mar 4, 2020
@rossbar force-pushed the enh/var_complex_fastpath branch 2 times, most recently from da841f2 to c049074 on March 6, 2020 08:20
@rossbar (Contributor Author) commented Mar 6, 2020

I've updated the PR to incorporate @eric-wieser's feedback about keeping the dtype-based conditions together in the same part of the code. The most recent version adds fast paths for complex128 and complex64 input.

I also added two new tests: one testing _var with single-precision input, and a second testing _var with arrays of more than two dimensions, since the fast paths use a view of the original array that adds a dimension.
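The view idea can be illustrated in isolation (a sketch, not the merged code): reinterpreting a complex128 array as pairs of float64 values adds a trailing axis of length 2, which is why a test with extra dimensions is worthwhile.

```python
import numpy as np

x = np.arange(6, dtype=np.complex128).reshape(2, 3) * (1 + 2j)
sq_ref = np.abs(x) ** 2  # reference |x|**2, computed before x is modified

# Same memory viewed as (real, imag) float64 pairs: the shape gains an axis.
xv = x.view(dtype=(np.float64, (2,)))
print(x.shape, xv.shape)  # (2, 3) (2, 3, 2)

# Square both components in place, then sum over the new last axis.
np.multiply(xv, xv, out=xv)
sq = xv[..., 0] + xv[..., 1]
```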

Finally, since this PR is about performance, I added a benchmark to bench_core. I'm not sure whether modifying the benchmarks is encouraged, but it should at least help evaluate the PR. If it shouldn't be included, I'm happy to revert (or you can - the benchmark is added in c049074).

Thanks to @seberg for the view idea and a lot of other great input re: ufunc behavior.

Here are the current results of the benchmark on my system:

Original implementation:

    =========== =============
       arr size          time
    ----------- -------------
             10    40.3±0.4μs
            100    40.2±0.9μs
           1000   46.8±0.03μs
          10000      94.0±7μs
         100000     585±300μs
        1000000   8.15±0.06ms
       10000000      91.4±2ms
      100000000       914±8ms
    =========== =============

Manually computing var from the real and imaginary components, a.real.var() + a.imag.var():

    =========== =============
       arr size          time
    ----------- -------------
             10      75.0±2μs
            100    74.8±0.3μs
           1000      83.4±3μs
          10000       116±4μs
         100000      411±10μs
        1000000   6.10±0.05ms
       10000000    73.6±0.7ms
      100000000       745±7ms
    =========== =============
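The manual baseline relies on the identity Var(z) = Var(Re z) + Var(Im z), which holds because |z - mean|**2 splits into the squared real deviation plus the squared imaginary deviation. A quick numerical check, including `ddof` (both terms are divided by the same N - ddof):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal(500) + 1j * rng.standard_normal(500)

for ddof in (0, 1):
    # Sum of the component variances vs. the complex variance.
    manual = a.real.var(ddof=ddof) + a.imag.var(ddof=ddof)
    print(ddof, np.isclose(manual, a.var(ddof=ddof)))
```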

_var fast path (this PR):

    =========== =============
       arr size          time
    ----------- -------------
             10    43.4±0.2μs
            100    44.0±0.1μs
           1000      50.0±1μs
          10000      72.5±3μs
         100000       286±7μs
        1000000    5.88±0.2ms
       10000000      66.7±3ms
      100000000       674±4ms
    =========== =============

    xv = x.view(dtype=(nt.float64, (2,)))
    um.multiply(xv, xv, out=xv)
    x = um.add(xv[..., 0], xv[..., 1], out=x.real).real
elif arr.dtype.type is nt.complex64:

Member:

Can you add clongdouble too? Perhaps add a dictionary to map the types.

Member:

Agreed. One thing to take care of (and that should have a test) is that non-native byte order either does not use the view, or only uses it with the correct byte order. (The identity check does not catch byte-order differences; an equality check, even if maybe a bit strange, does.) Not sure off the top of my head how best to write it.
After my PR goes in, we could even think about adding a .real_dtype() method ;).
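The byte-order pitfall can be seen directly: a byte-swapped complex128 dtype has the same scalar `type` as the native one, so an `is` check on `.type` passes, while comparing the dtype objects themselves does distinguish them.

```python
import numpy as np

native = np.dtype(np.complex128)
swapped = native.newbyteorder()  # opposite of the machine's byte order

print(swapped.type is np.complex128)  # True: identity check on .type misses it
print(swapped == native)              # False: dtype equality sees byte order
print(swapped.isnative)               # False
```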

Contributor Author:

Thanks @eric-wieser & @seberg for the suggestions! I've made the following changes/updates:

  • Added a fast-path for complex256.
  • Reformulated the complex type checking to use a dict (eliminates a few lines of code and doesn't appear to hurt performance)
  • Added an additional check for the byte order: any array with non-native byte order is funnelled to the original "slow" path so that it doesn't hit the views.
  • Updated the tests:
    • Modified a test to check complex{64,128,256} input types
    • Added a test to verify var() gives the same result for complex arrays with both native and non-native byte order.

Let me know what you think of this iteration, and thanks again for the feedback!

@rossbar force-pushed the enh/var_complex_fastpath branch from c049074 to 7635ea6 on March 6, 2020 23:18

@seberg (Member) left a comment:

Generally looks good to me, small fixes. Will probably look over it briefly and merge soon if nobody has another opinion.

@seberg (Member) commented Mar 7, 2020

Can you add the benchmark as the first commit, and then squash the rest into a single commit?

@rossbar force-pushed the enh/var_complex_fastpath branch from e066eba to 26a6ba2 on March 7, 2020 06:41

@rossbar (Contributor Author) commented Mar 7, 2020

Thanks for all the feedback @seberg - it should be down to two commits now: one with the updated code/tests, and a separate one for the accompanying benchmark.

_complex_to_float = {
    nt.complex64 : nt.float32,
    nt.complex128 : nt.float64,
    nt.clongdouble: nt.longdouble

Member:

If you make the keys np.dtype instances, and crucially reverse the order, then you can avoid the isnative check below.

Member:

Reversing is necessary so that when double and longdouble are equivalent, the later entry, double, wins.

Contributor Author:

I don't understand the comment on ordering; particularly the second comment. Also, I thought that Python dictionaries didn't make any guarantees about ordering. Would this need to be an OrderedDict?

Member:

Dictionary literals are evaluated sequentially. double and longdouble are considered equal on Windows, and you want double to take precedence.
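The Python rule at play: since Python 3.7, plain dicts preserve insertion order, and when a dict literal contains two keys that compare (and hash) equal, the first key object is kept but the value from the later entry wins. A platform-independent illustration, with `1` and `1.0` standing in for two dtypes that compare equal:

```python
# 1 and 1.0 are equal and hash equal, like the two complex dtypes on
# platforms where long double is the same as double.
d = {1: "longdouble", 1.0: "double"}

print(d)                    # {1: 'double'}: one entry, later value wins
print(type(next(iter(d))))  # <class 'int'>: the first key object is kept
```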

Member:

Apologies for the lack of explanation; will expand when not on mobile.

Member:

I think @seberg meant that viewing a big-endian complex double as a little-endian double was unsafe. As long as you match the endianness before and after the view, everything will be fine.

Contributor Author:

Thanks for the clarification 👍

Contributor Author:

One issue with the concrete dtype instances is that attempting to import dtype from numpy.core or the full namespace (import numpy as np) results in a circular import in numpy/core/_methods.py. Is there a way to get concrete datatype instances using only numpy.core.numerictypes?

Member (@eric-wieser, Mar 9, 2020):

np.core.numerictypes.dtype? Or use np.core.multiarray.dtype, which is where it actually comes from.

Contributor Author:

Thanks, I don't know why I had such a hard time chasing that down 😑

@@ -189,8 +189,25 @@ def _var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False):
    # Note that x may not be inexact and that we need it to be an array,
    # not a scalar.
    x = asanyarray(arr - arrmean)

    # Complex types to -> (2,)float view for fast-path
    _complex_to_float = {

Member:

May as well make this global, no need to construct it every time.

Contributor Author:

I had considered that as well - the overhead from building the dictionary seemed negligible so I had left it in, but I can just as well move it outside the function 👍

Member:

Let's move it out; if nothing else, it makes var shorter and easier to read.

Comment on lines 202 to 203
elif x.dtype.type in _complex_to_float and arr.dtype.isnative:
    xv = x.view(dtype=(_complex_to_float[x.dtype.type], (2,)))

Member:

With my dict suggestion, this becomes

Suggested change
- elif x.dtype.type in _complex_to_float and arr.dtype.isnative:
-     xv = x.view(dtype=(_complex_to_float[x.dtype.type], (2,)))
+ elif x.dtype in _complex_to_float:
+     xv = x.view(dtype=(_complex_to_float[x.dtype], (2,)))

rossbar added 2 commits March 9, 2020 10:39
var currently has a conditional that results in conjugate being
called for the variance calculation of complex inputs.

This leg of the computation is slow. This PR avoids this
computational leg for complex inputs via a type check.

Closes numpy#15684
@rossbar force-pushed the enh/var_complex_fastpath branch from f9f5f2c to ffe1f46 on March 9, 2020 18:01

@rossbar (Contributor Author) commented Mar 9, 2020

Thanks for the additional feedback @eric-wieser, the latest commit has incorporated your suggestions:

  • Moved the definition of the complex->float mapping outside of _var and used concrete dtype instances as keys
  • Explicitly handle the double/longdouble case for increased code clarity
  • Added non-native byteorders to the mapping and modified the corresponding type check in _var
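Putting the review suggestions together, a standalone sketch of the resulting design might look as follows. It is modelled on the discussion, not copied from NumPy's `_methods.py`; the names `_COMPLEX_TO_FLOAT` and `var_sketch` are hypothetical, and it only reduces over all elements. Because the keys are `np.dtype` instances, lookup compares whole dtypes (byte order included), so a non-native dtype would miss the table and take the generic path; the double/longdouble case is handled with an explicit check rather than dict ordering.

```python
import numpy as np

# Hypothetical module-level table, built once: complex dtype -> component dtype.
_COMPLEX_TO_FLOAT = {
    np.dtype(np.complex64): np.dtype(np.float32),
    np.dtype(np.complex128): np.dtype(np.float64),
}
# Only add clongdouble where longdouble is a distinct type, so that on
# platforms where long double == double, the double entry keeps precedence.
if np.dtype(np.longdouble) != np.dtype(np.float64):
    _COMPLEX_TO_FLOAT[np.dtype(np.clongdouble)] = np.dtype(np.longdouble)


def var_sketch(arr, ddof=0):
    """Variance over all elements, with a view-based fast path for complex input."""
    arr = np.asanyarray(arr)
    # Deviations from the mean; the ufunc allocates a new array for the result.
    x = np.asanyarray(arr - arr.mean())

    float_dt = _COMPLEX_TO_FLOAT.get(x.dtype)
    if float_dt is not None:
        # Fast path: view each complex value as a (real, imag) pair of floats.
        xv = x.view(dtype=(float_dt, (2,)))
        np.multiply(xv, xv, out=xv)
        sq = np.add(xv[..., 0], xv[..., 1])
    elif issubclass(x.dtype.type, (np.floating, np.integer)):
        sq = np.multiply(x, x)
    else:
        # Generic path (e.g. object arrays): multiply by the conjugate.
        sq = np.multiply(x, np.conjugate(x)).real

    return sq.sum() / (x.size - ddof)
```

Usage: `var_sketch(np.array([1 + 1j, 3 - 1j]))` should match `np.var` on the same array.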

Everything should still be formulated so that the code/tests are in one commit (c898ff3) and the benchmark is in another (ffe1f46).

Let me know if I've missed anything.

@seberg (Member) commented Mar 10, 2020

Thanks Ross and Eric, I am going to put this in. I doubt the byteswapped paths are really worth it speed wise, but it doesn't add bad complexity either and keeps things symmetric.

EDIT: Argh, I missed that the benchmark commit was not first and was not prefixed with BENCH:, but it's not a big deal.

@seberg merged commit 2e91696 into numpy:master on Mar 10, 2020

@eric-wieser (Member):

Quoting @seberg: "I doubt the byteswapped paths are really worth it speed wise"

I'm starting to think they're unreachable, since the subtraction output is probably always native...
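A quick check of that observation (a sketch; `arr.mean()` stands in for `arrmean` in the `x = asanyarray(arr - arrmean)` step of `_var`): even for a byte-swapped input, the subtraction allocates its output in native byte order, so a non-native branch placed after that step never fires.

```python
import numpy as np

swapped = np.dtype(np.complex128).newbyteorder()
arr = np.arange(4, dtype=np.complex128).astype(swapped)
print(arr.dtype.isnative)  # False: the input is byte-swapped

# Mirror of the deviation step in _var.
x = np.asanyarray(arr - arr.mean())
print(x.dtype.isnative)    # True: the ufunc output uses native byte order
```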

@seberg (Member) commented Mar 10, 2020

Hmmpf, true, I should have noticed... I guess Ross will probably fix it up.

@rossbar deleted the enh/var_complex_fastpath branch on March 11, 2020 04:50
rossbar added a commit to rossbar/numpy that referenced this pull request Mar 11, 2020
Removes unnecessary code introduced in numpy#15696.

Non-native byte orders were explicitly added to the fast-path
check in _var for complex numbers. However, the non-native
path is unreachable due to coercion in upstream ufuncs.
Issue this pull request may close: PERF: Variance of complex array
4 participants