
ENH: Speed up trim_zeros #16911


Merged · 25 commits · Aug 4, 2020

Conversation

@BvB93 (Member) commented Jul 20, 2020

Closes #16783.

As noted in the aforementioned issue, the trim_zeros() implementation as of 1.19.0
is very slow, with plenty of room for optimization.
This pull request addresses that.

Before

In [1]: import numpy as np

In [2]: a = np.hstack([
   ...:     np.zeros(100_000),
   ...:     np.random.uniform(size=100_000),
   ...:     np.zeros(100_000),
   ...: ])

In [3]: np.__version__
Out[3]: '1.19.0'

In [4]: %timeit np.trim_zeros(a)
45.8 ms ± 6.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

After

In [3]: np.__version__
Out[3]: '1.20.0.dev0+2823c98'

In [4]: %timeit np.trim_zeros(a)
303 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

To summarize the new implementation: the passed array is first converted to a boolean array,
after which argmax() is used to identify the first and/or last non-zero element.

A side effect of the new approach is that it will trim any leading and/or trailing elements
that evaluate to False, not just 0.
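
A minimal sketch of the idea (the function name and edge-case handling here are illustrative, not the merged code):

import numpy as np

def trim_zeros_sketch(filt, trim='fb'):
    # Cast once to boolean; leading/trailing falsy elements get trimmed.
    arr = np.asanyarray(filt).astype(bool)
    if not arr.any():                           # all-falsy input trims to empty
        return filt[:0]
    first, last = 0, len(arr)
    if 'f' in trim.lower():
        first = arr.argmax()                    # index of the first truthy element
    if 'b' in trim.lower():
        last = len(arr) - arr[::-1].argmax()    # one past the last truthy element
    return filt[first:last]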

Lastly, a new benchmark has been added, as requested in #16783 (comment).

@BvB93 BvB93 mentioned this pull request Jul 20, 2020
@eric-wieser (Member) left a comment


I hadn't realized that trim_zeros wasn't written for arrays at all.

I worry though that we can't change it, else code like trim_zeros([[1], 0]) will stop working.

@BvB93 (Member, Author) commented Jul 20, 2020

I hadn't realized that trim_zeros wasn't written for arrays at all.

Yeah, it's most definitely a bit of an oddball.

I worry though that we can't change it, else code like trim_zeros([[1], 0]) will stop working.

Object arrays should work fine as long as they can be converted into boolean arrays (as is the case with your example).

In [1]: import numpy as np                                                                                                                             

In [2]: np.array([[1], 0], dtype=object).astype(bool)                                                                                                  
Out[2]: array([ True, False])

The only exception I can think of is a (ragged) object array consisting of other arrays,
though these didn't work with the old implementation either, as that would involve array-to-scalar comparisons.

In [1]: import numpy as np

In [2]: a = np.empty(2, dtype=object)

In [3]: a[0] = a[1] = np.random.rand(5,)

In [4]: a.astype(bool)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
    ...
ValueError: setting an array element with a sequence.

@BvB93 (Member, Author) commented Jul 20, 2020

The CircleCI failure seems to be caused by the unrelated histogram2d() documentation.

@eric-wieser (Member)

Object arrays should work fine as long as they can be converted into boolean arrays (as is the case with your example).

Right, but your example code only works because you explicitly added dtype=object. From what I can tell, this fails / emits a deprecation warning with your code:

>>> np.trim_zeros([[1], 0])
[[1]]

and this fails outright:

>>> np.trim_zeros([[1]])
[[1]]

@BvB93 (Member, Author) commented Jul 20, 2020

Right, but your example code only works because you explicitly added dtype=object. From what I can tell, this fails / emits a deprecation warning with your code:

The first scenario can, I imagine, be resolved with a try / except approach once the warning is turned into a proper exception.

The second one is definitely trickier, though to be fair the (pre-existing) documentation does explicitly state that filt should be a 1D array or sequence.

@seberg (Member) commented Jul 20, 2020

As silly as trim_zeros is, I think the reason it uses a for loop is probably speed to begin with: it is probably much faster as-is if you only trim a handful of zeros (or even none). However, it's plausible that you can argue it does not matter: it's like having a fast path for np.all() when the first element is already False. It saves an arbitrary amount of time, but if you do any other operation on the array later, the speedup is probably dwarfed by that.

@BvB93 (Member, Author) commented Jul 20, 2020

As silly as trim_zeros is, I think the reason it uses a for loop is probably speed to begin with: it is probably much faster as-is if you only trim a handful of zeros (or even none).

Indeed, the example execution times I provided are for a pretty large array;
the difference will be much smaller for smaller input arrays.

Fortunately, even arrays as small as np.ones(1) show a minor improvement in execution time (~4 µs (new) vs ~4.5 µs (old)).

@BvB93 (Member, Author) commented Jul 20, 2020

Fortunately, even arrays as small as np.ones(1) show a minor improvement in execution time (~4 µs (new) vs ~4.5 µs (old)).

On second thought, I suspect this is more likely related to the old implementation's lack of enumerate() and its use of if i == 0.: rather than if i: comparisons.
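
For reference, the pre-1.20 implementation was essentially a pair of element-wise loops of this shape (a paraphrase, not the verbatim source):

def trim_zeros_old(filt, trim='fb'):
    # Walk forward past leading zeros...
    first = 0
    if 'F' in trim.upper():
        for i in filt:
            if i != 0.:
                break
            first = first + 1
    # ...and backward past trailing zeros.
    last = len(filt)
    if 'B' in trim.upper():
        for i in filt[::-1]:
            if i != 0.:
                break
            last = last - 1
    return filt[first:last]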

@BvB93 (Member, Author) commented Jul 20, 2020

and this fails outright:

>>> np.trim_zeros([[1]])
[[1]]

After giving it a bit more thought, there might be a very simple way of dealing with this issue:
simply return the passed filt if it cannot be coerced into a 1D array (instead of raising a ValueError).

While I feel raising an error would generally speaking be more appropriate,
this alternative solution is definitely safer as far as backwards compatibility is concerned.
Maybe in combination with a deprecation warning of some sort?

@eric-wieser (Member)

Maybe in combination with a deprecation warning of some sort?

Perhaps the thing to do is something like:

if isinstance(arr, np.ndarray):
    # your optimal code
else:
    # emit deprecation warning
    # old code

Another, more cynical, view would be to just deprecate this function entirely, as it's trivial, slow, and unlike most of numpy, and work out a better name / submodule / PyPI package for the fast trim_zeros.

@BvB93 (Member, Author) commented Jul 20, 2020

Perhaps the thing to do is something like:

...

I just added something like this in a6f9d29.
It will now automatically fall back to the old implementation if an exception is encountered
(where the same exception may or may not be raised again, depending on its nature).
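
Roughly, the fallback looks like this (the helper names are placeholders, not the committed code):

import warnings

def trim_zeros(filt, trim='fb'):
    try:
        return _trim_zeros_new(filt, trim)   # fast, array-based path
    except Exception:
        warnings.warn(
            "in the future trim_zeros will require a 1-D array as input",
            DeprecationWarning, stacklevel=2)
        # Fall back to the pre-1.20 element-wise loop; if that also
        # fails on the input, its exception propagates as before.
        return _trim_zeros_old(filt, trim)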

Another, more cynical, view would be to just deprecate this function entirely, as it's trivial, slow, and unlike most of numpy, and work out a better name / submodule / PyPI package for the fast trim_zeros.

If the option were just between leaving it in and removing it, then I would agree with this view,
as in its current state it's IMO not really something that belongs in numpy's public API.
However, as trim_zeros() still seems salvageable, I'm personally leaning more towards leaving it in (certainly less of a hassle).

@rossbar (Contributor) commented Jul 24, 2020

Just to note: the current CI failures are unrelated. A rebase on master should fix them.

@mattip (Member) commented Jul 31, 2020

New deprecations require

@mhvk (Contributor) commented Aug 11, 2020

Note that this breaks astropy [1], as we can compare Quantities to 0 but cannot turn them into bool (it is not clear that '0 m' should be False, after all). While it isn't really a big deal, since we can override with __array_function__, I wondered if it would be OK to have a follow-up PR to change the astype(bool) to a comparison with 0 -- it is not only closer to what the name promises, but also turns out to be faster (I found this out while trying to see how much worse it would be!):

In [4]: a = np.hstack([np.zeros(100_000), np.random.uniform(size=100_000), np.zeros(100_000)])

In [5]: %timeit b = a.astype(bool)
288 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit b = a != 0
97.6 µs ± 185 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

[1] astropy/astropy#10638

@charris (Member) commented Aug 11, 2020

@mhvk I get about the same times for both methods on master.

In [2]: %timeit b = a.astype(bool)                                              
77.9 µs ± 85.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit b = a != 0                                                      
70.2 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Wonder what's up with that. Do you compile your own NumPy?

@mhvk (Contributor) commented Aug 11, 2020

That's weird! I was running in a virtual environment with numpy installed from source (as I was triaging the astropy failure), but checking my vanilla Debian testing numpy 1.18.4, I get essentially the same timings as those I posted. (For both: Python 3.8.5, compilation likely gcc 9.3; not sure what else would be informative.)

@mhvk (Contributor) commented Aug 11, 2020

p.s. It is probably handier if @BvB93 makes a PR, though in principle I can do it too. Note that my suggestion does change test results for the object array (no entry of which will compare True to 0) and also it will not fail one of the two deprecation test cases (the one with str). Since in both cases that just reverts to the behaviour prior to this PR, I think that is, if anything, a benefit.

@seberg (Member) commented Aug 11, 2020

Not that it matters, but I missed that this never tested the case where no stripping occurs (i.e. I was not clear enough as to what I meant before). I think basically the old code was optimized for that, while the new code will cast the full array even though almost nothing happens.
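
To illustrate the contrast (an ad-hoc sketch, not benchmark code from the PR): on an input with nothing to trim, the old loop inspects a single element and stops, while the new approach still materializes a full boolean array.

import numpy as np

a = np.random.uniform(0.1, 1.0, 1_000_000)  # no leading/trailing zeros

# Old-style fast path: stops at the very first nonzero element.
first = 0
for x in a:
    if x != 0:
        break
    first += 1

# New-style: casts the entire array before argmax can run.
first = int(np.argmax(a.astype(bool)))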

@charris (Member) commented Aug 11, 2020

Python 3.8.5 here also, gcc 10.2.1. For some reason my 2013 i5 setup beats a lot of modern machines. Apparently Falcon Northwest is as good as they claim :) Anyway, the CFLAGS are

-Wall -Wstrict-prototypes -O3 -pipe -fomit-frame-pointer -fno-strict-aliasing -Wmaybe-uninitialized -Wdeprecated-declarations -march=native

in case it matters.

@mhvk (Contributor) commented Aug 11, 2020

My timings go down to 75 µs and 240 µs if I plug in power... I tried my office computer too (a not-very-new Xeon, same Debian testing), and it gives 108 and 257 µs. This suggests perhaps some compiler/chipset optimization done for one case but not the other.

@seberg - the ideal solution might be to use the .first reduction type ufunc method that @ahaldane suggested a long time ago: #8528 (comment)

@BvB93 (Member, Author) commented Aug 11, 2020

I'm also seeing a noticeable speedup here when running the new benchmark,
which in and of itself would, in my opinion, make this a worthwhile change.

I'll create a follow-up pull request in a bit, as @mhvk suggested.

Before

$ python runtests.py --bench bench_trim_zeros
...
[100.00%] ··· ===================== ============ ========== ==========
              --                                   size               
              --------------------- ----------------------------------
                      dtype             3000       30000      300000  
              ===================== ============ ========== ==========
                  dtype('int64')      9.05±1μs    35.8±5μs   315±40μs 
                 dtype('float64')    15.1±0.5μs   56.1±6μs   490±90μs 
               dtype('complex128')    20.9±3μs    86.3±7μs   891±60μs 
                  dtype('bool')      10.5±0.6μs   28.4±1μs   205±5μs  
              ===================== ============ ========== ==========

After

[100.00%] ··· ===================== ============ ========== ===========
              --                                    size               
              --------------------- -----------------------------------
                      dtype             3000       30000       300000  
              ===================== ============ ========== ===========
                  dtype('int64')     6.24±0.7μs   22.8±2μs    174±7μs  
                 dtype('float64')     11.9±1μs    40.2±4μs    321±40μs 
               dtype('complex128')    15.3±2μs    77.6±9μs   837±100μs 
                  dtype('bool')       15.0±2μs    63.6±6μs    537±30μs 
              ===================== ============ ========== ===========

@BvB93 (Member, Author) commented Aug 11, 2020

I just created a follow-up at #17058.

@eric-wieser (Member)

I'm also seeing a noticeable speedup here when running the new benchmark,
which in and of itself would, in my opinion, make this a worthwhile change.

Performance usually isn't in itself a sufficient motivation for a behavior change!

@seberg (Member) commented Aug 11, 2020

@mhvk sorry, it only sunk in now that you said it's not clear that 0 m should be considered False when casting to bool. Is there any particular reason for that? That basically means a quantity (scalar) does not define bool(0 m) at all (I see that this is specifically written like that).
I am a bit surprised by that, although I guess there are a few tricky points where it is indeed not clearly defined (e.g. temperature could only use absolute 0). So I guess the reason for the choice is that it does not work for units that do not scale multiplicatively? (I can't think of what those are called.)
It's interesting, because in that case I would assume comparison with 0 should also fail:

c = Quantity(0, "Celsius")
print(c == 0)  # True
c = Quantity(1, "Celsius")
print(c == 1)  # False

It's curious how comparison with 0 is special-cased but .astype(bool) is discouraged; at first sight I would almost expect the opposite: quantity == 0 is always False or an error, but bool(quantity) works for meters (though maybe not Celsius).

@mhvk (Contributor) commented Aug 11, 2020

@eric-wieser - but performance improvement plus not changing behaviour compared to released code is surely nice!

@seberg - agreed that allowing comparison with 0 is also not so obvious. And indeed we do a full unit check if the 0 has a unit attached. Only 0 without unit is special-cased (as are inf and nan), in allowing them to have an arbitrary unit. It is mostly because for most units, q == 0 or, e.g., q > 0 has a very obvious meaning (and, frankly, because originally it meant a few numpy functions such as np.sinc -- which does np.where(x==0, ...) -- 'just worked'; see astropy/astropy#1254). I guess for me it boils down to thinking of quantities as often representing relative measurements, in which case it becomes possible to see how a quantity can represent "sameness" but it still remains tricky to infer any "truthiness".

@seberg (Member) commented Aug 11, 2020

@mhvk I suppose my point would be that if you can argue about == 0 being useful, then it seems that in those cases truthiness may also be usefully defined? I agree that it probably is a swamp though.

@eric-wieser actually, == 0 seems like the smaller behaviour change... None == 0 is False, and that was used previously; nonzero is possibly the more reasonable behaviour, but the behaviour switch came with the original PR, so this actually reverts back to the previous behaviour!
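
For instance, the two rules disagree on falsy-but-nonzero elements such as None (a quick illustration of the two predicates):

seq = [None, 0, 1, 0]

# 1.19 rule: trim while element == 0, so the leading None stops trimming.
print([x == 0 for x in seq])    # [False, True, False, True]

# bool-based rule: trim while not bool(element), so None is trimmed too.
print([bool(x) for x in seq])   # [False, False, True, False]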

@eric-wieser (Member)

Ah, I'd missed that we already made the behavior change.

@mhvk (Contributor) commented Aug 11, 2020

Agreed, truthiness could be defined -- I guess it boils down to never having had the need... And it remains tricky anyway, with always the option to go for q.size > 0, in analogy with a list. But this is unsolved for numpy as well - I wish bool(array) were actually defined, at least for boolean arrays....

@seberg (Member) commented Aug 11, 2020

True. In any case, the discussion is a bit moot, since we would be discussing changing the behaviour away from the current == 0, and I am not sure there is a reason for that now. Yeah, since there is no clear quantity scalar, there is no clear way to resolve the truthiness issue (just as we have no clear numpy scalars, even though we have scalars ;)).

@BvB93 (Member, Author) commented Aug 11, 2020

After seeing Eric's comment about np.count_nonzero() (#17058 (comment)) it might actually be possible to kill two birds with one stone here. Adding the lines below would add support for more dtypes (character & void) and fix the astropy.Quantity issue all in one go.

Any thoughts?

arr_any = np.asanyarray(...)
try:
    # Preferred: a cheap boolean cast of the input.
    arr = arr_any.astype(bool, copy=False)
except (TypeError, ValueError):
    # Fallback: compare against a zero scalar of the matching dtype,
    # which also covers character/void dtypes and astropy Quantities.
    arr = arr_any != np.zeros((), arr_any.dtype)

@seberg (Member) commented Aug 11, 2020

Let's not be tempted into a try/except; it's too hard to grasp all the possible ways it can go wrong.

The current (1.19.x) behaviour is to use element == 0.; the new behaviour effectively uses bool(element). That is a change in behaviour, which trips up astropy – for whatever reason.

That behaviour is well defined (maybe weird, but it seems also not super bad). We could discuss changing the function to mean strip_falsy, in which case I am actually not sure breaking astropy matters. If it does, that would be an argument against changing the meaning. But trying to do some in-between thing seems like very muddy waters to me.

But for starters, I mostly see an unintended behaviour change, and whether that is actually better behaviour can be discussed, but it would be good to have use-cases.

@BvB93 (Member, Author) commented Aug 13, 2020

In that case I'd say it would be simplest to revert to the old 1.19 element == 0 behavior.

The main purpose of this pull request was to provide a speed-up for trim_zeros() and, quite frankly, I did not expect that switching to the bool(element) approach could realistically break pre-existing code. But it did.

@mhvk (Contributor) commented Aug 14, 2020

@BvB93 - thanks for the new PR. I don't blame you for not expecting the break - it is only because I wanted to be complete when implementing __array_function__ that this is even tested... But I guess one often learns something new: I really had not expected the comparison to be even equally fast, let alone faster!

mattip added a commit to mattip/numpy that referenced this pull request Aug 27, 2020
seberg added a commit that referenced this pull request Aug 27, 2020
BUG: revert trim_zeros changes from gh-16911
@charris charris mentioned this pull request Oct 10, 2020