
ENH: Speed up trim_zeros #16911


Merged · 25 commits · Aug 4, 2020

Conversation

@BvB93 (Member) commented Jul 20, 2020

Closes #16783.

As noted in the aforementioned issue, the trim_zeros() implementation as of 1.19.0
is very slow, with plenty of room for optimization.
This pull request addresses that.

Before

In [1]: import numpy as np

In [2]: a = np.hstack([
   ...:     np.zeros(100_000),
   ...:     np.random.uniform(size=100_000),
   ...:     np.zeros(100_000),
   ...: ])

In [3]: np.__version__
Out[3]: '1.19.0'

In [4]: %timeit np.trim_zeros(a)
45.8 ms ± 6.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

After

In [3]: np.__version__
Out[3]: '1.20.0.dev0+2823c98'

In [4]: %timeit np.trim_zeros(a)
303 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

To summarize the new implementation: the passed array is first converted to a boolean array,
after which argmax() is used to identify the first and/or last non-zero element.

A side effect of the new approach is that it will trim any leading and/or trailing elements
that evaluate to False, not just 0.
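
A minimal sketch of the idea (the function name and edge-case handling here are illustrative, not the merged code):

import numpy as np

def trim_zeros_sketch(filt, trim='fb'):
    # Cast once to boolean; leading/trailing falsy elements get trimmed.
    arr = np.asanyarray(filt).astype(bool)
    if not arr.any():                           # all-falsy input trims to empty
        return filt[:0]
    first, last = 0, len(arr)
    if 'f' in trim.lower():
        first = arr.argmax()                    # index of the first truthy element
    if 'b' in trim.lower():
        last = len(arr) - arr[::-1].argmax()    # one past the last truthy element
    return filt[first:last]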

Lastly, a new benchmark has been added, as requested in #16783 (comment).

@BvB93 BvB93 mentioned this pull request Jul 20, 2020
@eric-wieser (Member) left a comment


I hadn't realized that trim_zeros wasn't written for arrays at all.

I worry though that we can't change it, else code like trim_zeros([[1], 0]) will stop working.

@BvB93 (Member, Author) commented Jul 20, 2020

I hadn't realized that trim_zeros wasn't written for arrays at all.

Yeah, it's most definitely a bit of an oddball.

I worry though that we can't change it, else code like trim_zeros([[1], 0]) will stop working.

Object arrays should work fine as long as they can be converted into boolean arrays (as is the case with your example).

In [1]: import numpy as np                                                                                                                             

In [2]: np.array([[1], 0], dtype=object).astype(bool)                                                                                                  
Out[2]: array([ True, False])

The only exception I can think of is a (ragged) object array consisting of other arrays,
though these didn't work with the old implementation either, as that would involve array-to-scalar comparisons.

In [1]: import numpy as np

In [2]: a = np.empty(2, dtype=object)

In [3]: a[0] = a[1] = np.random.rand(5,)

In [4]: a.astype(bool)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
    ...
ValueError: setting an array element with a sequence.

@BvB93 (Member, Author) commented Jul 20, 2020

The CircleCI failure seems to be caused by the unrelated histogram2d() documentation.

@eric-wieser (Member)

Object arrays should work fine as long as they can be converted into boolean arrays (as is the case with your example).

Right, but your example code only works because you explicitly added dtype=object. From what I can tell, this fails / emits a deprecation warning with your code:

>>> np.trim_zeros([[1], 0])
[[1]]

and this fails outright:

>>> np.trim_zeros([[1]])
[[1]]

@BvB93 (Member, Author) commented Jul 20, 2020

Right, but your example code only works because you explicitly added dtype=object. From what I can tell, this fails / emits a deprecation warning with your code:

The first scenario can, I imagine, be resolved with a try / except approach once the warning is turned into a proper exception.

The second one is definitely trickier, though to be fair the (pre-existing) documentation does explicitly state that filt should be a 1D array or sequence.

@seberg (Member) commented Jul 20, 2020

As silly as trim_zeros is, I think the reason it uses a for loop is probably speed to begin with: it is probably much faster as-is if you only trim a handful of zeros (or even none). However, it's plausible that you can argue it does not matter: it's like having a fast path for np.all() when the first element is already False. It saves an arbitrary amount of time, but if you do any other operation on the array later, the speedup is probably dwarfed by that.

@BvB93 (Member, Author) commented Jul 20, 2020

As silly as trim_zeros is, I think the reason it uses a for loop is probably speed to begin with: it is probably much faster as-is if you only trim a handful of zeros (or even none).

Indeed, the example execution times I provided are for a pretty large array;
the difference will be much smaller for smaller input arrays.

Fortunately, even arrays as small as np.ones(1) show a minor improvement in execution time (~4 µs (new) vs ~4.5 µs (old)).

@BvB93 (Member, Author) commented Jul 20, 2020

Fortunately, even arrays as small as np.ones(1) show a minor improvement in execution time (~4 µs (new) vs ~4.5 µs (old)).

On second thought, I suspect this is more likely related to the old implementation's lack of enumerate() and its use of if i == 0.: rather than if i: comparisons.
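
For reference, the pre-1.20 implementation was essentially a pair of element-wise loops of this shape (a paraphrase, not the verbatim source):

def trim_zeros_old(filt, trim='fb'):
    # Walk forward past leading zeros...
    first = 0
    if 'F' in trim.upper():
        for i in filt:
            if i != 0.:
                break
            first = first + 1
    # ...and backward past trailing zeros.
    last = len(filt)
    if 'B' in trim.upper():
        for i in filt[::-1]:
            if i != 0.:
                break
            last = last - 1
    return filt[first:last]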

@BvB93 (Member, Author) commented Jul 20, 2020

and this fails outright:

>>> np.trim_zeros([[1]])
[[1]]

After giving it a bit more thought, there might be a very simple way of dealing with this issue:
simply return the passed filt if it cannot be coerced into a 1D array (instead of raising a ValueError).

While I feel raising an error would generally speaking be more appropriate,
this alternative solution is definitely safer as far as backwards compatibility is concerned.
Maybe in combination with a deprecation warning of some sort?

@eric-wieser (Member)

Maybe in combination with a deprecation warning of some sort?

Perhaps the thing to do is something like:

if isinstance(arr, np.ndarray):
    # your optimal code
else:
    # emit deprecation warning
    # old code

Another, more cynical, view would be to just deprecate this function entirely, as it's trivial, slow, and unlike most of numpy, and work out a better name / submodule / PyPI package for the fast trim_zeros.

@BvB93 (Member, Author) commented Jul 20, 2020

Perhaps the thing to do is something like:

...

I just added something like this in a6f9d29.
It will now automatically fall back to the old implementation if an exception is encountered
(where the same exception may or may not be raised again, depending on its nature).
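
Roughly, the fallback looks like this (the helper names are placeholders, not the committed code):

import warnings

def trim_zeros(filt, trim='fb'):
    try:
        return _trim_zeros_new(filt, trim)   # fast, array-based path
    except Exception:
        warnings.warn(
            "in the future trim_zeros will require a 1-D array as input",
            DeprecationWarning, stacklevel=2)
        # Fall back to the pre-1.20 element-wise loop; if that also
        # fails on the input, its exception propagates as before.
        return _trim_zeros_old(filt, trim)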

Another, more cynical, view would be to just deprecate this function entirely, as it's trivial, slow, and unlike most of numpy, and work out a better name / submodule / PyPI package for the fast trim_zeros.

If the option were just between leaving it in and removing it, then I would agree with this view,
as in its current state it's IMO not really something that belongs in numpy's public API.
However, as trim_zeros() still seems salvageable, I'm personally leaning more towards leaving it in (certainly less of a hassle).

@rossbar (Contributor) commented Jul 24, 2020

Just to note: the current CI failures are unrelated. A rebase on master should fix them.

@mattip (Member) commented Jul 31, 2020

New deprecations require

@mhvk (Contributor) commented Aug 11, 2020

Note that this breaks astropy [1], as we can compare Quantities to 0 but cannot turn them into bool (it is not clear that '0 m' should be False, after all). While it isn't really a big deal, since we can override with __array_function__, I wondered if it would be OK to have a follow-up PR to change the astype(bool) to a comparison with 0 -- it is not only closer to what the name promises, but also turns out to be faster (I found this out while trying to see how much worse it would be!):

In [4]: a = np.hstack([np.zeros(100_000), np.random.uniform(size=100_000), np.zeros(100_000)])

In [5]: %timeit b = a.astype(bool)
288 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit b = a != 0
97.6 µs ± 185 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

[1] astropy/astropy#10638

@charris (Member) commented Aug 11, 2020

@mhvk I get about the same times for both methods on master.

In [2]: %timeit b = a.astype(bool)                                              
77.9 µs ± 85.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [3]: %timeit b = a != 0                                                      
70.2 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Wonder what's up with that. Do you compile your own NumPy?

@mhvk (Contributor) commented Aug 11, 2020

That's weird! I was running in a virtual environment with numpy installed from source (as I was triaging the astropy failure), but checking my vanilla Debian testing numpy 1.18.4, I get essentially the same timings as those I posted. (For both: Python 3.8.5, compilation likely gcc 9.3; not sure what else would be informative.)

@mhvk (Contributor) commented Aug 11, 2020

p.s. It is probably handier if @BvB93 makes a PR, though in principle I can do it too. Note that my suggestion does change test results for the object array (no entry of which will compare True to 0) and also it will not fail one of the two deprecation test cases (the one with str). Since in both cases that just reverts to the behaviour prior to this PR, I think that is, if anything, a benefit.

@seberg (Member) commented Aug 11, 2020

Not that it matters, but I missed that this never tested the case where no stripping occurs (i.e. I was not clear enough as to what I meant before). I think basically the old code was optimized for that, while the new code will cast the full array even though almost nothing happens.
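
To illustrate the contrast (an ad-hoc sketch, not benchmark code from the PR): on an input with nothing to trim, the old loop inspects a single element and stops, while the new approach still materializes a full boolean array.

import numpy as np

a = np.random.uniform(0.1, 1.0, 1_000_000)  # no leading/trailing zeros

# Old-style fast path: stops at the very first nonzero element.
first = 0
for x in a:
    if x != 0:
        break
    first += 1

# New-style: casts the entire array before argmax can run.
first = int(np.argmax(a.astype(bool)))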

@charris (Member) commented Aug 11, 2020

Python 3.8.5 here also, gcc 10.2.1. For some reason my 2013 i5 setup beats a lot of modern machines. Apparently Falcon Northwest is as good as they claim :) Anyway, the CFLAGS are

-Wall -Wstrict-prototypes -O3 -pipe -fomit-frame-pointer -fno-strict-aliasing -Wmaybe-uninitialized -Wdeprecated-declarations -march=native

in case it matters.

@mhvk (Contributor) commented Aug 11, 2020

My timings go down to 75 µs and 240 µs if I plug in power... I tried my office computer too (a not-very-new Xeon, same Debian testing), and it gives 108 and 257 µs. This suggests perhaps some compiler/chipset optimization done for one case but not the other.

@seberg - the ideal solution might be to use the .first reduction type ufunc method that @ahaldane suggested a long time ago: #8528 (comment)

@BvB93 (Member, Author) commented Aug 11, 2020

I'm also seeing a noticeable speedup here when running the new benchmark,
which in and of itself would, in my opinion, make this a worthwhile change.

I'll create a follow-up pull request in a bit, as @mhvk suggested.

Before

$ python runtests.py --bench bench_trim_zeros
...
[100.00%] ··· ===================== ============ ========== ==========
              --                                   size               
              --------------------- ----------------------------------
                      dtype             3000       30000      300000  
              ===================== ============ ========== ==========
                  dtype('int64')      9.05±1μs    35.8±5μs   315±40μs 
                 dtype('float64')    15.1±0.5μs   56.1±6μs   490±90μs 
               dtype('complex128')    20.9±3μs    86.3±7μs   891±60μs 
                  dtype('bool')      10.5±0.6μs   28.4±1μs   205±5μs  
              ===================== ============ ========== ==========

After

[100.00%] ··· ===================== ============ ========== ===========
              --                                    size               
              --------------------- -----------------------------------
                      dtype             3000       30000       300000  
              ===================== ============ ========== ===========
                  dtype('int64')     6.24±0.7μs   22.8±2μs    174±7μs  
                 dtype('float64')     11.9±1μs    40.2±4μs    321±40μs 
               dtype('complex128')    15.3±2μs    77.6±9μs   837±100μs 
                  dtype('bool')       15.0±2μs    63.6±6μs    537±30μs 
              ===================== ============ ========== ===========

@BvB93 (Member, Author) commented Aug 11, 2020

I just created a follow-up at #17058.

@eric-wieser (Member)

I'm also seeing a noticeable speedup here when running the new benchmark,
which in and of itself would, in my opinion, make this a worthwhile change.

Performance usually isn't in itself a sufficient motivation for a behavior change!

@seberg (Member) commented Aug 11, 2020

@mhvk sorry, it only sunk in now that you said it's not clear that 0 m should be considered False when casting to bool. Is there any particular reason for that? That basically means a quantity (scalar) does not define bool(0 m) at all (I see that this is specifically written like that).
I am a bit surprised by that, although I guess there are a few tricky points where it is indeed not clearly defined (e.g. temperature could only use absolute 0). So I guess the reason for the choice is that it does not work for units that do not scale multiplicatively? (I can't think of what those are called.)
It's interesting, because in that case I would assume comparison with 0 should also fail:

c = Quantity(0, "Celsius")
print(c == 0)  # True
c = Quantity(1, "Celsius")
print(c == 1)  # False

It's curious how comparison with 0 is special-cased but .astype(bool) is discouraged; at first sight I would almost expect the opposite: quantity == 0 is always False or an error, but bool(quantity) works for meters (though maybe not Celsius).

@mhvk (Contributor) commented Aug 11, 2020

@eric-wieser - but performance improvement plus not changing behaviour compared to released code is surely nice!

@seberg - agreed that allowing comparison with 0 is also not so obvious. And indeed we do a full unit check if the 0 has a unit attached. Only 0 without unit is special-cased (as are inf and nan), in allowing them to have an arbitrary unit. It is mostly because for most units, q == 0 or, e.g., q > 0 has a very obvious meaning (and, frankly, because originally it meant a few numpy functions such as np.sinc -- which does np.where(x==0, ...) -- 'just worked'; see astropy/astropy#1254). I guess for me it boils down to thinking of quantities as often representing relative measurements, in which case it becomes possible to see how a quantity can represent "sameness" but it still remains tricky to infer any "truthiness".

@seberg (Member) commented Aug 11, 2020

@mhvk I suppose my point would be that if you can argue about == 0 being useful, then it seems that in those cases truthiness may also be usefully defined? I agree that it probably is a swamp though.

@eric-wieser actually, == 0 seems like the smaller behaviour change... None == 0 is False, and that was used previously; nonzero is possibly the more reasonable behaviour, but the behaviour switch came with the original PR, so this actually reverts back to the previous behaviour!
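
For instance, the two rules disagree on falsy-but-nonzero elements such as None (a quick illustration of the two predicates):

seq = [None, 0, 1, 0]

# 1.19 rule: trim while element == 0, so the leading None stops trimming.
print([x == 0 for x in seq])    # [False, True, False, True]

# bool-based rule: trim while not bool(element), so None is trimmed too.
print([bool(x) for x in seq])   # [False, False, True, False]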

@eric-wieser (Member)

Ah, I'd missed that we already made the behavior change.

@mhvk (Contributor) commented Aug 11, 2020

Agreed, truthiness could be defined -- I guess it boils down to never having had the need... And it remains tricky anyway, with always the option to go for q.size > 0, in analogy with a list. But this is unsolved for numpy as well - I wish bool(array) were actually defined, at least for boolean arrays....

@seberg (Member) commented Aug 11, 2020

True. In any case, the discussion is a bit moot, since we would be discussing changing the behaviour away from the current == 0, and I am not sure there is a reason for that now. Yeah, since there is no clear quantity scalar, there is no clear way to resolve the truthiness issue (just as we have no clear numpy scalars, even though we have scalars ;)).

@BvB93 (Member, Author) commented Aug 11, 2020

After seeing Eric's comment about np.count_nonzero() (#17058 (comment)) it might actually be possible to kill two birds with one stone here. Adding the lines below would add support for more dtypes (character & void) and fix the astropy.Quantity issue all in one go.

Any thoughts?

arr_any = np.asanyarray(...)
try:
    # Preferred: a cheap boolean cast of the input.
    arr = arr_any.astype(bool, copy=False)
except (TypeError, ValueError):
    # Fallback: compare against a zero scalar of the matching dtype,
    # which also covers character/void dtypes and astropy Quantities.
    arr = arr_any != np.zeros((), arr_any.dtype)

@seberg (Member) commented Aug 11, 2020

Let's not be tempted into a try/except; it's too hard to grasp all the possible ways it can go wrong.

The current (1.19.x) behaviour is to use element == 0.; the new behaviour effectively uses bool(element). That is a change in behaviour, which trips up astropy – for whatever reason.

That behaviour is well defined (maybe weird, but it seems also not super bad). We could discuss changing the function to mean strip_falsy, in which case I am actually not sure breaking astropy matters. If it does, that would be an argument against changing the meaning. But trying to do some in-between thing seems like very muddy waters to me.

But for starters, I mostly see an unintended behaviour change, and whether that is actually better behaviour can be discussed, but it would be good to have use-cases.

@BvB93 (Member, Author) commented Aug 13, 2020

In that case I'd say it would be simplest to revert to the old 1.19 element == 0 behavior.

The main purpose of this pull request was to provide a speed-up for trim_zeros() and, quite frankly, I did not expect that switching to the bool(element) approach could realistically break pre-existing code. But it did.

@mhvk (Contributor) commented Aug 14, 2020

@BvB93 - thanks for the new PR. I don't blame you for not expecting the break - it is only because I wanted to be complete when implementing __array_function__ that this is even tested... But I guess one often learns something new: I really had not expected the comparison to be even equally fast, let alone faster!

mattip added a commit to mattip/numpy that referenced this pull request Aug 27, 2020
seberg added a commit that referenced this pull request Aug 27, 2020
BUG: revert trim_zeros changes from gh-16911
@charris charris mentioned this pull request Oct 10, 2020