ENH: Speed up trim_zeros #16911

Conversation
I hadn't realized that `trim_zeros` wasn't written for arrays at all. I worry though that we can't change it, else code like `trim_zeros([[1], 0])` will stop working.
Yeah, it's most definitely a bit of an oddball.

Object arrays should work fine as long as they can be converted into boolean arrays (as is the case with your example):

```python
In [1]: import numpy as np

In [2]: np.array([[1], 0], dtype=object).astype(bool)
Out[2]: array([ True, False])
```

The only exception I can think of is a (ragged) object array consisting of other arrays:

```python
In [1]: import numpy as np

In [2]: a = np.empty(2, dtype=object)

In [3]: a[0] = a[1] = np.random.rand(5,)

In [4]: a.astype(bool)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
...
ValueError: setting an array element with a sequence.
```
The CircleCI failure seems to be caused by the unrelated …
Right, but your example code only works because you explicitly added `dtype=object`:

```python
>>> np.trim_zeros([[1], 0])
[[1]]
```

and this fails outright:

```python
>>> np.trim_zeros([[1]])
[[1]]
```
The first scenario can, I imagine, be resolved with a …. The second one is definitely more tricky, though to be fair the (pre-existing) documentation does explicitly state that …
As silly as …
Indeed, the example execution times I provided are for a pretty large array; fortunately, even arrays as small as …
On second thought, I suspect this is more so related to the old implementation's lack of …
After giving it a bit more thought, there might be a very simple way of dealing with this issue: …. While I feel raising an error would, generally speaking, be more appropriate, …
Perhaps the thing to do is something like:
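(The snippet that followed did not survive extraction. Purely as a hypothetical illustration of the idea under discussion — keeping the legacy element-wise behaviour for lists while fast-pathing real ndarrays — a type-based dispatch might look like the sketch below; the function name and structure are invented for illustration, not NumPy's actual code.)

```python
import numpy as np

def trim_zeros_compat(filt):
    # Hypothetical sketch; the `trim` argument is omitted for brevity.
    if isinstance(filt, np.ndarray):
        # Fast vectorized path: one bool cast plus argmax.
        nonzero = filt.astype(bool)
        if not nonzero.any():
            return filt[:0]
        first = nonzero.argmax()
        last = len(nonzero) - nonzero[::-1].argmax()
    else:
        # Legacy element-wise path: comparison with 0 keeps working
        # for inputs such as [[1], 0].
        first, last = 0, len(filt)
        while first < last and filt[first] == 0:
            first += 1
        while last > first and filt[last - 1] == 0:
            last -= 1
    return filt[first:last]
```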
Another, more cynical, view would be to just deprecate this function entirely, as it's trivial, slow, and unlike most of NumPy, and to work out a better name / submodule / PyPI package for the fast …
I just added something like this in a6f9d29.
If the option were just between leaving it in and removing it, then I would agree with this view, …
Just to note: the current CI failures are unrelated. A rebase on master should fix them.
New deprecations require …
Note that this breaks astropy [1], as we can compare Quantities to 0 but cannot turn them into bool (it is not clear that '0 m' should be `False`).
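(For illustration, a sketch of the failing pattern as described — based only on the report above; the exact astropy exception type and message are not verified here:)

```python
import numpy as np
import astropy.units as u

q = np.array([0.0, 1.0, 2.0]) * u.m

# Comparison with a bare 0 is special-cased by astropy and works:
q != 0            # -> array([False,  True,  True])

# Casting a dimensioned quantity to bool is not defined, so the
# bool-cast fast path reportedly fails here:
q.astype(bool)    # raises (per the report above)
```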
@mhvk I get about the same times for both methods in master. Wonder what's up with that. Do you compile your own NumPy?
That's weird! I was running in a virtual environment with numpy installed from source (as I was triaging the astropy failure), but checking my vanilla Debian testing numpy 1.18.4, I get essentially the same timings as those I posted. (For both: Python 3.8.5, compilation likely gcc 9.3; not sure what else would be informative.)
P.S. It is probably handier if @BvB93 makes a PR, though in principle I can do it too. Note that my suggestion does change test results for the …
Not that it matters, but I missed that this never tested the case where no stripping occurs (i.e., I was not clear enough about what I meant before). I think the old code was basically optimized for that case, while the new code will cast the full array even though almost nothing happens.
Python 3.8.5 here also, gcc 10.2.1. For some reason my 2013 i5 setup beats a lot of modern machines; apparently Falcon Northwest is as good as they claim :) Anyway, CFLAGS are … in case it matters.
My timings go down to 75 μs and 240 μs if I plug in power... I tried my office computer too (a not-very-new Xeon, same Debian testing), and it gives 108 and 257 μs. That suggests perhaps some compiler/chipset optimization done for one case but not the other. @seberg - the ideal solution might be to use the …
I'm also seeing a noticeable speedup here when running the new benchmark. I'll create a follow-up pull request in a bit, as @mhvk suggested.

Before:

```
>>> python runtests.py --bench bench_trim_zeros
...
[100.00%] ··· ===================== ============ ========== ==========
              --                                size
              --------------------- ----------------------------------
                      dtype             3000       30000      300000
              ===================== ============ ========== ==========
                  dtype('int64')      9.05±1μs    35.8±5μs   315±40μs
                 dtype('float64')    15.1±0.5μs   56.1±6μs   490±90μs
               dtype('complex128')    20.9±3μs    86.3±7μs   891±60μs
                  dtype('bool')      10.5±0.6μs   28.4±1μs   205±5μs
              ===================== ============ ========== ==========
```

After:

```
[100.00%] ··· ===================== ============ ========== ===========
              --                                 size
              --------------------- -----------------------------------
                      dtype             3000       30000      300000
              ===================== ============ ========== ===========
                  dtype('int64')      6.24±0.7μs   22.8±2μs   174±7μs
                 dtype('float64')     11.9±1μs    40.2±4μs   321±40μs
               dtype('complex128')    15.3±2μs    77.6±9μs   837±100μs
                  dtype('bool')       15.0±2μs    63.6±6μs   537±30μs
              ===================== ============ ========== ===========
```
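(For reference, a minimal sketch of what such an asv benchmark could look like; the class and method names here are hypothetical, with parameters mirroring the tables above:)

```python
import numpy as np

class TrimZeros:
    # Hypothetical asv-style benchmark; parameters mirror the tables above.
    param_names = ['dtype', 'size']
    params = [[np.int64, np.float64, np.complex128, np.bool_],
              [3_000, 30_000, 300_000]]

    def setup(self, dtype, size):
        # A non-zero core padded with leading and trailing zeros.
        arr = np.ones(size, dtype=dtype)
        arr[:size // 4] = 0
        arr[-(size // 4):] = 0
        self.array = arr

    def time_trim_zeros(self, dtype, size):
        np.trim_zeros(self.array)
```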
I just created a follow-up at #17058.
Performance usually isn't in itself a sufficient motivation for a behavior change!
@mhvk sorry, it only sunk in now that you said it's not clear that '0 m' should be `False`.

It's curious how comparison with 0 is special-cased, but …
@eric-wieser - but a performance improvement plus not changing behaviour compared to released code is surely nice!

@seberg - agreed that allowing comparison with 0 is also not so obvious. And indeed we do a full unit check if the 0 has a unit attached. Only 0 without a unit is special-cased (as are …).
@mhvk I suppose my point would be that if you can argue about …

@eric-wieser actually, …
Ah, I'd missed that we already made the behavior change.
Agreed, truthiness could be defined -- I guess it boils down to never having had the need... And it remains tricky anyway, with always the option to go for …
True; in any case, the discussion is a bit moot, since we would be discussing changing the behaviour away from the current …
After seeing Eric's comment about …, any thoughts on the following?

```python
arr_any = np.asanyarray(...)
try:
    # Fast path: cast everything to bool in one go.
    arr = arr_any.astype(bool, copy=False)
except (TypeError, ValueError):
    # Fallback for objects that cannot be cast to bool:
    # element-wise comparison against a zero of the matching dtype.
    arr = arr_any != np.zeros((), arr_any.dtype)
```
Let's not be tempted into a try/except; it's too hard to grasp all the possible ways that can go wrong. The current (1.19.x) behaviour is to use an element-wise `!= 0` comparison. That behaviour is well defined (maybe weird, but seems also not super bad). We could discuss moving the function to mean truthiness instead. But for starters, I mostly see an unintended behaviour change, and whether that is actually better behaviour can be discussed, but it would be good to have use-cases.
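(To make the distinction concrete, a small illustration — object arrays are where the two semantics visibly diverge, since `None` is falsy but not equal to 0:)

```python
import numpy as np

a = np.array([None, 1, 2, None], dtype=object)

# 1.19.x semantics: element-wise comparison with 0.
# `None != 0` is True, so the Nones count as non-zero and are kept.
a != 0           # -> array([ True,  True,  True,  True])

# Bool-cast semantics of this PR: `bool(None)` is False,
# so the Nones would be trimmed away.
a.astype(bool)   # -> array([False,  True,  True, False])
```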
In that case I'd say it would be simplest to revert to the old 1.19 behaviour. The main purpose of this pull request was to provide a speed-up to `trim_zeros`, …
@BvB93 - thanks for the new PR. Don't blame you for not expecting the break - it is only because when implementing …
BUG: revert trim_zeros changes from gh-16911
Closes #16783.

As was noted in the aforementioned issue, the current `trim_zeros()` implementation, as of 1.19.0, is very slow, with plenty of room for further optimization. This pull request addresses that optimization issue.

Before: …

After: …

To summarize the new implementation: the passed array is now first converted to a boolean array, after which `argmax()` is used for identifying the first and/or last non-zero elements. A side effect of the new approach is that it will trim any leading and/or trailing elements which evaluate to `False`, not just `0`.

Lastly, a new benchmark has been added, as was requested in #16783 (comment).
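(A rough, illustrative reconstruction of the approach summarized above — not the exact NumPy source; the `trim` handling mirrors the documented `'f'`/`'b'` flags:)

```python
import numpy as np

def trim_zeros_sketch(filt, trim='fb'):
    # One cast to bool: every truthy element becomes True.
    nonzero = np.asanyarray(filt).astype(bool)
    if nonzero.size == 0:
        return filt

    first, last = 0, len(nonzero)
    if 'f' in trim.lower():
        # argmax returns the index of the first True, i.e. the first
        # element to keep; if there is none, trim everything.
        first = nonzero.argmax()
        if not nonzero[first]:
            first = last
    if 'b' in trim.lower():
        # The same trick on the reversed array locates the last True.
        last = len(nonzero) - nonzero[::-1].argmax()
        if not nonzero[last - 1]:
            last = first
    return filt[first:last]
```

Since the cast maps every falsy element to `False`, values such as `None` or an empty string at either end would be trimmed as well, which is exactly the side effect noted above.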