Speed up trim_zeros #16783

Closed
jonashaag opened this issue Jul 8, 2020 · 9 comments · Fixed by #16911

Comments

@jonashaag

import numpy as np

a = np.hstack([
    np.zeros((100_000,)),
    np.random.uniform(size=(100_000,)),
    np.zeros((100_000,)),
])
np.trim_zeros(a)

Here the call to trim_zeros takes about 50 ms.

Looking at the implementation of trim_zeros, it is implemented in the most obvious and unoptimized way imaginable (a for loop looking at each item separately).

I think there should be a warning in the documentation about the fact that it's entirely unoptimized and may be horrendously slow, or we should strive to improve performance.
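For reference, an element-by-element trim of the kind described above could look roughly like the sketch below. It is an illustration only, not copied from the NumPy source, and the name trim_zeros_loop is hypothetical.

```python
def trim_zeros_loop(filt, trim="fb"):
    """Illustrative per-element trim; a sketch, not NumPy's actual code."""
    trim = trim.upper()
    first = 0
    if "F" in trim:
        # Count leading zeros one element at a time.
        for x in filt:
            if x != 0:
                break
            first += 1
    last = len(filt)
    if "B" in trim:
        # Count trailing zeros one element at a time.
        for x in filt[::-1]:
            if x != 0:
                break
            last -= 1
    return filt[first:last]


print(trim_zeros_loop([0, 0, 1, 2, 0, 3, 0]))
```

Every comparison here goes through the Python interpreter, which is why long runs of zeros are expensive.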

As an implementation idea to improve performance, I prototyped a "block-wise" trim function to be used before trim_zeros:

import numpy as np


def fast_trim_zeros(filt, trim='fb'):
    # Strip whole blocks of zeros first, then let np.trim_zeros
    # finish the remainder at each end.
    filt = trim_zeros_block(filt, trim)
    return np.trim_zeros(filt, trim)


def trim_zeros_block(filt, trim='fb', block_size=1024):
    """Trim blocks of zeros"""
    trim = trim.upper()
    first = 0
    if 'F' in trim:
        for i in range(0, len(filt), block_size):
            if np.any(filt[i:i+block_size] != 0.):
                first = i
                break
    last = len(filt)
    if 'B' in trim:
        # Walk backwards over blocks that end at i, so the final element
        # is part of the first block checked.
        for i in range(len(filt), 0, -block_size):
            if np.any(filt[max(i-block_size, 0):i] != 0.):
                last = i
                break
    return filt[first:last]

A call to fast_trim_zeros takes about 2 ms, so it is roughly 25x as fast.

@Qiyu8
Member

Qiyu8 commented Jul 9, 2020

Good to hear that. Can you provide a pull request and a corresponding benchmark test case?
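A benchmark of the kind requested could be sketched with the standard-library timeit module, as below. The helper names make_padded and best_time are hypothetical, and the array layout simply mirrors the example from the original report; this is not the benchmark that was eventually added to NumPy.

```python
import timeit

import numpy as np


def make_padded(n_zeros=100_000, n_values=100_000, seed=0):
    """Build zeros + payload + zeros, like the array in the report."""
    rng = np.random.default_rng(seed)
    # Draw from [0.1, 1.0) so the payload itself never contains a zero.
    values = rng.uniform(0.1, 1.0, size=n_values)
    return np.hstack([np.zeros(n_zeros), values, np.zeros(n_zeros)])


def best_time(func, arr, repeat=3, number=1):
    """Return the best per-call wall time in seconds for func(arr)."""
    timings = timeit.repeat(lambda: func(arr), repeat=repeat, number=number)
    return min(timings) / number


a = make_padded()
print(f"np.trim_zeros: {best_time(np.trim_zeros, a) * 1e3:.3f} ms")
```

Any candidate implementation with the same call signature can be passed to best_time in place of np.trim_zeros to compare the two on identical input.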

@jonashaag
Author

Are you saying that I should submit a PR with benchmark code or also with the code I suggested above? If the latter, there are probably hundreds of ways to implement it and the code above is just the first thing that came to my mind; so why use exactly that code?

@BvB93
Member

BvB93 commented Jul 9, 2020

As an implementation idea to improve performance, I prototyped a "block-wise" trim function to be used before trim_zeros

How about converting the passed object into a boolean array and then using np.argmax() to find the first/last non-zero element?
With your previously defined example array I'm seeing a speedup of about 2 orders of magnitude (398 µs versus 37 ms).

import numpy as np

def trim_zeros(filt, trim='fb'):
    a = np.asanyarray(filt, dtype=bool)
    if a.ndim != 1:
        raise ValueError('trim_zeros requires an array of exactly one dimension')

    trim_upper = trim.upper()
    len_a = len(a)
    if not len_a:  # argmax() raises on empty input
        return filt
    i = j = None

    if 'F' in trim_upper:
        i = a.argmax()
        if not a[i]:  # i.e. all elements of `filt` evaluate to `False`
            return filt[len_a:]

    if 'B' in trim_upper:
        j = len_a - a[::-1].argmax()
        if not a[j - 1]:  # i.e. all elements of `filt` evaluate to `False`
            return filt[len_a:]

    return filt[i:j]

@eric-wieser
Member

eric-wieser commented Jul 9, 2020

Does that code work without the if not ...s?

@BvB93
Member

BvB93 commented Jul 9, 2020

Does that code work without the if not ...s?

Without the if not ... checks it fails when the input array consists entirely of zeros.
In that case argmax() always returns 0, so i == 0 and j == len(a), and filt[i:j] is all of filt.

>>> import numpy as np

>>> a = np.zeros(10, dtype=bool)
>>> i = a.argmax()
>>> j = len(a) - a[::-1].argmax()

>>> print(i, j)
0 10

>>> print(np.all(a == a[i:j]))  # Uhoh, `a` is not being trimmed
True

@BvB93
Member

BvB93 commented Jul 9, 2020

Another option is to check with np.any() right at the beginning, though this appears to be a bit slower.
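A minimal sketch of that alternative, assuming the same boolean-conversion approach as above (the function name trim_zeros_any_first is hypothetical):

```python
import numpy as np


def trim_zeros_any_first(filt, trim="fb"):
    """Sketch: rule out the all-zero (and empty) case up front with any()."""
    a = np.asanyarray(filt, dtype=bool)
    if a.ndim != 1:
        raise ValueError("trim_zeros requires an array of exactly one dimension")
    if not a.any():
        # Nothing non-zero to keep: return an empty slice of the input.
        return filt[len(a):]

    trim = trim.upper()
    # At least one True exists, so argmax() finds a real non-zero position.
    first = a.argmax() if "F" in trim else None
    last = len(a) - a[::-1].argmax() if "B" in trim else None
    return filt[first:last]


print(trim_zeros_any_first(np.array([0, 0, 1, 2, 0])))
```

The up-front any() call adds one more pass over the data in the common case, which fits the observation that this variant is a bit slower.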

@eric-wieser
Member

in which case argmax() will always return 0

Ah, I thought it might return len(a) - 1

@BvB93
Member

BvB93 commented Jul 11, 2020

Shall I create a pull request with the implementation as proposed above?

@BvB93
Member

BvB93 commented Jul 20, 2020

I've just created a pull request for the issue at #16911.
