ENH: Create boolean and integer ufuncs for isnan, isinf, and isfinite. #12988


Merged: 1 commit merged into numpy:master from the bool_ufunc branch on Mar 29, 2019

Conversation

@qwhelan (Contributor) commented Feb 19, 2019

For the following sample code:

import numpy as np
arr = np.full(10**5, 0, bool)

for i in range(10000):
    np.isnan(arr)

We get the following call graph from pprof:
[call graph image: before this PR]

This PR eliminates the trip through npy_half by providing specialized npy_bool implementations of isnan, isinf, and isfinite. Given the limited set of values supported by npy_bool, we can trivially specify the result for all inputs. Doing so initially provided a ~20x speedup, and it is now closer to ~250x:

$ asv compare HEAD^ HEAD -s

Benchmarks that have improved:

       before           after         ratio
     [95db8c28]       [e861372b]
     <bool_ufunc~1>       <bool_ufunc>
-     1.23±0.01ms      5.03±0.09μs     0.00  bench_ufunc.IsNan.time_isnan('bool')
-        65.8±8μs       5.28±0.1μs     0.08  bench_ufunc.IsNan.time_isnan('int16')
-      87.9±0.7μs      5.32±0.05μs     0.06  bench_ufunc.IsNan.time_isnan('int32')
-         145±1μs      5.40±0.08μs     0.04  bench_ufunc.IsNan.time_isnan('int64')

Benchmarks that have stayed the same:

       before           after         ratio
     [95db8c28]       [e861372b]
     <bool_ufunc~1>       <bool_ufunc>
        119±0.6μs        118±0.4μs     1.00  bench_ufunc.IsNan.time_isnan('complex128')
          240±1μs          243±5μs     1.01  bench_ufunc.IsNan.time_isnan('complex256')
          106±1μs        105±0.3μs     0.99  bench_ufunc.IsNan.time_isnan('complex64')
         568±10μs          567±6μs     1.00  bench_ufunc.IsNan.time_isnan('float16')
         27.4±2μs      27.7±0.08μs     1.01  bench_ufunc.IsNan.time_isnan('float32')
       47.3±0.4μs       47.5±0.2μs     1.00  bench_ufunc.IsNan.time_isnan('float64')
          141±2μs        141±0.5μs     1.00  bench_ufunc.IsNan.time_isnan('longfloat')

The call graph is also greatly simplified:
[call graph image: after this PR]
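
For reference, the specialized loops amount to writing a constant into every output element, since no npy_bool (or integer) value can be NaN or infinite. The following is a minimal sketch of the idea in plain C; it is illustrative only and not the templated NumPy source, which generates these loops from macros:

#include <stddef.h>

typedef unsigned char npy_bool;

/* Sketch only: for npy_bool (and integer) inputs, isnan/isinf are always
 * false and isfinite is always true, so the inner loop reduces to a
 * constant fill that the compiler can auto-vectorize. */
static void
bool_isnan_sketch(const npy_bool *in, npy_bool *out, size_t n)
{
    (void)in;  /* the input values never affect the result */
    for (size_t i = 0; i < n; i++) {
        out[i] = 0;  /* np.isnan on bools is always False */
    }
}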

@eric-wieser (Member)

I think this has come up before, but in the wider context of also providing integer specializations. I don't remember whether there were objections or whether the PR just stalled. I'll comment with links when I find the PR(s) I'm thinking of.

@qwhelan (Contributor, author) commented Feb 19, 2019

@eric-wieser Thanks, that would be appreciated if you're able to locate any prior discussions.

For additional context, here's a trivial example of how this manifests to a user (especially in a library like pandas):

import numpy as np

bools = np.full(10**6, 1, bool)
ints = np.full(10**6, 1, int)

%timeit np.isnan(bools)
6.33 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.isnan(ints)
864 µs ± 28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So any user reliant on NaN-masking currently sees a ~7.3x speedup by using ints instead of bools for storing bool data, which encourages use of overly broad types for performance reasons.

@eric-wieser (Member)

I'm not sure I understand how a user would nan-mask bool or int data. Either way, it surprises me that the ints are faster than the bools.

@eric-wieser (Member)

Ok, this is the second time I've tried and failed to find that PR, so I think I'm going to give up.

You should be able to reuse your isnan, isfinite, and isinf loops for all the integer types.

If you want to extend this PR to that, one thing to watch out for is that these functions need to not return true on np.datetime64('nat') and np.timedelta64('nat').

If you want to avoid this pain, you could at least add loops for bBhHiI types, and stick a TODO in about adding it for lLqQ types.
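
For context on why the datetime types are the painful case: a NaT-aware loop cannot be a plain constant fill, because it has to special-case the NaT sentinel (NumPy stores NaT as the minimum int64 value, exposed as NPY_DATETIME_NAT). A rough sketch of the shape such a loop would take, illustrative only and not part of this PR:

#include <stddef.h>
#include <stdint.h>

typedef int64_t npy_datetime;
typedef unsigned char npy_bool;

#define NAT_SENTINEL INT64_MIN  /* stands in for NPY_DATETIME_NAT */

/* Sketch: unlike the bool/int loops (where isfinite is always true), a
 * datetime isfinite loop must inspect each value and report false for NaT. */
static void
datetime_isfinite_sketch(const npy_datetime *in, npy_bool *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (npy_bool)(in[i] != NAT_SENTINEL);
    }
}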

@qwhelan (Contributor, author) commented Feb 19, 2019

@eric-wieser A common scenario would be using pandas, which adds a NaN-mask to most functions, including all() and any():

import numpy as np
import pandas as pd

bools = pd.Series(np.full(10**6, 1, bool))
ints = pd.Series(np.full(10**6, 1, int))

%timeit bools.all()
7.63 ms ± 88.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit ints.all()
2.7 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.all(bools.to_numpy())
51.4 µs ± 695 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit np.all(ints.to_numpy())
747 µs ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So pretty painful overhead from NaN masking in the bool case.

My PR pandas-dev/pandas#25070 would try and fix the pandas side of things, but there's still some low-hanging fruit in the numpy case.

@charris changed the title from "PERF: create boolean ufuncs for isnan, isinf, isfinite, and signbit." to "ENH: Create boolean ufuncs for isnan, isinf, isfinite, and signbit." on Feb 19, 2019
@charris added the labels 01 - Enhancement, component: numpy._core, and 56 - Needs Release Note (needs an entry in doc/release/upcoming_changes) on Feb 19, 2019
@charris (Member) commented Feb 19, 2019

Needs a release note.

@qwhelan changed the title from "ENH: Create boolean ufuncs for isnan, isinf, isfinite, and signbit." to "ENH: Create boolean and integer ufuncs for isnan, isinf, and isfinite." on Feb 20, 2019
@eric-wieser (Member)

Thanks for adding the integer loops.

Can you check that the following passes?

@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_not_finite(self, nat):
    try:
        assert not np.isfinite(nat)
    except TypeError:
        pass  # ok, just not implemented

@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_nan(self, nat):
    try:
        assert np.isnan(nat)
    except TypeError:
        pass  # ok, just not implemented

@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_not_inf(self, nat):
    try:
        assert not np.isinf(nat)
    except TypeError:
        pass  # ok, just not implemented

@eric-wieser (Member)

I'll build this branch locally and play around with it. Perhaps the failure mode I'm expecting doesn't exist, which would be great!

@eric-wieser removed the 56 - Needs Release Note (needs an entry in doc/release/upcoming_changes) label on Feb 26, 2019
@eric-wieser (Member)

Needs a rebase - the new macros should move to fast_loop_macros.h

@qwhelan force-pushed the bool_ufunc branch 4 times, most recently from f704308 to 7875f3c, on February 28, 2019 00:10
@qwhelan (Contributor, author) commented Feb 28, 2019

@eric-wieser Rebased and tests passing

@eric-wieser (Member) left a review comment

Played around with this locally, and realized that only int -> time casting is allowed, not vice versa. This all looks great.

I do wonder if the benchmark is worth including though - will leave that decision to other reviewers.

Commit message:

Previously, boolean values would be routed through the half implementations of
these functions, which added considerable overhead. Creating specialized
ufuncs improves performance by ~250x.

Additionally, enable autovectorization of the new isnan, isinf, and isfinite ufuncs.

@qwhelan (Contributor, author) commented Mar 10, 2019

@eric-wieser I've removed the benchmark from this PR.

@eric-wieser (Member)

@charris, look good to merge?

@qwhelan (Contributor, author) commented Mar 28, 2019

@charris Are there any desired changes to this approach? I have several more patches in this vein that I've held off on submitting until this is merged.

@eric-wieser (Member)

I'll go ahead and put this in - nothing here seems controversial

@eric-wieser merged commit db5fcc8 into numpy:master on Mar 29, 2019

#define OUTPUT_LOOP_FAST(tout, op) \
do { \
/* condition allows compiler to optimize the generic macro */ \
if (IS_OUTPUT_CONT(tout)) { \

Member:

Should be indented.

Member:

I think this matches the existing macros, sadly

Contributor (author):

Please see #13208 - I have fixed the indentation for all macros in this file

OUTPUT_LOOP { \
tout * out = (tout *)op1; \
op; \
}

Member:

Blank line between macros.

Contributor (author):

Fixed in #13208

*/
#define BASE_OUTPUT_LOOP(tout, op) \
OUTPUT_LOOP { \
tout * out = (tout *)op1; \

Member:

Can omit the space after *.

Contributor (author):

Fixed in #13208

NPY_NO_EXPORT void
BOOL_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
OUTPUT_LOOP_FAST(npy_bool, *out = @val@);

Member:

Why not just pass the value and move the assignment to the macro where out is declared?

Contributor (author):

It was entirely to keep the calling convention similar to the other macros - I've implemented your suggestion in #13208.
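
For readers following along, the suggested shape is roughly the following; the OUTPUT_LOOP_FILL name is hypothetical and purely illustrative, and the actual change landed in #13208:

/* Hypothetical variant of the output-loop macro: it takes just the value
 * and performs the store where `out` is declared, so a call site becomes
 * OUTPUT_LOOP_FILL(npy_bool, @val@) rather than passing an assignment
 * expression as the `op` argument. */
#define OUTPUT_LOOP_FILL(tout, val) \
    OUTPUT_LOOP { \
        tout *out = (tout *)op1; \
        *out = (val); \
    }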

@charris (Member) commented Mar 29, 2019

Looks OK aside from some style/organization nits. The depth of the macro nesting makes it a bit hard to follow, but looks correct. The original behavior of promoting to float looks weird in truth, but no weirder than the functions being called on integer/boolean types :)

@qwhelan (Contributor, author) commented Mar 29, 2019

@charris Thanks for the comments; please see #13208 for the implementation.

Thanks @eric-wieser for merging!
