ENH: Create boolean and integer ufuncs for isnan, isinf, and isfinite. #12988
Conversation
This has come up before, I think, but in the wider context of also providing integer specializations. I don't remember if there were objections, or if the PR just stalled. I'll comment with links when I find the PR(s) I'm thinking of.
@eric-wieser Thanks, that would be appreciated if you're able to locate any prior discussions. For additional context, here's a trivial example of how this manifests to a user (especially when called indirectly through a library):

```
import numpy as np

bools = np.full(10**6, 1, bool)
ints = np.full(10**6, 1, int)

%timeit np.isnan(bools)
6.33 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.isnan(ints)
864 µs ± 28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

So any user reliant on `np.isnan` for boolean data pays this penalty.
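As an editorial aside for readers of this thread: the specializations this PR adds are visible in each ufunc's registered type signatures. A quick check, assuming a NumPy new enough to contain this change (1.17+); before it, boolean input promoted to `float16` (type code `'e'`) instead of using a native `'?'` loop:

```python
import numpy as np

# '?' is the bool type code; '?->?' means a native bool->bool loop is
# registered, so boolean input no longer routes through the half (float16)
# implementation.
assert '?->?' in np.isnan.types
assert '?->?' in np.isinf.types
assert '?->?' in np.isfinite.types

# The half-precision loop is still there for actual float16 data.
assert 'e->?' in np.isnan.types
```

If the boolean loop were absent, `np.isnan(bools)` would first cast the whole array to `float16`, which is where the overhead in the benchmark above comes from.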
I'm not sure I understand how a user would nan-mask bool or int data. Either way, it surprises me that the ints are faster than the bools.
Ok, this is the second time I've tried and failed to find that PR, so I think I'm going to give up. You should be able to reuse your […]. If you want to extend this PR to that, one thing to watch out for is that these functions need to not return true on `NaT`. If you want to avoid this pain, you could at least add loops for […].
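For context on the `NaT` concern: datetime and timedelta values share their storage with `int64`, with `NaT` stored as the minimum `int64` value. A naive integer loop applied to that storage would therefore report `NaT` as an ordinary finite, non-NaN value. A minimal illustration (assuming NumPy ≥ 1.13, which provides `np.isnat`):

```python
import numpy as np

nat = np.datetime64('NaT')

# NaT is a sentinel: the same bit pattern as the minimum int64.
assert nat.view('int64') == np.iinfo(np.int64).min

# So an isnan/isfinite loop that treats the raw int64 storage as plain
# integers would misclassify NaT; np.isnat is the dedicated check.
assert np.isnat(nat)
assert np.isnat(np.timedelta64('NaT'))
```

This is why datetime/timedelta must not simply reuse the integer loops added here.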
@eric-wieser A common scenario would be using pandas reductions such as `Series.all()`:

```
bools = pd.Series(np.full(10**6, 1, bool))
ints = pd.Series(np.full(10**6, 1, int))

%timeit bools.all()
7.63 ms ± 88.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit ints.all()
2.7 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.all(bools.to_numpy())
51.4 µs ± 695 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit np.all(ints.to_numpy())
747 µs ± 46.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

So pretty painful overhead from the pandas layer. My PR pandas-dev/pandas#25070 would try and fix the […].
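A rough numpy-only sketch of the overhead pattern being discussed. This is a hypothetical simplification, not pandas' actual code path: the slow variant scans for NaNs before reducing, even though an integer array can never contain one.

```python
import numpy as np

x = np.ones(10**6, dtype=np.int64)

# Fast path: reduce directly over the raw array.
direct = np.all(x)

# Simplified "masked" path: build a NaN mask first (needless cast + full
# scan for integer data), then reduce over the unmasked values.
mask = np.isnan(x.astype(np.float64))
masked = np.all(x[~mask])

assert bool(direct) and bool(masked)
```

The two paths agree on the result; the difference is purely the wasted cast and scan, which is the overhead the benchmark above exposes.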
Needs a release note.
Thanks for adding the integer loops. Can you check that the following passes?

```python
@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_not_finite(self, nat):
    try:
        assert not np.isfinite(nat)
    except TypeError:
        pass  # ok, just not implemented

@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_nan(self, nat):
    try:
        assert np.isnan(nat)
    except TypeError:
        pass  # ok, just not implemented

@pytest.mark.parametrize('nat', [np.datetime64('nat'), np.timedelta64('nat')])
def test_nat_is_not_inf(self, nat):
    try:
        assert not np.isinf(nat)
    except TypeError:
        pass  # ok, just not implemented
```
I'll build this branch locally and play around with it. Perhaps the failure mode I'm expecting doesn't exist, which would be great!
Needs a rebase - the new macros should move to […].
@eric-wieser Rebased and tests passing.
Played around with this locally, and realized that only int -> time casting is allowed, not vice versa. This all looks great.
I do wonder if the benchmark is worth including though - will leave that decision to other reviewers.
Previously, boolean values would be routed through the half implementations of these functions, which added considerable overhead. Creating specialized ufuncs improves performance by ~250x. Additionally, this enables autovectorization of the new isnan, isinf, and isfinite ufuncs.
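The observable contract of the specialization, restated in Python (this is not the C implementation, just what the new loops guarantee): for boolean and integer inputs the answers are constant, so the loops can fill the output without inspecting any values.

```python
import numpy as np

a = np.array([0, 1, -5, 127], dtype=np.int64)
b = np.array([True, False])

# Bool/int dtypes cannot represent NaN or infinity, so these ufuncs
# reduce to constant fills: isnan/isinf are all-False, isfinite all-True.
assert not np.isnan(a).any() and not np.isnan(b).any()
assert not np.isinf(a).any() and not np.isinf(b).any()
assert np.isfinite(a).all() and np.isfinite(b).all()
```

A constant fill is also trivially autovectorizable, which is where the additional speedup beyond skipping the float16 cast comes from.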
@eric-wieser I've removed the benchmark from this PR.
@charris, look good to merge?
@charris Are there any desired changes to this approach? I have several more patches in this vein that I've held off on submitting until this is merged.
I'll go ahead and put this in - nothing here seems controversial.
```c
#define OUTPUT_LOOP_FAST(tout, op) \
do { \
    /* condition allows compiler to optimize the generic macro */ \
    if (IS_OUTPUT_CONT(tout)) { \
```
Should be indented.
I think this matches the existing macros, sadly
Please see #13208 - I have fixed the indentation for all macros in this file
```c
OUTPUT_LOOP { \
    tout * out = (tout *)op1; \
    op; \
}
```
Blank line between macros.
Fixed in #13208
```c
 */
#define BASE_OUTPUT_LOOP(tout, op) \
OUTPUT_LOOP { \
    tout * out = (tout *)op1; \
```
Can omit the space after `*`.
Fixed in #13208
```c
NPY_NO_EXPORT void
BOOL_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    OUTPUT_LOOP_FAST(npy_bool, *out = @val@);
```
Why not just pass the value and move the assignment to the macro where out is declared?
It was entirely to preserve calling convention to be similar to the other macros - I've implemented your suggestion in #13208
Looks OK aside from some style/organization nits. The depth of the macro nesting makes it a bit hard to follow, but looks correct. The original behavior of promoting to float looks weird in truth, but no weirder than the functions being called on integer/boolean types :)
@charris Thanks for the comments; please see #13208 for the implementation fixes. Thanks @eric-wieser for merging!
For the following sample code, we get the following call graph from `pprof`:

![](https://user-images.githubusercontent.com/1037712/51423266-2d9a3f80-1b73-11e9-9893-694ea23c0d27.png)

This PR eliminates the trip through `npy_half` by providing specialized `npy_bool` implementations of `isnan`, `isinf`, and `isfinite`. Given the limited values supported by `npy_bool`, we can trivially specify the result for all inputs. Doing so initially provided a ~20x speedup and now is closer to ~250x.

The call graph is also greatly simplified:
