-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
MAINT: cleanup of fast_loop_macros.h #13208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3b5b208
to
ff646bf
Compare
* combine with NPY_GCC_OPT_3 to allow autovectorization | ||
* should only be used where its worthwhile to avoid code bloat | ||
*/ | ||
#define BASE_OUTPUT_LOOP(tout, op) \ | ||
#define BASE_OUTPUT_LOOP(tout, val) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're touching this - what are your thoughts on renaming it to UNARY_CONSTANT_LOOP
? This is describing (arg) -> const
, I'd expect OUTPUT_LOOP
to describe () -> result
, possibly used in something like random
if that ever becomes a ufunc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me - are you thinking s/OUTPUT_/UNARY_CONSTANT_/
for everything in these two files?
@@ -652,7 +652,7 @@ BOOL__ones_like(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UN | |||
NPY_NO_EXPORT void | |||
BOOL_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func)) | |||
{ | |||
OUTPUT_LOOP_FAST(npy_bool, *out = @val@); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious - is it any slower is we just use UNARY_LOOP_FAST(npy_bool, npy_bool, *out=@val@)
here? Is the compiler able to remove the unused iteration over the input?
If it is, I'd be inclined to maintain fewer macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug up the benchmark I wrote and quickly tried out your suggestion. I'll need to check my code in the morning but it looks like it causes a substantial regression:
$ asv compare upstream/master HEAD --only-changed
before after ratio
[db5fcc8e] [0c76cb6e]
<maximum_speedup~1> <bool_perf>
+ 177±3μs 234±2μs 1.32 bench_ufunc.IsNan.time_isnan('float16')
+ 2.99±0.06μs 32.8±0.2μs 10.96 bench_ufunc.IsNan.time_isnan('int16')
+ 3.09±0.5μs 33.1±1μs 10.72 bench_ufunc.IsNan.time_isnan('int32')
+ 3.07±0.07μs 33.2±0.3μs 10.80 bench_ufunc.IsNan.time_isnan('int64')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news - there was a typo in what I tested last night and there's no performance impact if we just use UNARY_LOOP_FAST
instead. I've pushed a commit that uses that and removes the OUTPUT_LOOP_FAST
macros.
/* PR80198 again, scalar works without the pragma */ | ||
#define BASE_BINARY_LOOP_S_INP(tin, tout, cin, cinp, vin, vinp, op) \ | ||
const tin cin = *(tin *)cinp; \ | ||
BINARY_LOOP { \ | ||
const tin vin = *(tin *)vinp; \ | ||
tout * out = (tout *)vinp; \ | ||
tout *out = (tout *)vinp; \ | ||
op; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting replacing op;
by *out = val
so that the call passes @val@
instead of the assignment statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that was in output_loop which is now gone completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, although I might need to use a local difftool to check that only indents changed
@eric-wieser One thing to note is that |
numpy/core/src/umath/loops.c.src
Outdated
@@ -896,7 +896,7 @@ NPY_NO_EXPORT void | |||
NPY_NO_EXPORT void | |||
@TYPE@_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func)) | |||
{ | |||
OUTPUT_LOOP_FAST(npy_bool, *out = @val@); | |||
UNARY_LOOP_FAST(@type@, npy_bool, *out = @val@); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does changing this to (void)in; *out = @val@
do the trick?, without costing performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks promising - no warnings locally and no performance regression. Just rebased and pushed, so will see what Travis has to say
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a comment explaining the (void)
- that the macro provides an in
variable, and we need to tell the compiler we are deliberately ignoring it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dd3050a
to
3107b1f
Compare
3107b1f
to
838abd7
Compare
Thanks @qwhelan |
This is a followup to #12988 and incorporates comments suggested there:
OUTPUT_LOOP_FAST
now takes theval
to output rather than*out = val
loops.c.src
have been updatedtout * out
has been changed totout *out
in all casescc @charris