Thanks to visit codestin.com
Credit goes to github.com

Skip to content

MAINT: cleanup of fast_loop_macros.h #13208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 31, 2019
Merged

Conversation

qwhelan
Copy link
Contributor

@qwhelan qwhelan commented Mar 29, 2019

This is a followup to #12988 and incorporates comments suggested there:

  • OUTPUT_LOOP_FAST now takes the val to output rather than *out = val
    • The two call sites in loops.c.src have been updated
  • Whitespace between macros inserted
  • tout * out has been changed to tout *out in all cases
  • All macros have had their indentation fixed

cc @charris

* combine with NPY_GCC_OPT_3 to allow autovectorization
* should only be used where its worthwhile to avoid code bloat
*/
#define BASE_OUTPUT_LOOP(tout, op) \
#define BASE_OUTPUT_LOOP(tout, val) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're touching this - what are your thoughts on renaming it to UNARY_CONSTANT_LOOP? This is describing (arg) -> const, I'd expect OUTPUT_LOOP to describe () -> result, possibly used in something like random if that ever becomes a ufunc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me - are you thinking s/OUTPUT_/UNARY_CONSTANT_/ for everything in these two files?

@@ -652,7 +652,7 @@ BOOL__ones_like(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UN
NPY_NO_EXPORT void
BOOL_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
OUTPUT_LOOP_FAST(npy_bool, *out = @val@);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious - is it any slower is we just use UNARY_LOOP_FAST(npy_bool, npy_bool, *out=@val@) here? Is the compiler able to remove the unused iteration over the input?

If it is, I'd be inclined to maintain fewer macros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug up the benchmark I wrote and quickly tried out your suggestion. I'll need to check my code in the morning but it looks like it causes a substantial regression:

$ asv compare upstream/master HEAD --only-changed
       before           after         ratio
     [db5fcc8e]       [0c76cb6e]
     <maximum_speedup~1>       <bool_perf>
+         177±3μs          234±2μs     1.32  bench_ufunc.IsNan.time_isnan('float16')
+     2.99±0.06μs       32.8±0.2μs    10.96  bench_ufunc.IsNan.time_isnan('int16')
+      3.09±0.5μs         33.1±1μs    10.72  bench_ufunc.IsNan.time_isnan('int32')
+     3.07±0.07μs       33.2±0.3μs    10.80  bench_ufunc.IsNan.time_isnan('int64')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news - there was a typo in what I tested last night and there's no performance impact if we just use UNARY_LOOP_FAST instead. I've pushed a commit that uses that and removes the OUTPUT_LOOP_FAST macros.

/* PR80198 again, scalar works without the pragma */
#define BASE_BINARY_LOOP_S_INP(tin, tout, cin, cinp, vin, vinp, op) \
const tin cin = *(tin *)cinp; \
BINARY_LOOP { \
const tin vin = *(tin *)vinp; \
tout * out = (tout *)vinp; \
tout *out = (tout *)vinp; \
op; \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting replacing op; by *out = val so that the call passes @val@ instead of the assignment statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but that was in output_loop which is now gone completely

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, although I might need to use a local difftool to check that only indents changed

@qwhelan
Copy link
Contributor Author

qwhelan commented Mar 30, 2019

@eric-wieser One thing to note is that UNARY_LOOP_FAST declares in, which is unused in the output mode and thus generates unused variable warnings from gcc. This is the cause of the Travis failure and I'm not sure on the preferred way to suppress or resolve that warning.

@@ -896,7 +896,7 @@ NPY_NO_EXPORT void
NPY_NO_EXPORT void
@TYPE@_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
OUTPUT_LOOP_FAST(npy_bool, *out = @val@);
UNARY_LOOP_FAST(@type@, npy_bool, *out = @val@);
Copy link
Member

@eric-wieser eric-wieser Mar 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing this to (void)in; *out = @val@ do the trick?, without costing performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising - no warnings locally and no performance regression. Just rebased and pushed, so will see what Travis has to say

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a comment explaining the (void) - that the macro provides an in variable, and we need to tell the compiler we are deliberately ignoring it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mattip mattip merged commit a8eca5c into numpy:master Mar 31, 2019
@mattip
Copy link
Member

mattip commented Mar 31, 2019

Thanks @qwhelan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants