ENH: move exp, log, frexp, ldexp to SIMD dispatching #18101

Merged: 1 commit merged into numpy:master on Jan 7, 2021

Conversation

seiko2plus (Member)

This is the second patch in a series of pull requests that aim to facilitate the migration process to our new SIMD interface (NPYV).

The series focuses on getting rid of the main umath SIMD source simd.inc, which contains almost all the SIMD kernels, by splitting it into several dispatch-able sources without changing the base code; this keeps the review process manageable during the move to NPYV (universal intrinsics).

In this patch, we have moved the following raw SIMD loops to the new dispatcher:

  • FLOAT_exp, DOUBLE_exp
  • FLOAT_log, DOUBLE_log
  • FLOAT_frexp, DOUBLE_frexp
  • FLOAT_ldexp, DOUBLE_ldexp
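
For readers unfamiliar with the machinery, a rough sketch of the dispatch-able source pattern follows; the target list, file skeleton, and loop body are illustrative, not quoted from this PR.

/*
 * Illustrative *.dispatch.c.src skeleton. The build scans the @targets
 * directive and compiles the file once per enabled feature set;
 * NPY_CPU_DISPATCH_CURFX suffixes each symbol with its target.
 * The target list below is an example only.
 */
/*@targets
 ** $maxopt baseline
 ** (avx2 fma3) avx512f
 **/

NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(FLOAT_exp)
(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(data))
{
    /* the unchanged raw SIMD loop body moves here */
}

The generic umath code then resolves the best compiled variant at runtime via NPY_CPU_DISPATCH_CALL, so the kernels themselves carry no per-architecture boilerplate.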

@seiko2plus force-pushed the ditch_simd_exp_log branch 3 times, most recently from 2fa0b60 to 6a4b53a on January 2, 2021 08:28
@seiko2plus marked this pull request as ready for review on January 2, 2021 09:16
@mattip changed the title from "ENH, SIMD: Ditching the old CPU dispatcher (Exp & Log)" to "ENH: move exp, log, frexp, ldexp to SIMD dispatching" on Jan 3, 2021
@mattip added the "component: SIMD" (Issues in SIMD (fast instruction sets) code or machinery) label on Jan 3, 2021
mattip (Member) commented Jan 3, 2021

Could you benchmark the loops to make sure we did not negatively impact x86_64 performance, and if possible on ARM64 as well?

It seems coverage thinks some of the refactored code is not covered. Is there a test we could add to hit that code path?

/* Review context: AVX2 masked-gather wrapper; the opening lines are
 * reconstructed (hypothetical name), only the gather call is from the diff. */
static NPY_INLINE __m256d
avx2_masked_gather_pd(__m256d src, npy_double *addr, __m128i vindex, __m256d mask)
{
    return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
}
Qiyu8 (Member)

I'm considering reconstructing exp/log using universal intrinsics so that the performance improvement benefits other architectures, but some intrinsics such as _mm256_mask_i32gather_pd, _mm256_blendv_ps, and _mm512_getmant_ps are hard to simulate or risk being slower than scalar code.

Member

@Qiyu8 is this an actionable review comment on this PR? It seems like more of a general statement of intent, correct?

seiko2plus (Member Author), Jan 7, 2021

but some intrinsics such as _mm256_mask_i32gather_pd, _mm256_blendv_ps, and _mm512_getmant_ps are hard to simulate or risk being slower than scalar code.

What do you mean by hard to simulate? We have already done it, except for _mm512_getmant_ps:

*_i32gather_* -> npyv_loadn_* and npyv_loadn_till_*
*_i32scatter_* -> npyv_storen_* and npyv_storen_till_*
_mm256_blend* -> npyv_select_*
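
A hedged sketch (not code from this PR) of that mapping with the universal intrinsics; it assumes numpy's internal simd/simd.h and a build where NPY_SIMD is enabled, and the helper name is hypothetical:

#include "simd/simd.h"

#if NPY_SIMD
/* Gather one vector of floats spaced `stride` elements apart
 * (npyv_loadn_* standing in for *_i32gather_*), then blend with
 * `fallback` wherever `mask` is unset (npyv_select_* standing in
 * for _mm256_blend*). */
static NPY_INLINE npyv_f32
gather_then_blend_f32(const npy_float *src, npy_intp stride,
                      npyv_b32 mask, npyv_f32 fallback)
{
    npyv_f32 gathered = npyv_loadn_f32(src, stride);
    return npyv_select_f32(mask, gathered, fallback);
}
#endif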

risk being slower than scalar code.

Native x86 gather and scatter operations are too expensive, and the same goes for the emulated ones; we should use them without specializing only in large kernels. Without specializing -> loops_trigonometric.dispatch.c.src; with specializing -> loops_unary_fp.dispatch.c.src.

seiko2plus (Member Author)

Note: _mm256_blend* blend operations are not expensive on almost all SIMD extensions.

mattip (Member) commented Jan 7, 2021

LGTM. This is a reshuffle to enable changes in the future, with no real code changes.

@mattip merged commit 73fe877 into numpy:master on Jan 7, 2021
seiko2plus (Member Author) commented Jan 7, 2021

@mattip,

Could you benchmark the loops to make sure we did not negatively impact x86_64 performance, and if possible on ARM64 as well?

Same as before; nothing has changed.

It seems coverage thinks some of the refactored code is not covered. Is there a test we could add to hit that code path?

All cases are covered by tests, but the issue here is that coverage runs only for the highest SIMD target supported by the local machine and leaves the other paths untested.
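
To make that concrete, here is a self-contained, hedged illustration (hypothetical names and feature probes, not numpy's actual dispatch macros) of why a single machine only ever exercises one path:

#include <stdio.h>

static void exp_baseline(void) { puts("baseline"); }
static void exp_avx2(void)     { puts("AVX2"); }
static void exp_avx512f(void)  { puts("AVX512F"); }

/* stand-ins for real CPU-feature probes; pretend this box has AVX2 only */
static int cpu_has_avx512f(void) { return 0; }
static int cpu_has_avx2(void)    { return 1; }

int main(void)
{
    /* the dispatcher resolves once to the highest supported target,
     * so coverage on this machine sees only the AVX2 kernel */
    void (*exp_loop)(void) =
        cpu_has_avx512f() ? exp_avx512f :
        cpu_has_avx2()    ? exp_avx2    :
                            exp_baseline;
    exp_loop();
    return 0;
}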

Currently, I'm working on a lightweight execution tracer similar to opencv/7101; it will allow us to test the dispatcher paths and also provide elapsed CPU ticks for each execution region.

Labels: 01 - Enhancement, component: SIMD