ENH: move exp, log, frexp, ldexp to SIMD dispatching #18101
The second patch in a series of pull requests that aims to facilitate the migration to our new SIMD interface (NPYV). It focuses on getting rid of the main umath SIMD source, `simd.inc`, which contains almost all SIMD kernels, by splitting it into several dispatchable sources without changing the base code. This keeps the review manageable during the move to NPYV (universal intrinsics). In this patch, we have moved the following raw SIMD loops to the new dispatcher:
- FLOAT_exp, DOUBLE_exp
- FLOAT_log, DOUBLE_log
- FLOAT_frexp, DOUBLE_frexp
- FLOAT_ldexp, DOUBLE_ldexp
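As a rough illustration of what "dispatchable" means here: the same kernel is compiled once per CPU target, and one variant is chosen at runtime from the detected CPU features. The sketch below assumes a hypothetical feature struct and hypothetical kernel names; it is not NumPy's actual dispatcher API, and both variants are plain C so it runs anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature shared by every compiled variant of one loop. */
typedef void (*unary_loop_fn)(const float *src, float *dst, size_t n);

/* Baseline variant: would be built with the minimum supported flags. */
static void float_neg_baseline(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = -src[i];
    }
}

/* "AVX2" variant: in a real build this lives in a separate object compiled
 * with AVX2 flags; here 8 lanes are processed per step to mimic a 256-bit
 * register, followed by a scalar tail. */
static void float_neg_avx2(const float *src, float *dst, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        for (size_t lane = 0; lane < 8; ++lane) {
            dst[i + lane] = -src[i + lane];
        }
    }
    for (; i < n; ++i) {
        dst[i] = -src[i];
    }
}

/* Hypothetical feature flags, filled in once by CPU detection. */
typedef struct {
    int has_avx2;
} cpu_features;

/* Pick the best variant the running CPU supports, else the baseline. */
static unary_loop_fn select_float_neg(const cpu_features *feat)
{
    assert(feat != NULL);
    if (feat->has_avx2) {
        return float_neg_avx2;
    }
    return float_neg_baseline;
}
```

Selection happens once, so the per-element loop pays no feature-check cost.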
Could you benchmark the loops to make sure we did not negatively impact x86_64 performance, and, if possible, ARM64 performance as well? Coverage reports that some of the refactored code is not covered. Is there a test we could add to hit that code path?
```c
                          __m128i vindex,
                          __m256d mask)
{
    return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
}
```
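The helper quoted above wraps `_mm256_mask_i32gather_pd`. A portable scalar model of that intrinsic's documented behavior (four double lanes, 32-bit indices, scale 8, a lane is loaded only when the highest bit of its mask element is set, otherwise it is copied from `src`) can help when reasoning about an NPYV replacement. The function name is hypothetical and the result is written to an output array instead of being returned as a register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8).
 * With scale 8 == sizeof(double), indexing a double* by vindex[lane]
 * yields the same byte offset (vindex * 8) as the intrinsic. */
static void mask_i32gather_pd_ref(double dst[4], const double src[4],
                                  const double *addr, const int32_t vindex[4],
                                  const double mask[4])
{
    assert(addr != NULL);
    for (int lane = 0; lane < 4; ++lane) {
        uint64_t bits;
        memcpy(&bits, &mask[lane], sizeof bits); /* raw mask bits */
        if (bits >> 63) {
            /* Highest bit set: gather from memory. */
            dst[lane] = addr[vindex[lane]];
        } else {
            /* Highest bit clear: keep the lane from src. */
            dst[lane] = src[lane];
        }
    }
}
```

Only the sign bit of each mask lane matters, which is why comparison results (all-ones lanes) work directly as gather masks.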
I'm considering rewriting exp/log using universal intrinsics so that the performance improvement benefits other architectures, but some intrinsics, such as `_mm256_mask_i32gather_pd`, `_mm256_blendv_ps`, and `_mm512_getmant_ps`, are hard to emulate or risk being slower than the scalar code.
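Of the intrinsics listed, `_mm512_getmant_ps` is the least obvious to emulate: it extracts a float's mantissa normalized into a selectable interval. The scalar sketch below shows the underlying exponent-field manipulation, written as a frexp that returns a mantissa in [0.5, 1). It assumes normal, finite, nonzero inputs only; the function name is hypothetical, and a real kernel would need separate handling for zeros, subnormals, infinities, and NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a normal float x into m in [0.5, 1) and e with x == m * 2^e.
 * Assumption: x is normal (not zero, subnormal, inf, or NaN). */
static float frexpf_normal_sketch(float x, int *e)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t biased = (bits >> 23) & 0xFFu; /* 8-bit biased exponent */
    assert(biased != 0u && biased != 0xFFu); /* normal numbers only */
    /* For a mantissa in [1, 2) the exponent is biased - 127; frexp
     * normalizes to [0.5, 1), so the exponent is one larger. */
    *e = (int)biased - 126;
    /* Keep sign and fraction bits; force the exponent field to 126,
     * which places the magnitude in [0.5, 1). */
    bits = (bits & 0x807FFFFFu) | (126u << 23);
    float m;
    memcpy(&m, &bits, sizeof m);
    return m;
}
```

Each step is a plain integer shift, AND, or OR, so the normal-number path maps onto ordinary SIMD bitwise operations; the special cases are what make a faithful emulation costly.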
@Qiyu8, is this an actionable review comment on this PR? It seems more like a general statement of intent, correct?
> but some intrinsics such as _mm256_mask_i32gather_pd, _mm256_blendv_ps,_mm512_getmant_ps is hard to simulate or have a risk to get a slower performance than using scalar.

What do you mean by "hard to simulate"? We already did it, except for `_mm512_getmant_ps`:
- `*_i32gather_*` -> `npyv_loadn_*` and `npyv_loadn_till_*`
- `*_i32scatter_*` -> `npyv_storen_*` and `npyv_storen_till_*`
- `_mm256_blend*` -> `npyv_select_*`
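When the gather indices form an arithmetic sequence (for example, reading a strided input array), a strided load does the same job. The sketch below is a scalar model of that idea, assuming a 4-lane float vector; the function names and parameter order are hypothetical stand-ins, not the exact NPYV signatures. The partial variant reads only the first `nlane` lanes and fills the rest, so a loop tail never touches memory past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumption: a 128-bit vector of 4 floats */

/* Strided load model: lane i comes from ptr[i * stride] (stride in elements). */
static void loadn_f32_model(float out[LANES], const float *ptr, ptrdiff_t stride)
{
    for (int i = 0; i < LANES; ++i) {
        out[i] = ptr[i * stride];
    }
}

/* Partial strided load model for loop tails: lanes >= nlane get `fill`
 * instead of being read from memory. */
static void loadn_till_f32_model(float out[LANES], const float *ptr,
                                 ptrdiff_t stride, size_t nlane, float fill)
{
    assert(nlane > 0);
    for (size_t i = 0; i < LANES; ++i) {
        out[i] = i < nlane ? ptr[(ptrdiff_t)i * stride] : fill;
    }
}
```

The store side is symmetric: a strided store writes lane i to `ptr[i * stride]`, and its partial form writes only the first `nlane` lanes.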
> have a risk to get a slower performance than using scalar.

Native x86 gather and scatter operations are expensive, and so are the emulated ones. We should use them without specializing only in large kernels, where the arithmetic outweighs the cost of the loads. For an example without specializing, see `loops_trigonometric.dispatch.c.src`; for one with specializing, see `loops_unary_fp.dispatch.c.src`.
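"Specializing" here means checking the strides once and sending contiguous data to a path with plain loads and stores, so the gather-like strided path is only taken for non-contiguous arrays. Below is a minimal scalar sketch of that structure; the kernel (squaring), the function names, and the use of element strides (NumPy's inner loops actually receive byte strides) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum loop_path { PATH_CONTIG, PATH_STRIDED };

/* Contiguous path: a SIMD version would use ordinary vector loads/stores. */
static void square_contig(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * src[i];
    }
}

/* Strided path: a SIMD version would use loadn/storen-style operations. */
static void square_strided(const double *src, ptrdiff_t sstride,
                           double *dst, ptrdiff_t dstride, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double v = src[(ptrdiff_t)i * sstride];
        dst[(ptrdiff_t)i * dstride] = v * v;
    }
}

/* Dispatch on the strides once per call; strides are in elements here. */
static enum loop_path square_loop(const double *src, ptrdiff_t sstride,
                                  double *dst, ptrdiff_t dstride, size_t n)
{
    assert(src != NULL && dst != NULL);
    if (sstride == 1 && dstride == 1) {
        square_contig(src, dst, n);
        return PATH_CONTIG;
    }
    square_strided(src, sstride, dst, dstride, n);
    return PATH_STRIDED;
}
```

For a cheap kernel like this one, the strided loads would dominate the cost, which is why the contiguous case is worth its own path; for an expensive kernel the extra code is harder to justify.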
Note: `_mm256_blend*`-style blend operations are not expensive on almost all SIMD extensions.
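One reason blends stay cheap even where no dedicated instruction exists (for example, SSE2 has no `blendv`; it arrives with SSE4.1) is that a comparison-style mask, all ones or all zeros per lane, allows select to be written as `(a & m) | (b & ~m)`: three bitwise operations per vector. The sketch models one 32-bit float lane; the function name is hypothetical, and this is a common emulation technique rather than a claim about how every NPYV backend implements `npyv_select_*`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise select on one float lane: returns a where mask is all ones,
 * b where mask is all zeros. */
static float select_f32_lane(uint32_t mask, float a, float b)
{
    assert(mask == 0u || mask == 0xFFFFFFFFu); /* comparison-style mask */
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uint32_t ur = (ua & mask) | (ub & ~mask);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}
```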
LGTM. This is a reshuffle to enable future changes, with no real code changes.
Same as before; nothing changed.
All cases are covered by tests, but the issue here is that coverage runs only for the highest SIMD target supported by the local machine and leaves the other paths untested. I'm currently working on a lightweight execution tracer, similar to opencv/7101, which will allow us to test the dispatcher paths.
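The coverage gap described above comes from each machine exercising only one dispatch target. A minimal sketch of the general idea behind an execution tracer: every compiled variant records a hit in a per-target counter, so a test run can report which targets never executed. This is not the tracer being developed; the target list, counters, and kernels are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of dispatch targets. */
enum trace_target { TRACE_BASELINE, TRACE_AVX2, TRACE_AVX512F, TRACE_NTARGETS };

static unsigned long trace_hits[TRACE_NTARGETS];

/* Record that a variant compiled for target t was executed. */
static void trace_hit(enum trace_target t)
{
    assert(t < TRACE_NTARGETS);
    ++trace_hits[t];
}

static void trace_reset(void)
{
    for (size_t i = 0; i < TRACE_NTARGETS; ++i) {
        trace_hits[i] = 0;
    }
}

/* Example variants: in a real build each would live in its own
 * target-specific object file. */
static double kernel_baseline(double x)
{
    trace_hit(TRACE_BASELINE);
    return x * 2.0;
}

static double kernel_avx2(double x)
{
    trace_hit(TRACE_AVX2);
    return x * 2.0;
}

/* Number of targets that did not run since the last reset. */
static size_t trace_untested(void)
{
    size_t n = 0;
    for (size_t i = 0; i < TRACE_NTARGETS; ++i) {
        n += trace_hits[i] == 0;
    }
    return n;
}
```

A test suite could force each available target in turn and then assert that `trace_untested()` reports only targets the machine cannot run.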