Enable fp16 nonnative support for dynamic dispatch, make more ergonomic for static dispatch #200
LGTM. The benchmark numbers on TGL show a significant improvement for _Float16 and no regression for any other data type.
src/avx512-16bit-qsort.hpp
    }
    replace_inf_with_nan(arr, arrsize, nan_count, descending);
    replace_inf_with_nan_fp16(
Why can't we continue to use replace_inf_with_nan?
The only difference between the two functions seems to be the constant they write (0x7c01 vs. 0xFFFF), and 0xFFFF seems to fail the tests.
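For reference, both constants are NaN encodings in IEEE 754 binary16 (1 sign bit, 5 exponent bits, 10 mantissa bits); they differ in the sign bit and the payload. The helpers below are a small hypothetical sketch (the names `fp16_is_nan`, `fp16_is_quiet_nan`, etc. are illustrative, not from the library) that decodes the two patterns, assuming the usual x86 convention that the top mantissa bit marks a quiet NaN. Under that assumption, 0x7c01 is a positive signaling NaN and 0xFFFF is a negative quiet NaN.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for inspecting IEEE 754 binary16 bit patterns.
// Layout: bit 15 = sign, bits 14..10 = exponent, bits 9..0 = mantissa.
constexpr std::uint16_t kFp16SignMask = 0x8000;
constexpr std::uint16_t kFp16ExpMask  = 0x7C00;
constexpr std::uint16_t kFp16MantMask = 0x03FF;
constexpr std::uint16_t kFp16QuietBit = 0x0200;  // top mantissa bit (assumed x86 convention)

// All-ones exponent with a non-zero mantissa encodes NaN.
constexpr bool fp16_is_nan(std::uint16_t bits) {
    return (bits & kFp16ExpMask) == kFp16ExpMask && (bits & kFp16MantMask) != 0;
}

// All-ones exponent with a zero mantissa encodes +/-infinity.
constexpr bool fp16_is_inf(std::uint16_t bits) {
    return (bits & kFp16ExpMask) == kFp16ExpMask && (bits & kFp16MantMask) == 0;
}

constexpr bool fp16_is_negative(std::uint16_t bits) {
    return (bits & kFp16SignMask) != 0;
}

constexpr bool fp16_is_quiet_nan(std::uint16_t bits) {
    return fp16_is_nan(bits) && (bits & kFp16QuietBit) != 0;
}

// 0x7C01: positive NaN with the quiet bit clear (signaling NaN).
static_assert(fp16_is_nan(0x7C01) && !fp16_is_negative(0x7C01) &&
                  !fp16_is_quiet_nan(0x7C01), "0x7C01 decoding");
// 0xFFFF: negative NaN with the quiet bit set (quiet NaN).
static_assert(fp16_is_nan(0xFFFF) && fp16_is_negative(0xFFFF) &&
                  fp16_is_quiet_nan(0xFFFF), "0xFFFF decoding");
// 0x7C00 is +infinity, not NaN.
static_assert(fp16_is_inf(0x7C00) && !fp16_is_nan(0x7C00), "0x7C00 decoding");
```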
I got rid of replace_inf_with_nan_fp16 by modifying replace_inf_with_nan to use 0x7c01 instead of 0xFFFF.
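The merged function can be sketched roughly as follows. This is a hypothetical scalar illustration on raw 16-bit patterns (`replace_inf_with_nan_sketch` and `kFp16NaNBits` are made-up names), not the library's implementation. It assumes, based on the function's name and its `nan_count` parameter, that NaNs were counted and replaced with infinity before sorting, so afterwards they are written back at the tail for ascending order or at the head for descending order, always as 0x7c01.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// NaN bit pattern written back after sorting, per the review thread.
constexpr std::uint16_t kFp16NaNBits = 0x7C01;

// Hypothetical scalar sketch: restore nan_count NaNs that were sorted as +inf.
// Ascending order leaves them at the end of the array, descending at the start.
void replace_inf_with_nan_sketch(std::uint16_t *arr, std::size_t size,
                                 std::size_t nan_count, bool descending = false) {
    assert(nan_count <= size);
    if (descending) {
        for (std::size_t i = 0; i < nan_count; ++i) {
            arr[i] = kFp16NaNBits;
        }
    } else {
        for (std::size_t i = size - nan_count; i < size; ++i) {
            arr[i] = kFp16NaNBits;
        }
    }
}
```

With a single constant for all callers, there is no longer a need for a separate `_fp16` variant.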
    @@ -208,6 +208,9 @@ jobs:

          - name: Run test suite on SPR
            run: sde -spr -- ./builddir/testexe
          - name: Run ICL fp16 tests
            # Note: This filters for the _Float16 tests based on the number assigned to it, which could change in the future
            run: sde -icx -- ./builddir/testexe --gtest_filter="*/simdsort/2*"
The np-multiarray-tgl job already tests the float16 portion of the code on TGL, but this is fine too.
Branch updated from 7547df8 to 2c39de4.
LGTM. Thanks @sterrettm2!
This patch lets the dynamic dispatch logic use the non-AVX512-FP16 _Float16 sorting code, and integrates that code more cleanly into the static dispatch logic.
It is vastly faster than the scalar code, but a fair bit slower than the dedicated AVX512-FP16 code.
Benchmark chart: comparison to scalar
Benchmark chart: comparison to AVX512_FP16
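The dispatch order implied by the description can be sketched as a simple preference chain: use the dedicated AVX512-FP16 code when the CPU supports it, otherwise the non-native AVX-512 _Float16 code this patch makes reachable, otherwise the scalar code. The `CpuFeatures` struct, its fields, and the `Fp16SortPath` names are hypothetical; the library's actual feature detection and ISA names may differ.

```cpp
#include <cassert>

// Hypothetical names for the three _Float16 sort implementations.
enum class Fp16SortPath { Avx512Fp16, Avx512NonNative, Scalar };

// Hypothetical summary of runtime CPU feature detection results.
struct CpuFeatures {
    bool has_avx512fp16;        // native FP16 instructions available
    bool has_avx512_nonnative;  // AVX-512 subset needed by the non-native path
};

// Pick the fastest available path, matching the ordering in the description:
// dedicated AVX512-FP16 > non-native AVX-512 > scalar.
Fp16SortPath choose_fp16_sort_path(const CpuFeatures &cpu) {
    if (cpu.has_avx512fp16) {
        return Fp16SortPath::Avx512Fp16;
    }
    if (cpu.has_avx512_nonnative) {
        return Fp16SortPath::Avx512NonNative;
    }
    return Fp16SortPath::Scalar;
}
```

A real dispatcher would usually make this choice once and cache the result (for example as a function pointer) rather than re-evaluating it on every call.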