MAINT: Refactor partial load workaround for Clang #24461

seiko2plus · 2023-08-19T23:33:40Z

Clang exhibits aggressive optimization behavior when the -ftrapping-math flag is not fully supported,
starting from -O1 optimization level. When partially loading a vector register for operations that
require filling up the remaining lanes with specific values (e.g., divide operations needing non-zero
integers to prevent FP exception divide-by-zero), Clang's optimizer recognizes that the full register
is unnecessary for the store operation. Consequently, it optimizes out the fill step involving
non-zero integers for the remaining elements.

As a solution, we store the returned vector in a volatile variable and then combine it with itself
using a symmetric operation such as or, which tells the compiler that the full vector is needed.
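
A minimal sketch of the idea in portable C, not the code from this patch. It assumes a plain struct in place of a real SIMD register. The names `vec4_s32`, `load_till_s32` and `GUARD_PARTIAL_LOAD` are hypothetical and chosen only for illustration. The vector is copied through volatile storage and OR-ed back into itself, which leaves the value unchanged but keeps the filled lanes alive:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-lane integer vector; a plain struct stands in for a SIMD register. */
typedef struct { int32_t lane[4]; } vec4_s32;

/* Hypothetical switch: set to 0 when the compiler fully honors -ftrapping-math. */
#ifndef GUARD_PARTIAL_LOAD
#define GUARD_PARTIAL_LOAD 1
#endif

/* Load the first nlane elements of ptr; fill the remaining lanes with `fill`. */
static vec4_s32 load_till_s32(const int32_t *ptr, size_t nlane, int32_t fill)
{
    assert(ptr != NULL);
    vec4_s32 a;
    for (size_t i = 0; i < 4; ++i) {
        a.lane[i] = (i < nlane) ? ptr[i] : fill;
    }
#if GUARD_PARTIAL_LOAD
    /* Route the vector through volatile storage, then OR the copy back into it.
       x | x == x, so the value is unchanged, but the volatile accesses force
       every lane, including the filled ones, to be materialized. */
    volatile int32_t workaround[4];
    for (size_t i = 0; i < 4; ++i) workaround[i] = a.lane[i];
    for (size_t i = 0; i < 4; ++i) a.lane[i] |= workaround[i];
#endif
    return a;
}
```

With this sketch, loading two elements `{10, 20}` with a fill value of 1 gives the lanes 10, 20, 1, 1, so a lane-wise division by the result never divides by zero in the padding lanes.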

This refactor moves the workaround from the source files into the universal intrinsic headers,
which guarantees that it is applied by all kernels. Furthermore, the workaround is disabled when the
-ftrapping-math flag is fully supported by the Clang compiler.

This patch also enables the -ftrapping-math flag for clang-cl, which is required to enable SIMD optimization of operations such as log/exp/sin/cos, and suppresses floating-point exception warnings.


charris · 2023-09-05T16:57:41Z

Thanks Sayed.

seiko2plus force-pushed the clang_partial_bug_refactor branch 2 times, most recently from c1c965a to b3334d6 August 21, 2023 13:14

seiko2plus force-pushed the clang_partial_bug_refactor branch from b3334d6 to bf5a750 September 4, 2023 02:48

seiko2plus added the component: SIMD and 09 - Backport-Candidate labels Sep 4, 2023

seiko2plus force-pushed the clang_partial_bug_refactor branch from bf5a750 to 83cec53 September 4, 2023 04:01

seiko2plus marked this pull request as ready for review September 4, 2023 04:25

charris merged commit b9c4023 into numpy:main Sep 5, 2023

charris changed the title ~~SIMD: Refactor partial load workaround for Clang~~ MAINT: Refactor partial load workaround for Clang Sep 5, 2023

charris added the 03 - Maintenance label Sep 5, 2023

charris mentioned this pull request Sep 5, 2023

MAINT: Refactor partial load Workaround for Clang #24648

Merged

charris removed the 09 - Backport-Candidate label Sep 5, 2023

seiko2plus mentioned this pull request Sep 14, 2023

TYP: Add annotations for the py3.12 buffer protocol #24705

Merged
