MAINT: Refactor partial load workaround for Clang #24461
Merged
Clang optimizes aggressively when the `-ftrapping-math` flag is not fully supported, starting from the `-O1` optimization level. When partially loading a vector register for operations that require filling the remaining lanes with specific values (e.g., divide operations that need non-zero integers to prevent a divide-by-zero FP exception), Clang's optimizer recognizes that the full register is unnecessary for the store operation. Consequently, it optimizes out the step that fills the remaining lanes with non-zero integers.
As a solution, we apply the `volatile` keyword to the returned vector, followed by a symmetric-operand operation like `or`, to inform the compiler that the full vector is needed.

This refactor moves the workaround from the source files to the universal intrinsic headers, which also guarantees that it is applied by all kernels. Furthermore, the workaround is disabled when the `-ftrapping-math` flag is fully supported by the Clang compiler.

This patch also enables the `-ftrapping-math` flag for clang-cl, which is required to enable SIMD optimization of operations such as log/exp/sin/cos and to suppress floating-point exception warnings.
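
The volatile-plus-`or` pattern described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code from the universal intrinsic headers: it assumes the GCC/Clang vector extensions instead of real SIMD intrinsics, and the names `load_till_f32`, `div_partial_f32`, and `GUARD_PARTIAL_LOAD` are hypothetical. The guard condition is also simplified to "any Clang", whereas the PR disables the workaround when `-ftrapping-math` is fully supported.

```c
#include <string.h>

/* Assumption: GCC/Clang vector extensions stand in for real SIMD registers. */
typedef int   v4i __attribute__((vector_size(16)));
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical switch; simplified to "enabled on any Clang" for this sketch. */
#if defined(__clang__)
    #define GUARD_PARTIAL_LOAD 1
#else
    #define GUARD_PARTIAL_LOAD 0
#endif

/* Load the first n (at most 4) floats and fill the remaining lanes with `fill`. */
static v4f load_till_f32(const float *ptr, int n, float fill)
{
    float tmp[4] = {fill, fill, fill, fill};
    if (n > 4) n = 4;
    if (n > 0) memcpy(tmp, ptr, (size_t)n * sizeof(float));

    /* Work on the raw bit pattern so a bitwise OR can be applied. */
    v4i bits;
    memcpy(&bits, tmp, sizeof(bits));
#if GUARD_PARTIAL_LOAD
    /* The volatile copy forces the full vector, fill lanes included, to be
     * materialized. OR-ing a value with itself leaves it unchanged, but it
     * makes the result depend on that volatile copy. */
    volatile v4i guard = bits;
    v4i reloaded = guard;
    bits = reloaded | bits;
#endif
    v4f out;
    memcpy(&out, &bits, sizeof(out));
    return out;
}

/* Divide the first n (at most 4) elements; unused lanes compute 1.0f / 1.0f. */
static void div_partial_f32(const float *num, const float *den, int n, float *dst)
{
    v4f a = load_till_f32(num, n, 1.0f);
    v4f b = load_till_f32(den, n, 1.0f);
    v4f q = a / b;
    if (n > 4) n = 4;
    if (n > 0) memcpy(dst, &q, (size_t)n * sizeof(float));
}
```

Calling `div_partial_f32(num, den, 3, dst)` writes only the first three quotients to `dst`, while the fourth lane divides the fill values and so cannot raise a divide-by-zero exception. Placing the guard inside the load helper rather than at each call site mirrors the intent of the refactor: every kernel that uses the helper gets the workaround.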