MAINT: Refactor partial load workaround for Clang #24461

Merged: 1 commit, Sep 5, 2023

seiko2plus (Member) commented Aug 19, 2023

When Clang does not fully support the -ftrapping-math flag, its optimizer becomes overly aggressive
starting at the -O1 optimization level. Some operations partially load a vector register and must fill
the remaining lanes with specific values (e.g., divide operations need non-zero integers in the unused
lanes to prevent a divide-by-zero FP exception). Clang's optimizer recognizes that the store operation
does not need the full register and therefore optimizes out the step that fills the remaining
elements with non-zero integers.
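The pattern described above can be sketched in portable C. This is only an illustrative model, not the project's code: the `vec_f32` struct stands in for a SIMD register, and the function names, the 4-lane width, and the fill value `1.0f` are all assumptions made for the example.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical vector width */

/* Stand-in for a SIMD register holding LANES floats (assumption). */
typedef struct { float lane[LANES]; } vec_f32;

/* Load the first `nlane` elements; set every remaining lane to `fill`. */
static vec_f32 load_till_f32(const float *ptr, size_t nlane, float fill)
{
    vec_f32 v;
    for (size_t i = 0; i < LANES; ++i) {
        v.lane[i] = (i < nlane) ? ptr[i] : fill;
    }
    return v;
}

/* Divide lane by lane; every lane is computed, including the filled ones. */
static vec_f32 div_f32(vec_f32 a, vec_f32 b)
{
    vec_f32 r;
    for (size_t i = 0; i < LANES; ++i) {
        r.lane[i] = a.lane[i] / b.lane[i];
    }
    return r;
}

/* Store only the first `nlane` lanes; the filled lanes are never written out. */
static void store_till_f32(float *ptr, size_t nlane, vec_f32 v)
{
    for (size_t i = 0; i < LANES && i < nlane; ++i) {
        ptr[i] = v.lane[i];
    }
}

/* Divide the last `nlane` (< LANES) elements of two arrays. */
static void divide_tail(float *dst, const float *a, const float *b, size_t nlane)
{
    vec_f32 va = load_till_f32(a, nlane, 1.0f);
    /* The non-zero fill keeps the unused lanes from computing x / 0. */
    vec_f32 vb = load_till_f32(b, nlane, 1.0f);
    store_till_f32(dst, nlane, div_f32(va, vb));
}
```

Because only the first `nlane` lanes reach memory, an optimizer that ignores floating-point exception side effects may treat the fill as dead work, which is the situation the description reports for Clang.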

As a workaround, we apply the volatile keyword to the returned vector and then perform an operation
whose two operands are the same vector, such as or, to tell the compiler that the full vector is needed.
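A minimal sketch of this idea, assuming the GCC/Clang `vector_size` extension in place of real platform intrinsics. The type `vec_s32` and the helpers `vec_or` and `load_till_guarded` are hypothetical names chosen for illustration; `vec_or` plays the role that an intrinsic such as `_mm_or_si128` would play on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-lane integer vector built on the GCC/Clang vector extension. */
typedef int32_t vec_s32 __attribute__((vector_size(16)));

/* Bitwise OR of two vectors; stands in for a platform intrinsic. */
static vec_s32 vec_or(vec_s32 a, vec_s32 b) { return a | b; }

/* Load `nlane` elements (at most 4) and set the remaining lanes to `fill`. */
static vec_s32 load_till_guarded(const int32_t *ptr, size_t nlane, int32_t fill)
{
    int32_t tmp[4] = {fill, fill, fill, fill};
    memcpy(tmp, ptr, (nlane < 4 ? nlane : 4) * sizeof(int32_t));

    vec_s32 v;
    memcpy(&v, tmp, sizeof(v));

    /* Workaround: copy the vector into a volatile object, then OR that copy
     * with the original. Since x | x == x the value is unchanged, but the
     * volatile read forces the compiler to materialize every lane. */
    volatile vec_s32 guard = v;
    v = vec_or(guard, v);
    return v;
}
```

The OR is a no-op on the data, so the only cost is the extra volatile store and load; that cost is the reason to restrict the guard to compilers that actually need it.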

This refactor moves the workaround from the source files into the universal intrinsic headers,
which also guarantees that it is applied to all kernels. Furthermore, the workaround is disabled when
the Clang compiler fully supports the -ftrapping-math flag.
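One way such a header-level switch could look is sketched below. The macro names `HAVE_CLANG_FPSTRICT` and `GUARD_PARTIAL_LOAD` are hypothetical placeholders, not names taken from this pull request, and the sketch assumes the build system defines the first one when Clang fully supports `-ftrapping-math`. It again relies on the GCC/Clang `vector_size` extension rather than real intrinsics.

```c
#include <stdint.h>

/* Hypothetical feature macros: enable the guard only for Clang builds
 * where full -ftrapping-math support was not detected. */
#if defined(__clang__) && !defined(HAVE_CLANG_FPSTRICT)
    #define GUARD_PARTIAL_LOAD 1
#else
    #define GUARD_PARTIAL_LOAD 0
#endif

/* Hypothetical 4-lane integer vector (GCC/Clang vector extension). */
typedef int32_t vec_s32 __attribute__((vector_size(16)));

/* Bitwise OR of two vectors; stands in for a platform intrinsic. */
static vec_s32 vec_or(vec_s32 a, vec_s32 b) { return a | b; }

/* Called at the end of every partial-load helper in the header, so each
 * kernel that uses those helpers gets the guard without repeating it. */
static vec_s32 guard_partial_load(vec_s32 v)
{
#if GUARD_PARTIAL_LOAD
    volatile vec_s32 guard = v;  /* force all lanes to be materialized */
    v = vec_or(guard, v);        /* x | x == x, so the value is unchanged */
#endif
    return v;
}
```

Keeping the guard behind a single macro means builds that do not need it compile the helper down to a plain return.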

This patch also enables the -ftrapping-math flag for clang-cl, which is required to enable SIMD optimization of operations such as log/exp/sin/cos, and suppresses floating-point exception warnings.

@seiko2plus seiko2plus force-pushed the clang_partial_bug_refactor branch 2 times, most recently from c1c965a to b3334d6 Compare August 21, 2023 13:14
@seiko2plus seiko2plus force-pushed the clang_partial_bug_refactor branch from b3334d6 to bf5a750 Compare September 4, 2023 02:48
@seiko2plus seiko2plus added the "component: SIMD" (Issues in SIMD (fast instruction sets) code or machinery) and "09 - Backport-Candidate" (PRs tagged should be backported) labels Sep 4, 2023
  When Clang does not fully support the `-ftrapping-math` flag, its optimizer becomes overly aggressive
  starting at the -O1 optimization level. Some operations partially load a vector register and must fill
  the remaining lanes with specific values (e.g., divide operations need non-zero integers in the unused
  lanes to prevent a divide-by-zero FP exception). Clang's optimizer recognizes that the store operation
  does not need the full register and therefore optimizes out the step that fills the remaining
  elements with non-zero integers.

  As a workaround, we apply the `volatile` keyword to the returned register and then perform an operation
  whose two operands are the same vector, such as `or`, to tell the compiler that the full vector is needed.

  This refactor moves the workaround from the source files into the universal intrinsic headers,
  which also guarantees that it is applied to all kernels. Furthermore, the workaround is disabled when
  the Clang compiler fully supports the `-ftrapping-math` flag.

  This patch also enables the `-ftrapping-math` flag for clang-cl and
  suppresses floating-point exception warnings.
@seiko2plus seiko2plus force-pushed the clang_partial_bug_refactor branch from bf5a750 to 83cec53 Compare September 4, 2023 04:01
@seiko2plus seiko2plus marked this pull request as ready for review September 4, 2023 04:25
@charris charris merged commit b9c4023 into numpy:main Sep 5, 2023
charris (Member) commented Sep 5, 2023

Thanks Sayed.

@charris charris changed the title from "SIMD: Refactor partial load workaround for Clang" to "MAINT: Refactor partial load workaround for Clang" Sep 5, 2023
@charris charris removed the "09 - Backport-Candidate" (PRs tagged should be backported) label Sep 5, 2023
Labels: "03 - Maintenance", "component: SIMD" (Issues in SIMD (fast instruction sets) code or machinery)