
[Clang] Pass correct lane mask for match helpers #138693


Merged
jhuber6 merged 1 commit into llvm:main on May 7, 2025

Conversation

jhuber6 (Contributor) commented May 6, 2025

Summary:
Use the ballot result as the lane mask for the first-lane read once the
threads that are already done have been masked off. This isn't an issue on
AMDGPU, but it could cause problems on post-Volta NVIDIA GPUs, because the
old code passed a mask claiming that threads were active when they weren't.
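To make the AMDGPU vs. post-Volta difference concrete, here is a minimal C sketch of how each target might choose the source lane for `__gpu_read_first_lane_u32`. It models, rather than quotes, the two implementations: AMDGPU is assumed to read from the lowest lane in the hardware exec mask and ignore the mask argument, while NVPTX is assumed to shuffle from the lowest set bit of the mask it is given. `source_lane_amdgpu`, `source_lane_nvptx`, and `reads_inactive_lane` are hypothetical helpers; `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AMDGPU behavior (model only): readfirstlane reads from the lowest
   lane in the hardware exec mask; the lane-mask argument plays no part.
   Returns -1 if no lane is executing. */
static int source_lane_amdgpu(uint64_t lane_mask, uint64_t exec_mask) {
  (void)lane_mask;
  return exec_mask ? __builtin_ctzll(exec_mask) : -1;
}

/* Assumed NVPTX behavior (model only): the source lane is the lowest set bit
   of the mask argument, whether or not that lane is actually executing. */
static int source_lane_nvptx(uint64_t lane_mask, uint64_t exec_mask) {
  (void)exec_mask;
  return lane_mask ? __builtin_ctzll(lane_mask) : -1;
}

/* Returns 1 if the chosen source lane is not one of the executing lanes. */
static int reads_inactive_lane(int src, uint64_t exec_mask) {
  if (src < 0)
    return 1;
  assert(src < 64); /* a 64-bit mask cannot name lane 64 or above */
  return (int)((exec_mask >> src) & 1) == 0;
}
```

With a lane mask of `0xFF` and lanes 0, 2, and 5 already done (exec mask `0xDA`), the NVPTX-style model picks lane 0, which is no longer executing, whereas passing the ballot result `0xDA` picks lane 1. The AMDGPU-style model picks lane 1 either way, which matches the claim that AMDGPU is unaffected.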

llvmbot added the clang, backend:X86, and clang:headers labels on May 6, 2025
llvmbot (Member) commented May 6, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/138693.diff

1 file affected:

  • (modified) clang/lib/Headers/gpuintrin.h (+6-4)
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
index d308cc959be84..7afc82413996b 100644
--- a/clang/lib/Headers/gpuintrin.h
+++ b/clang/lib/Headers/gpuintrin.h
@@ -264,9 +264,10 @@ __gpu_match_any_u32_impl(uint64_t __lane_mask, uint32_t __x) {
   uint64_t __match_mask = 0;
 
   bool __done = 0;
-  while (__gpu_ballot(__lane_mask, !__done)) {
+  for (uint64_t __active_mask = __lane_mask; __active_mask;
+       __active_mask = __gpu_ballot(__lane_mask, !__done)) {
     if (!__done) {
-      uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+      uint32_t __first = __gpu_read_first_lane_u32(__active_mask, __x);
       if (__first == __x) {
         __match_mask = __gpu_lane_mask();
         __done = 1;
@@ -283,9 +284,10 @@ __gpu_match_any_u64_impl(uint64_t __lane_mask, uint64_t __x) {
   uint64_t __match_mask = 0;
 
   bool __done = 0;
-  while (__gpu_ballot(__lane_mask, !__done)) {
+  for (uint64_t __active_mask = __lane_mask; __active_mask;
+       __active_mask = __gpu_ballot(__lane_mask, !__done)) {
     if (!__done) {
-      uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+      uint64_t __first = __gpu_read_first_lane_u64(__active_mask, __x);
       if (__first == __x) {
         __match_mask = __gpu_lane_mask();
         __done = 1;

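The patched loop can be exercised on a CPU with a lockstep simulation of one warp. This is a hypothetical model, not the header's code: `sim_ballot`, `sim_read_first_lane`, and `sim_match_any` are illustrative names, and the read-first-lane rule again assumes the NVPTX-style behavior of reading from the lowest set bit of the mask argument. Passing `use_active_mask = 1` mirrors the new `for` loop, which reads from `__active_mask`; passing 0 mirrors the old `while` loop, which read from `__lane_mask`.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LANES 64
#define MAX_ITERS 128 /* guard so the old variant cannot spin forever */

/* Ballot: bit i is set if lane i is in lane_mask and pred[i] is nonzero. */
static uint64_t sim_ballot(uint64_t lane_mask, const int *pred, int n) {
  uint64_t result = 0;
  for (int i = 0; i < n; ++i)
    if (((lane_mask >> i) & 1) && pred[i])
      result |= UINT64_C(1) << i;
  return result;
}

/* Assumed NVPTX-style rule: read x from the lowest lane named in mask.
   The caller guarantees mask != 0 (__builtin_ctzll(0) is undefined). */
static uint32_t sim_read_first_lane(uint64_t mask, const uint32_t *x) {
  return x[__builtin_ctzll(mask)];
}

/* Simulates the match-any loop. Writes each lane's match mask into match[]
   and returns the iteration count, or -1 if MAX_ITERS is exceeded. */
static int sim_match_any(uint64_t lane_mask, const uint32_t *x, int n,
                         int use_active_mask, uint64_t *match) {
  int done[MAX_LANES] = {0};
  int not_done[MAX_LANES];
  int iters = 0;
  assert(n > 0 && n <= MAX_LANES);
  assert(n == MAX_LANES || (lane_mask >> n) == 0); /* only lanes < n */
  /* Like the patched for loop: start from lane_mask, no ballot needed yet. */
  for (uint64_t active = lane_mask; active;) {
    if (++iters > MAX_ITERS)
      return -1;
    /* Every lane that is not done reads the same "first" value. */
    uint32_t first =
        sim_read_first_lane(use_active_mask ? active : lane_mask, x);
    /* Lanes that take the `__first == __x` branch together; inside that
       branch __gpu_lane_mask() would report exactly this set. */
    uint64_t group = 0;
    for (int i = 0; i < n; ++i)
      if (((active >> i) & 1) && x[i] == first)
        group |= UINT64_C(1) << i;
    for (int i = 0; i < n; ++i) {
      if ((group >> i) & 1) {
        match[i] = group;
        done[i] = 1;
      }
      not_done[i] = !done[i];
    }
    /* Recompute the remaining lanes, like __gpu_ballot(__lane_mask, !__done). */
    active = sim_ballot(lane_mask, not_done, n);
  }
  return iters;
}
```

With `x = {7, 3, 7, 5, 3, 7, 9, 5}` and a full 8-lane mask, the patched variant converges in four iterations (one per distinct value) and gives lanes 0, 2, and 5 the match mask `0x25`. Under the same model, the old variant keeps reading lane 0's value after lane 0 is done, so no remaining lane ever matches and the loop hits the `MAX_ITERS` guard. Real hardware would fail differently; the simulation only shows why the mask must track the lanes that remain. Seeding `active` with `lane_mask` mirrors the patch's `for` initializer, which skips one ballot before the first iteration.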
jhuber6 merged commit 05d6734 into llvm:main on May 7, 2025
15 checks passed
jhuber6 deleted the Mask branch on May 7, 2025 18:23