[InstCombine] Do not combine shuffle+bitcast if the bitcast is eliminable. #135769

rj-jesus · 2025-04-15T09:58:47Z

If we are attempting to combine shuffle+bitcast but the bitcast is
pairable with a subsequent bitcast, we should not fold the shuffle as
doing so can block further simplifications.

The motivation for this is a long-standing regression affecting SIMDe on
AArch64 introduced indirectly by the AlwaysInliner (1a2e77c). Some
reproducers:

* https://godbolt.org/z/53qx18s6M
* https://godbolt.org/z/o5e43h5M7

llvmbot · 2025-04-15T09:59:24Z

@llvm/pr-subscribers-llvm-transforms

Author: Ricardo Jesus (rj-jesus)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/135769.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+12-4)
(modified) llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll (+15)
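
The change to InstCombineVectorOps.cpp amounts to an extra filter on which bitcast users of the shuffle the fold visits. Below is a minimal standalone sketch of that filter. It uses hypothetical stand-in types and helpers (`Inst`, `Kind`, `isEliminablePair`, `collectFoldCandidates`) rather than LLVM's real classes, counts users instead of uses, and assumes for simplicity that every bitcast-of-bitcast pair is eliminable, whereas the real `isEliminableCastPair` also inspects the types involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for LLVM's instruction classes; not the real API.
enum class Kind { Shuffle, BitCast, Other };

struct Inst {
  Kind kind;
  std::vector<Inst *> users;
};

// Assumption: any bitcast feeding another bitcast is an eliminable pair.
// The real isEliminableCastPair also checks the source and destination types.
bool isEliminablePair(const Inst &first, const Inst &second) {
  return first.kind == Kind::BitCast && second.kind == Kind::BitCast;
}

// Collect the bitcast users of a shuffle that the fold should still visit.
std::vector<Inst *> collectFoldCandidates(const Inst &shuffle) {
  std::vector<Inst *> candidates;
  for (Inst *user : shuffle.users) {
    if (user->kind != Kind::BitCast)
      continue;
    // Bitcasts without users were handled previously.
    if (user->users.empty())
      continue;
    // Prefer to let the bitcast pair be combined before attempting the fold.
    if (user->users.size() == 1 &&
        isEliminablePair(*user, *user->users.front()))
      continue;
    candidates.push_back(user);
  }
  return candidates;
}
```

In this model, a shuffle whose only user is the head of a bitcast chain yields no candidates, while a bitcast that feeds a non-bitcast user is still collected.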

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index f897cc7855d2d..f6423cb40492e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -3029,10 +3029,18 @@ Instruction *InstCombinerImpl::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
     SmallVector<BitCastInst *, 8> BCs;
     DenseMap<Type *, Value *> NewBCs;
     for (User *U : SVI.users())
-      if (BitCastInst *BC = dyn_cast<BitCastInst>(U))
-        if (!BC->use_empty())
-          // Only visit bitcasts that weren't previously handled.
-          BCs.push_back(BC);
+      if (BitCastInst *BC = dyn_cast<BitCastInst>(U)) {
+        // Only visit bitcasts that weren't previously handled.
+        if (BC->use_empty())
+          continue;
+        // Prefer to combine bitcasts of bitcasts before attempting this fold.
+        if (BC->hasOneUse()) {
+          auto *BC2 = dyn_cast<BitCastInst>(BC->user_back());
+          if (BC2 && isEliminableCastPair(BC, BC2))
+            continue;
+        }
+        BCs.push_back(BC);
+      }
     for (BitCastInst *BC : BCs) {
       unsigned BegIdx = Mask.front();
       Type *TgtTy = BC->getDestTy();
diff --git a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
index f20077243273c..c6152368f06fd 100644
--- a/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
+++ b/llvm/test/Transforms/InstCombine/shufflevec-bitcast.ll
@@ -235,3 +235,18 @@ define <3 x i4> @shuf_bitcast_wrong_size(<2 x i8> %v, i8 %x) {
   %r = shufflevector <4 x i4> %b, <4 x i4> undef, <3 x i32> <i32 0, i32 1, i32 2>
   ret <3 x i4> %r
 }
+
+; Negative test - chain of bitcasts.
+
+define <16 x i8> @shuf_bitcast_chain(<8 x i32> %v) {
+; CHECK-LABEL: @shuf_bitcast_chain(
+; CHECK-NEXT:    [[S:%.*]] = shufflevector <8 x i32> [[V:%.*]], <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[C:%.*]] = bitcast <4 x i32> [[S]] to <16 x i8>
+; CHECK-NEXT:    ret <16 x i8> [[C]]
+;
+  %s = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %a = bitcast <4 x i32> %s to <2 x i64>
+  %b = bitcast <2 x i64> %a to i128
+  %c = bitcast i128 %b to <16 x i8>
+  ret <16 x i8> %c
+}
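
The CHECK lines in the new test expect the three chained bitcasts to collapse into one. A rough way to see why this is legal is to model each bitcast as a byte-for-byte copy between equally sized values. The sketch below does that with `std::memcpy`. The type aliases and the helpers `bitcast`, `shuffleLow4`, `chained` and `collapsed` are illustrative names, and the `i128` step is represented as 16 raw bytes because standard C++ has no 128-bit integer type.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V8i32 = std::array<std::uint32_t, 8>;
using V4i32 = std::array<std::uint32_t, 4>;
using V2i64 = std::array<std::uint64_t, 2>;
using V16i8 = std::array<std::uint8_t, 16>;

// Models a bitcast as a byte-for-byte reinterpretation (assumed model).
template <typename To, typename From>
To bitcast(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "bitcast requires equal sizes");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}

// Models shufflevector with mask <0, 1, 2, 3>: keep the first four lanes.
V4i32 shuffleLow4(const V8i32 &v) { return {v[0], v[1], v[2], v[3]}; }

// Test input: shuffle followed by three chained bitcasts.
V16i8 chained(const V8i32 &v) {
  V2i64 a = bitcast<V2i64>(shuffleLow4(v));
  V16i8 b = bitcast<V16i8>(a); // stands in for the i128 value
  return bitcast<V16i8>(b);
}

// Expected output: shuffle followed by a single bitcast.
V16i8 collapsed(const V8i32 &v) { return bitcast<V16i8>(shuffleLow4(v)); }
```

Under this model, `chained(v)` and `collapsed(v)` return the same 16 bytes for any input, which matches the single `bitcast` the test expects after InstCombine.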

nikic

The alternative here would be to fold the

  %s.bc = bitcast <8 x i32> %v to <2 x i128>
  %s.extract = extractelement <2 x i128> %s.bc, i64 0
  %c = bitcast i128 %s.extract to <16 x i8>

pattern to shufflevector + bitcast instead. That would seem like the more robust solution. Do you think that would be viable, or does that run up against some other problem (I guess introducing shufflevectors without cost model may be problematic?)
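
To make the two forms under discussion concrete, here is a small sketch comparing them at the byte level. It assumes bitcasts behave like a store followed by a load of the same bytes and that vector element 0 sits at the lowest address. The names `shuffleThenBitcast`, `bitcastExtractBitcast` and `I128` are hypothetical, and `i128` is again modeled as 16 raw bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V8i32 = std::array<std::uint32_t, 8>;
using V16i8 = std::array<std::uint8_t, 16>;
using I128 = std::array<std::uint8_t, 16>; // stand-in for the raw bytes of an i128

// Form A: shufflevector (lanes 0..3) followed by a bitcast to <16 x i8>.
V16i8 shuffleThenBitcast(const V8i32 &v) {
  std::array<std::uint32_t, 4> low{v[0], v[1], v[2], v[3]};
  V16i8 out;
  std::memcpy(out.data(), low.data(), sizeof(out));
  return out;
}

// Form B: bitcast to <2 x i128>, extractelement 0, bitcast to <16 x i8>.
V16i8 bitcastExtractBitcast(const V8i32 &v) {
  std::array<I128, 2> wide;
  static_assert(sizeof(wide) == sizeof(V8i32), "both views cover 32 bytes");
  std::memcpy(wide.data(), v.data(), sizeof(wide));
  I128 element0 = wide[0]; // extractelement with index 0
  V16i8 out;
  std::memcpy(out.data(), element0.data(), sizeof(out));
  return out;
}
```

Both functions return the first 16 bytes of the input, so either form can in principle be rewritten into the other. The open question in this thread is which direction is preferable, not whether the rewrite is correct.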

rj-jesus · 2025-04-17T12:42:09Z

> The alternative here would be to fold the
>
>   %s.bc = bitcast <8 x i32> %v to <2 x i128>
>   %s.extract = extractelement <2 x i128> %s.bc, i64 0
>   %c = bitcast i128 %s.extract to <16 x i8>
>
> pattern to shufflevector + bitcast instead. That would seem like the more robust solution. Do you think that would be viable, or does that run up against some other problem (I guess introducing shufflevectors without cost model may be problematic?)

Going that route seemed more complex, and I wasn't sure if it could have unintended consequences elsewhere (as you say, because of the shufflevectors). Would you like me to give it a try, though?

rj-jesus · 2025-04-22T08:56:55Z

Hi @nikic, shall I open a PR with the alternative fold (bitcast+extractelement+bitcast to shufflevector+bitcast) so that we can compare both approaches, or are you happy for us to block the initial fold (shufflevector+bitcast to bitcast+extractelement) from happening in the first place?

nikic · 2025-04-22T09:00:52Z

@rj-jesus I think it would be good to at least try the alternative and see whether it runs into any problems.

nikic

LGTM

llvm-ci · 2025-04-30T07:28:04Z

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b2 while building llvm at step 4 "cmake-configure".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/17726

Here is the relevant piece of the build log for the reference

Step 4 (cmake-configure) failure: cmake (failure)
...
-- Targeting XCore
-- Registering ExampleIRTransforms as a pass plugin (static build: OFF)
-- Registering Bye as a pass plugin (static build: OFF)
-- Failed to find LLVM FileCheck
-- Google Benchmark version: v0.0.0, normalized to 0.0.0
-- Looking for shm_open in rt
-- Looking for shm_open in rt - found
-- Performing Test HAVE_CXX_FLAG_WALL
-- Performing Test HAVE_CXX_FLAG_WALL - Success
-- Performing Test HAVE_CXX_FLAG_WEXTRA
-- Performing Test HAVE_CXX_FLAG_WEXTRA - Success
-- Performing Test HAVE_CXX_FLAG_WSHADOW
-- Performing Test HAVE_CXX_FLAG_WSHADOW - Success
-- Performing Test HAVE_CXX_FLAG_WFLOAT_EQUAL
-- Performing Test HAVE_CXX_FLAG_WFLOAT_EQUAL - Success
-- Performing Test HAVE_CXX_FLAG_WOLD_STYLE_CAST
-- Performing Test HAVE_CXX_FLAG_WOLD_STYLE_CAST - Success
-- Performing Test HAVE_CXX_FLAG_WSUGGEST_OVERRIDE
-- Performing Test HAVE_CXX_FLAG_WSUGGEST_OVERRIDE - Success
-- Performing Test HAVE_CXX_FLAG_PEDANTIC
-- Performing Test HAVE_CXX_FLAG_PEDANTIC - Success
-- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS
-- Performing Test HAVE_CXX_FLAG_PEDANTIC_ERRORS - Success
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32
-- Performing Test HAVE_CXX_FLAG_WSHORTEN_64_TO_32 - Failed
-- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING
-- Performing Test HAVE_CXX_FLAG_FSTRICT_ALIASING - Success
-- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS
-- Performing Test HAVE_CXX_FLAG_WNO_DEPRECATED_DECLARATIONS - Success
-- Performing Test HAVE_CXX_FLAG_FNO_EXCEPTIONS
-- Performing Test HAVE_CXX_FLAG_FNO_EXCEPTIONS - Success
-- Performing Test HAVE_CXX_FLAG_WSTRICT_ALIASING
-- Performing Test HAVE_CXX_FLAG_WSTRICT_ALIASING - Success
-- Performing Test HAVE_CXX_FLAG_WD654
-- Performing Test HAVE_CXX_FLAG_WD654 - Failed
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY
-- Performing Test HAVE_CXX_FLAG_WTHREAD_SAFETY - Failed
-- Performing Test HAVE_CXX_FLAG_COVERAGE
-- Performing Test HAVE_CXX_FLAG_COVERAGE - Success
-- Compiling and running to test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Compiling and running to test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX -- success
-- Compiling and running to test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Compiling and running to test HAVE_PTHREAD_AFFINITY
-- Performing Test HAVE_PTHREAD_AFFINITY -- failed to compile
-- Configuring incomplete, errors occurred!
See also "/b/ml-opt-dev-x86-64-b1/build/CMakeFiles/CMakeOutput.log".
See also "/b/ml-opt-dev-x86-64-b1/build/CMakeFiles/CMakeError.log".

rj-jesus · 2025-05-01T13:40:58Z

Hi @nikic - Would it be reasonable to backport this to LLVM 20?
This fixes a ~30% regression in the workloads where the regression was detected.

nikic · 2025-05-01T14:16:20Z

@rj-jesus Yes, this should be fine to backport.

rj-jesus · 2025-05-01T14:23:56Z

Perfect, thanks very much!

rj-jesus · 2025-05-01T14:25:45Z

/cherry-pick c91c3f9

llvmbot · 2025-05-01T14:31:46Z

/pull-request #138142

…able. (llvm#135769) If we are attempting to combine shuffle+bitcast but the bitcast is pairable with a subsequent bitcast, we should not fold the shuffle as doing so can block further simplifications. The motivation for this is a long-standing regression affecting SIMDe on AArch64, introduced indirectly by the AlwaysInliner (1a2e77c). Some reproducers: * https://godbolt.org/z/53qx18s6M * https://godbolt.org/z/o5e43h5M7 (cherry picked from commit c91c3f9)

rj-jesus added 2 commits April 14, 2025 07:58

Precommit test.

23c096c

rj-jesus requested a review from davemgreen April 15, 2025 09:58

rj-jesus requested a review from nikic as a code owner April 15, 2025 09:58

llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Apr 15, 2025

This was referenced Apr 15, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

pre-commit: PR135769 dtcxzyw/llvm-opt-benchmark#2273

Closed

nikic reviewed Apr 17, 2025

rj-jesus mentioned this pull request Apr 23, 2025

[InstCombine] Fold bitcast (extelt (bitcast X), Idx) into bitcast+shufflevector. #136998

Closed

rj-jesus added 2 commits April 29, 2025 08:03

Add new test (transformation off).

cbdcf1f

Update tests (transformation on).

06ecbae

nikic approved these changes Apr 29, 2025

rj-jesus merged commit c91c3f9 into llvm:main Apr 30, 2025
11 checks passed

rj-jesus deleted the rjj/instcombine-skip-shuffle-bitcast-bitcast branch April 30, 2025 07:22

rj-jesus added this to the LLVM 20.X Release milestone May 1, 2025

github-project-automation bot added this to LLVM Release Status May 1, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status May 1, 2025

llvmbot moved this from Needs Triage to Done in LLVM Release Status May 1, 2025
