[AArch64] Utilize XAR for certain vector rotates #137629

Merged: 1 commit, May 9, 2025

Conversation

Rajveer100
Contributor

Resolves #137162

For cases when there isn't any XOR in the transformation, replace with a zero register.
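
As a rough illustration of why a zero second operand works: XAR XORs its two sources and then rotates each lane right by an immediate, so XOR-ing with zero leaves a plain rotate. The scalar model below is only a sketch of a single 64-bit lane; `rotr64`, `xarLane`, and `checkXarWithZero` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit lane right by `imm` bits (assumes 0 < imm < 64).
constexpr uint64_t rotr64(uint64_t x, unsigned imm) {
  return (x >> imm) | (x << (64 - imm));
}

// Scalar model of one 64-bit XAR lane: XOR the inputs, then rotate right.
constexpr uint64_t xarLane(uint64_t a, uint64_t b, unsigned imm) {
  return rotr64(a ^ b, imm);
}

// With a zero second operand, XAR degenerates to a plain rotate.
static_assert(xarLane(0x0123456789ABCDEFULL, 0, 8) ==
              rotr64(0x0123456789ABCDEFULL, 8));

// Hypothetical runtime check over every legal rotate amount.
inline void checkXarWithZero(uint64_t x) {
  for (unsigned imm = 1; imm < 64; ++imm)
    assert(xarLane(x, 0, imm) == rotr64(x, imm));
}
```

This only models the arithmetic; how the zero operand is materialized in the backend is a separate question.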

@llvmbot
Member

llvmbot commented Apr 28, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Rajveer Singh Bharadwaj (Rajveer100)

Changes

Resolves #137162

For cases when there isn't any XOR in the transformation, replace with a zero register.


Full diff: https://github.com/llvm/llvm-project/pull/137629.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+24-9)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 40944e3d43d6b..b0559692331d8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -13,6 +13,7 @@
 #include "AArch64MachineFunctionInfo.h"
 #include "AArch64TargetMachine.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
+#include "MCTargetDesc/AArch64MCTargetDesc.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/SelectionDAGISel.h"
@@ -4558,9 +4559,15 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
         !TLI->isAllActivePredicate(*CurDAG, N1.getOperand(0)))
       return false;
 
-    SDValue XOR = N0.getOperand(1);
-    if (XOR.getOpcode() != ISD::XOR || XOR != N1.getOperand(1))
-      return false;
+    SDValue R1, R2;
+    if (N0.getOperand(1).getOpcode() != ISD::XOR) {
+      if (N0.getOperand(1) != N1.getOperand(1))
+        return false;
+      SDLoc DL(N1->getOperand(0));
+      SDValue Zero = CurDAG->getRegister(AArch64::XZR, N1->getOperand(0).getValueType());
+      R1 = N1->getOperand(0);
+      R2 = Zero;
+    }
 
     APInt ShlAmt, ShrAmt;
     if (!ISD::isConstantSplatVector(N0.getOperand(2).getNode(), ShlAmt) ||
@@ -4574,7 +4581,7 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
     SDValue Imm =
         CurDAG->getTargetConstant(ShrAmt.getZExtValue(), DL, MVT::i32);
 
-    SDValue Ops[] = {XOR.getOperand(0), XOR.getOperand(1), Imm};
+    SDValue Ops[] = {R1, R2, Imm};
     if (auto Opc = SelectOpcodeFromVT<SelectTypeKind::Int>(
             VT, {AArch64::XAR_ZZZI_B, AArch64::XAR_ZZZI_H, AArch64::XAR_ZZZI_S,
                  AArch64::XAR_ZZZI_D})) {
@@ -4591,13 +4598,21 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
       N1->getOpcode() != AArch64ISD::VLSHR)
     return false;
 
-  if (N0->getOperand(0) != N1->getOperand(0) ||
-      N1->getOperand(0)->getOpcode() != ISD::XOR)
+
+  if (N0->getOperand(0) != N1->getOperand(0))
     return false;
 
-  SDValue XOR = N0.getOperand(0);
-  SDValue R1 = XOR.getOperand(0);
-  SDValue R2 = XOR.getOperand(1);
+  SDValue R1, R2;
+  if (N1->getOperand(0)->getOpcode() != ISD::XOR) {
+    SDLoc DL(N1->getOperand(0));
+    SDValue Zero = CurDAG->getRegister(AArch64::XZR, N1->getOperand(0).getValueType());
+    R1 = N1->getOperand(0);
+    R2 = Zero;
+  } else {
+    SDValue XOR = N0.getOperand(0);
+    R1 = XOR.getOperand(0);
+    R2 = XOR.getOperand(1);
+  }
 
   unsigned HsAmt = N0.getConstantOperandVal(1);
   unsigned ShAmt = N1.getConstantOperandVal(1);

@Rajveer100
Contributor Author

Rajveer100 commented Apr 28, 2025

I currently face this assertion locally (Apple Silicon) when reproducing the original snippet after the change:

Assertion failed: (*(AsmStrsvreg+RegAsmOffsetvreg[RegNo-1]) && "Invalid alt name index for register!"), function getRegisterName, file AArch64GenAsmWriter.inc, line 24199.


github-actions bot commented Apr 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch 4 times, most recently from ae842ad to c8e32e0 Compare April 30, 2025 13:14
@davemgreen
Collaborator

This could do with some extra tests for fixed and scalable vectors, and a check that the existing tests still work. You might need to be careful about how the zero gets generated: I think it might need to generate a MOVIv2d_ns 0, but there are several ways to specify a zero vector and many of them are equivalent.

@Rajveer100
Contributor Author

Rajveer100 commented May 1, 2025

For scalable vectors, which instruction do we want to use among LDxxx, MOVAZxxx, MOVPRFXxxx, DUPv2i64lane and the many others, given that MOVIv2d_ns (and MOVIxxx) is for fixed-size vectors?

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from c8e32e0 to 57329e9 Compare May 1, 2025 11:59
@davemgreen
Collaborator

MOVIv2d_ns will actually set all the upper bits too, so it can use MOVIv2d_ns and a SUBREG_TO_REG. Something like SUBREG_TO_REG 0, Vn, zsub will tell the compiler that the fp128 is extended to a zreg and that the top bits are zeroes.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from 57329e9 to c14ca26 Compare May 3, 2025 12:58
@Rajveer100
Contributor Author

> MOVIv2d_ns will actually set all the upper bits too, so it can use MOVIv2d_ns and a SUBREG_TO_REG. Something like SUBREG_TO_REG 0, Vn, zsub will tell the compiler that the fp128 is extended to a zreg and that the top bits are zeroes.

I am probably not doing it the right way, but I have pushed the changes for review.

Collaborator

@davemgreen davemgreen left a comment

I think this is looking OK. Can you add some tests for fixed-width and scalable types? Some of the fixed-width sizes do not have an instruction they can use, unless they start to use SVE instructions.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from c14ca26 to 9c2648d Compare May 6, 2025 08:51
@Rajveer100
Contributor Author

Rajveer100 commented May 6, 2025

Updated the failing tests, will add new tests.

Collaborator

@davemgreen davemgreen left a comment

> Updated the failing tests, will add new tests.

update_llc_test_checks.py can update the check lines, so long as the output is correct.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch 5 times, most recently from 27b1cd0 to 97bd765 Compare May 8, 2025 10:28
Collaborator

@davemgreen davemgreen left a comment

Thanks - this looks good. Can you add tests like xar_instead_of_or but with types <4 x i32>, <8 x i16> and <16 x i8> to llvm/test/CodeGen/AArch64/xar.ll, and tests like xar_nxv2i64_l_neg2 but with types <vscale x 4 x i32>, <vscale x 8 x i16> and <vscale x 16 x i8> to llvm/test/CodeGen/AArch64/sve2-xar.ll. Not all of them are expected to transform, but we should make sure we test all the combos.
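
The shape these tests exercise can be modeled per lane: `or(shl(x, a), lshr(x, b))` is a rotate only when `a + b` equals the lane width. The sketch below is a scalar model under that assumption, with hypothetical helper names `rotateRightAmount` and `shiftPair`; it says nothing about which vector types actually select XAR.

```cpp
#include <cassert>
#include <cstdint>

// Returns the rotate-right amount if or(shl(x, shlAmt), lshr(x, shrAmt))
// forms a rotate of a lane with `laneBits` bits, or -1 if it does not.
constexpr int rotateRightAmount(unsigned shlAmt, unsigned shrAmt,
                                unsigned laneBits) {
  if (shlAmt == 0 || shrAmt == 0 || shlAmt + shrAmt != laneBits)
    return -1;
  return static_cast<int>(shrAmt);
}

// Computes or(shl(x, shlAmt), lshr(x, shrAmt)) on one lane of type T.
// The casts truncate back to the lane width after integer promotion.
template <typename T>
constexpr T shiftPair(T x, unsigned shlAmt, unsigned shrAmt) {
  return static_cast<T>(static_cast<T>(x << shlAmt) |
                        static_cast<T>(x >> shrAmt));
}

// i8 lane: shl 1 + lshr 7 is a rotate (0x81 rotated left by 1 is 0x03).
static_assert(rotateRightAmount(1, 7, 8) == 7);
static_assert(shiftPair<uint8_t>(0x81, 1, 7) == 0x03);

// i32 lane: shl 4 + lshr 28 is a rotate.
static_assert(rotateRightAmount(4, 28, 32) == 28);
static_assert(shiftPair<uint32_t>(0x80000001u, 4, 28) == 0x00000018u);

// Shift amounts that do not sum to the lane width are not a rotate.
static_assert(rotateRightAmount(3, 4, 8) == -1);
```

Covering each lane width separately matters because the same pair of shift amounts is a rotate for one element type and not for another.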

Resolves llvm#137162

For cases when there isn't any `XOR` in the transformation,
replace with a zero register.
@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from 97bd765 to a37b2b7 Compare May 8, 2025 11:09
@Rajveer100
Contributor Author

I have added the additional tests.

@Rajveer100
Contributor Author

The test failure (lldb) seems unrelated to the changes.

Collaborator

@davemgreen davemgreen left a comment

Thanks. LGTM

@Rajveer100
Contributor Author

Could you land this for me? I don't have commit access.

@davemgreen
Collaborator

Are you happy for this to be submitted, with the icloud.com email address?

@Rajveer100
Contributor Author

All my previous PRs have been submitted with this, so cool :)

@davemgreen davemgreen merged commit 36bb17a into llvm:main May 9, 2025
9 of 11 checks passed
Successfully merging this pull request may close these issues.

[AArch64] can use XAR for vector rotates where possible
3 participants