[AArch64] Utilize XAR for certain vector rotates #137629

Open. Rajveer100 wants to merge 1 commit into main from xar-vector-rotate.

Rajveer100
Contributor

Resolves #137162

For cases when there isn't any XOR in the transformation, replace with a zero register.
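
The description relies on the identity `x ^ 0 == x`: XAR rotates the XOR of its two sources, so a plain rotate can be expressed as XAR with a zero second operand. A minimal scalar sketch of that identity for a single 64-bit lane follows; `xar64` and `rotr64` are hypothetical helper names for illustration, not LLVM APIs, and the model assumes rotate amounts in the range 0..63.

```cpp
#include <cstdint>

// Scalar model of one 64-bit lane of XAR: rotate right (a ^ b) by imm.
// Illustrative helper only; not part of LLVM.
inline uint64_t xar64(uint64_t a, uint64_t b, unsigned imm) {
  uint64_t x = a ^ b;
  imm &= 63;              // keep the amount inside the lane width
  if (imm == 0)
    return x;             // avoid an undefined shift by 64
  return (x >> imm) | (x << (64 - imm));
}

// A rotate written the way the selector matches it: an OR of a left shift
// and a logical right shift whose amounts add up to the lane width.
inline uint64_t rotr64(uint64_t x, unsigned imm) {
  imm &= 63;
  if (imm == 0)
    return x;
  return (x << (64 - imm)) | (x >> imm);
}
```

With a zero second operand, `xar64(x, 0, n)` and `rotr64(x, n)` agree for every `x`, which is the property the patch depends on.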

@llvmbot
Member

llvmbot commented Apr 28, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Rajveer Singh Bharadwaj (Rajveer100)

Full diff: https://github.com/llvm/llvm-project/pull/137629.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (+24-9)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 40944e3d43d6b..b0559692331d8 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -13,6 +13,7 @@
 #include "AArch64MachineFunctionInfo.h"
 #include "AArch64TargetMachine.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
+#include "MCTargetDesc/AArch64MCTargetDesc.h"
 #include "llvm/ADT/APSInt.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/SelectionDAGISel.h"
@@ -4558,9 +4559,15 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
         !TLI->isAllActivePredicate(*CurDAG, N1.getOperand(0)))
       return false;
 
-    SDValue XOR = N0.getOperand(1);
-    if (XOR.getOpcode() != ISD::XOR || XOR != N1.getOperand(1))
-      return false;
+    SDValue R1, R2;
+    if (N0.getOperand(1).getOpcode() != ISD::XOR) {
+      if (N0.getOperand(1) != N1.getOperand(1))
+        return false;
+      SDLoc DL(N1->getOperand(0));
+      SDValue Zero = CurDAG->getRegister(AArch64::XZR, N1->getOperand(0).getValueType());
+      R1 = N1->getOperand(0);
+      R2 = Zero;
+    }
 
     APInt ShlAmt, ShrAmt;
     if (!ISD::isConstantSplatVector(N0.getOperand(2).getNode(), ShlAmt) ||
@@ -4574,7 +4581,7 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
     SDValue Imm =
         CurDAG->getTargetConstant(ShrAmt.getZExtValue(), DL, MVT::i32);
 
-    SDValue Ops[] = {XOR.getOperand(0), XOR.getOperand(1), Imm};
+    SDValue Ops[] = {R1, R2, Imm};
     if (auto Opc = SelectOpcodeFromVT<SelectTypeKind::Int>(
             VT, {AArch64::XAR_ZZZI_B, AArch64::XAR_ZZZI_H, AArch64::XAR_ZZZI_S,
                  AArch64::XAR_ZZZI_D})) {
@@ -4591,13 +4598,21 @@ bool AArch64DAGToDAGISel::trySelectXAR(SDNode *N) {
       N1->getOpcode() != AArch64ISD::VLSHR)
     return false;
 
-  if (N0->getOperand(0) != N1->getOperand(0) ||
-      N1->getOperand(0)->getOpcode() != ISD::XOR)
+
+  if (N0->getOperand(0) != N1->getOperand(0))
     return false;
 
-  SDValue XOR = N0.getOperand(0);
-  SDValue R1 = XOR.getOperand(0);
-  SDValue R2 = XOR.getOperand(1);
+  SDValue R1, R2;
+  if (N1->getOperand(0)->getOpcode() != ISD::XOR) {
+    SDLoc DL(N1->getOperand(0));
+    SDValue Zero = CurDAG->getRegister(AArch64::XZR, N1->getOperand(0).getValueType());
+    R1 = N1->getOperand(0);
+    R2 = Zero;
+  } else {
+    SDValue XOR = N0.getOperand(0);
+    R1 = XOR.getOperand(0);
+    R2 = XOR.getOperand(1);
+  }
 
   unsigned HsAmt = N0.getConstantOperandVal(1);
   unsigned ShAmt = N1.getConstantOperandVal(1);

@Rajveer100
Contributor Author

Rajveer100 commented Apr 28, 2025

I currently hit this assertion locally (Apple Silicon) when reproducing the original snippet after the change:

Assertion failed: (*(AsmStrsvreg+RegAsmOffsetvreg[RegNo-1]) && "Invalid alt name index for register!"), function getRegisterName, file AArch64GenAsmWriter.inc, line 24199.

github-actions bot commented Apr 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch 4 times, most recently from ae842ad to c8e32e0 Compare April 30, 2025 13:14
@davemgreen
Collaborator

This could do with some extra tests for fixed and scalable vectors, and to make sure the existing tests work. You might need to be careful about how the zero gets generated; I think it might need to generate a MOVIv2d_ns 0, but there are several ways to specify a zero vector and many of them are equivalent.

@Rajveer100
Contributor Author

Rajveer100 commented May 1, 2025

For scalable vectors, which instruction should we use to generate the zero: LDxxx, MOVAZxxx, MOVPRFXxxx, DUPv2i64lane, or one of the many others? MOVIv2d_ns (and MOVIxxx) is for fixed-size vectors.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from c8e32e0 to 57329e9 Compare May 1, 2025 11:59
@davemgreen
Collaborator

MOVIv2d_ns will actually set all the upper bits too, so it can use MOVIv2d_ns and a SUBREG_TO_REG. Something like SUBREG_TO_REG 0, Vn, zsub will tell the compiler that the fp128 is extended to a zreg and that the top bits are zeroes.
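
A toy model of why a 128-bit zero is enough, assuming (as the comment above states) that a write through the 128-bit view also clears every bit above it. `ZReg`, `writeLow128`, `xarLanes`, and the 256-bit register width are hypothetical choices made for this sketch; none of it is LLVM code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy scalable register: four 64-bit lanes (256 bits). Real SVE vector
// lengths vary; the width here is an arbitrary choice for the sketch.
using ZReg = std::array<uint64_t, 4>;

// Models a write through the low 128-bit view of the register. Assumption
// (taken from the comment above): the write also clears all bits above 128.
inline void writeLow128(ZReg &z, uint64_t lo, uint64_t hi) {
  z[0] = lo;
  z[1] = hi;
  for (std::size_t i = 2; i < z.size(); ++i)
    z[i] = 0;
}

// Lane-wise model of XAR on 64-bit lanes: rotate right (a ^ b) by imm.
inline ZReg xarLanes(const ZReg &a, const ZReg &b, unsigned imm) {
  ZReg out{};
  imm &= 63; // keep the amount inside the lane width
  for (std::size_t i = 0; i < a.size(); ++i) {
    uint64_t x = a[i] ^ b[i];
    out[i] = imm == 0 ? x : (x >> imm) | (x << (64 - imm));
  }
  return out;
}
```

In this model, zeroing a register with `writeLow128(z, 0, 0)` and passing it as the second operand of `xarLanes` yields a pure rotate in every lane, including the lanes above the low 128 bits.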

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from 57329e9 to c14ca26 Compare May 3, 2025 12:58
@Rajveer100
Contributor Author

> MOVIv2d_ns will actually set all the upper bits too, so it can use MOVIv2d_ns and a SUBREG_TO_REG. Something like SUBREG_TO_REG 0, Vn, zsub will tell the compiler that the fp128 is extended to a zreg and that the top bits are zeroes.

I am probably not doing it the right way, pushed changes for review.

Collaborator

@davemgreen davemgreen left a comment

I think this is looking OK. Can you add some tests for fixed-width and scalable types? Some of the fixed-width sizes do not have an instruction they can use, unless they start to use SVE instructions.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from c14ca26 to 9c2648d Compare May 6, 2025 08:51
@Rajveer100
Copy link
Contributor Author

Rajveer100 commented May 6, 2025

Updated the failing tests, will add new tests.

Collaborator

@davemgreen davemgreen left a comment

> Updated the failing tests, will add new tests.

update_llc_test_checks.py can update the check lines, so long as the output is correct.

@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch 2 times, most recently from eb4a062 to 6861dce Compare May 7, 2025 10:16
@Rajveer100 Rajveer100 force-pushed the xar-vector-rotate branch from 6861dce to 082ab05 Compare May 7, 2025 10:17
Development

Successfully merging this pull request may close these issues.

[AArch64] can use XAR for vector rotates where possible
3 participants