[AArch64][SVE] Combine UXT[BHW] intrinsics to AND. #137956

Merged: 7 commits merged into llvm:main on May 6, 2025

Conversation

rj-jesus
Contributor

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in cases of ``unknown'' merging), we can
lower them into AND immediate instructions instead.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo:
  and     z0.d, z0.d, #0xff
  ret
```

I implemented this in InstCombine in case it unblocks other
simplifications, but I'm happy to move it elsewhere if needed.

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in the case of ``don't care'' merging), we
can lower them into AND immediate instructions instead.

For example:
```cpp
svuint64_t foo_z(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo_z:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo_z:
  and     z0.d, z0.d, #0xff
  ret
```

We do this early in InstCombine in case it unblocks other
simplifications.
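
For illustration, the combine also applies to the ACLE "don't care" (`_x`) forms, whose passthrough is lowered to undef, even when the governing predicate is not known to be all-true. This mirrors the `uxtb_x_*` tests in the patch; the snippet below is illustrative only:
```cpp
#include <arm_sve.h>

// "Don't care" merging: the intrinsic's passthrough lowers to undef, so the
// combine fires even though pg is not provably all-true.
svuint64_t foo_x(svbool_t pg, svuint64_t x) {
  return svextb_x(pg, x);
}
```
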
@llvmbot
Member

llvmbot commented Apr 30, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Ricardo Jesus (rj-jesus)

Changes

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in cases of ``unknown'' merging), we can
lower them into AND immediate instructions instead.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo:
  and     z0.d, z0.d, #0xff
  ret
```

I implemented this in InstCombine in case it unblocks other
simplifications, but I'm happy to move it elsewhere if needed.


Patch is 23.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137956.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+26)
  • (added) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll (+336)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 594f1bff5c458..e9050d184f0f7 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2640,6 +2640,26 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
   return std::nullopt;
 }
 
+static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
+                                                      IntrinsicInst &II,
+                                                      unsigned NumBits) {
+  Value *Passthru = II.getOperand(0);
+  Value *Pg = II.getOperand(1);
+  Value *Op = II.getOperand(2);
+
+  // Convert UXT[BHW] to AND.
+  if (isa<UndefValue>(Passthru) || isAllActivePredicate(Pg)) {
+    auto *Ty = cast<VectorType>(II.getType());
+    auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
+    auto *Mask = ConstantVector::getSplat(
+        Ty->getElementCount(),
+        ConstantInt::get(Ty->getElementType(), MaskValue));
+    return IC.replaceInstUsesWith(II, IC.Builder.CreateAnd(Op, Mask));
+  }
+
+  return std::nullopt;
+}
+
 std::optional<Instruction *>
 AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
                                      IntrinsicInst &II) const {
@@ -2745,6 +2765,12 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
     return instCombineSVEInsr(IC, II);
   case Intrinsic::aarch64_sve_ptrue:
     return instCombinePTrue(IC, II);
+  case Intrinsic::aarch64_sve_uxtb:
+    return instCombineSVEUxt(IC, II, 8);
+  case Intrinsic::aarch64_sve_uxth:
+    return instCombineSVEUxt(IC, II, 16);
+  case Intrinsic::aarch64_sve_uxtw:
+    return instCombineSVEUxt(IC, II, 32);
   }
 
   return std::nullopt;
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
new file mode 100644
index 0000000000000..86986b510aa27
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
@@ -0,0 +1,336 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP2]]
+;
+  %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxtb_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP3]]
+;
+  %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxtb_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP5]]
+;
+  %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxtb_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP2]]
+;
+  %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxtb_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP3]]
+;
+  %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxtb_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP5]]
+;
+  %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 8 x i16> @uxtb_z_16(<vscale x 8 x i16> %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+;
+  %2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+  ret <vscale x 8 x i16> %2
+}
+
+define <vscale x 8 x i16> @uxtb_m_16(<vscale x 8 x i16> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP3]]
+;
+  %3 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %1, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+  ret <vscale x 8 x i16> %3
+}
+
+define <vscale x 8 x i16> @uxtb_x_16(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_x_16(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 8 x i16> [[TMP1]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP4]]
+;
+  %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[TMP3]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP4]]
+;
+  %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1, <vscale x 8 x i16> %2) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]], <vscale x 8 x i16> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> [[TMP2]], <vscale x 8 x i1> [[TMP4]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP5]]
+;
+  %4 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %2, <vscale x 8 x i1> %4, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %5
+}
+
+define <vscale x 2 x i64> @uxth_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP2]]
+;
+  %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxth_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP3]]
+;
+  %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxth_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP5]]
+;
+  %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxth_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP2]]
+;
+  %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxth_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP3]]
+;
+  %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxth_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP5]]
+;
+  %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 2 x i64> @uxtw_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtw_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vs...
[truncated]


⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

The following files introduce new uses of undef:

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.


github-actions bot commented Apr 30, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index e1432d178..a1575be18 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2702,9 +2702,8 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
   return std::nullopt;
 }
 
-static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
-                                                      IntrinsicInst &II,
-                                                      unsigned NumBits) {
+static std::optional<Instruction *>
+instCombineSVEUxt(InstCombiner &IC, IntrinsicInst &II, unsigned NumBits) {
   Value *Passthru = II.getOperand(0);
   Value *Pg = II.getOperand(1);
   Value *Op = II.getOperand(2);

@rj-jesus
Contributor Author

About the bots:

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll introduces uses of undef because that's how ``don't care'' merging is lowered, e.g.: https://godbolt.org/z/6s3vY5f5x.
  • The formatting difference seems to be in line with other similar declarations.

Please let me know if I should change either.

@paulwalker-arm
Collaborator

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll introduces uses of undef because that's how ``don't care'' merging is lowered, e.g.: https://godbolt.org/z/6s3vY5f5x.

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

@rj-jesus
Contributor Author

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

Thanks for the heads-up! I can imagine it won't be easy to make sure everything still plays nicely.
As things stand, I don't think there's another way of testing ACLE's ``don't care'' merging other than with undef values as in this patch, right?

@paulwalker-arm
Collaborator

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

Thanks for the heads-up! I can imagine it won't be easy to make sure everything still plays nicely. As things stand, I don't think there's another way of testing ACLE's ``don't care'' merging other than with undef values as in this patch, right?

Correct. This is one of the rare cases where we have to ignore the formatter errors.

@paulwalker-arm
Collaborator

This could be unfounded paranoia on my side but I've generally tried to avoid losing information about which lanes are active because I figure it may be useful later on. This prompts two thoughts, one easy, the other more involved:

  1. What about emitting an sve.and_u intrinsic so we maintain the predicate knowledge (a rough sketch follows below)? When this gets to ISel it'll still emit the desired AND immediate instruction.
  2. Should we canonicalise all sve.uxt intrinsics to sve.and? You'll still get the output you want, but we will need ISel patterns to emit UXT[B,H,W] when desirable (i.e. for sve.and(non-undef, non-all-active-predicate, mask)).
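
For illustration, a rough sketch of what option (1) could look like in the InstCombine hook, reusing the helper shape from the patch. The `Intrinsic::aarch64_sve_and_u` ID and the `CreateIntrinsic` call are assumptions for the sketch, not the final implementation:
```cpp
// Sketch: rewrite sve.uxt[bhw] into sve.and_u so the governing predicate is
// preserved; ISel can still fold this to an AND immediate.
static std::optional<Instruction *>
instCombineSVEUxtToAndU(InstCombiner &IC, IntrinsicInst &II,
                        unsigned NumBits) {
  Value *Passthru = II.getOperand(0);
  Value *Pg = II.getOperand(1);
  Value *Op = II.getOperand(2);

  // Only when the result is independent of the passthrough.
  if (!isa<UndefValue>(Passthru) && !isAllActivePredicate(Pg))
    return std::nullopt;

  auto *Ty = cast<VectorType>(II.getType());
  auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
  auto *Mask = ConstantInt::get(Ty, MaskValue);
  // Emit sve.and_u instead of a plain 'and' to keep the predicate around.
  auto *AndU = IC.Builder.CreateIntrinsic(Intrinsic::aarch64_sve_and_u, {Ty},
                                          {Pg, Op, Mask});
  return IC.replaceInstUsesWith(II, AndU);
}
```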

@rj-jesus
Contributor Author

Correct. This is one of the rare cases where we have to ignore the formatter errors.

That makes sense, thanks! :)

This could be unfounded paranoia on my side but I've generally tried to avoid losing information about which lanes are active because I figure it may be useful later on. This prompts two thoughts, one easy, the other more involved:

  1. What about emitting an sve.and_u intrinsic so we maintain the predicate knowledge? When this gets to ISel it'll still emit the desired AND immediate instruction.

  2. Should we canonicalise all sve.uxt intrinsics to sve.and? You'll still get the output you want, but we will need ISel patterns to emit UXT[B,H,W] when desirable (i.e. for sve.and(non-undef, non-all-active-predicate, mask)).

I see what you mean, and I'm okay with either option. Do you have a preference?

I guess option (2) sounds more scalable and gives us a chance to improve the lowering for svand_m with all-ones 8/16/32-bit masks (link), but we could equally address it separately.
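
For reference, the svand_m case meant here is along these lines (an illustrative snippet, not reproduced from the linked example):
```cpp
#include <arm_sve.h>

// A merging AND with an all-ones 8-bit mask; on active lanes this
// zero-extends the low byte of each element.
svuint64_t bar(svbool_t pg, svuint64_t x) {
  return svand_n_u64_m(pg, x, 0xff);
}
```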

@paulwalker-arm
Collaborator

paulwalker-arm commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

@rj-jesus
Contributor Author

rj-jesus commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

I agree, although thinking about it a bit more, can we really canonicalise all sve.uxt to sve.and? More specifically, how do we encode the passthrough of the UXT in the AND? The latter doesn't seem to have a real passthrough, in the sense that inactive elements are taken from one of the operands. Do you see what I mean? Or are you suggesting converting the UXT to a sequence of SEL+AND (i.e. something like uxt (pg, inactive, op) -> sel (pg, (and pg, op, mask), inactive))? (Sorry if I'm missing something obvious.)

@paulwalker-arm
Collaborator

paulwalker-arm commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

I agree, although thinking about it a bit more, can we really canonicalise all sve.uxt to sve.and? More specifically, how do we encode the passthrough of the UXT in the AND? The latter doesn't seem to have a real passthrough, in the sense that inactive elements are taken from one of the operands. Do you see what I mean? Or are you suggesting converting the UXT to a sequence of SEL+AND (i.e. something like uxt (pg, inactive, op) -> sel (pg, (and pg, op, mask), inactive))? (Sorry if I'm missing something obvious.)

Darn it! I forgot the unary operations don't use the same operand for data and passthrough :( Sorry for the misdirect. Let's stick with option (1) and pretend this never happened :)

@rj-jesus
Contributor Author

Darn it! I forgot the unary operations don't use the same operand for data and passthrough :( Sorry for the misdirect. Let's stick with option (1) and pretend this never happened :)

Aah, that makes sense! Thanks for confirming, I'll update the PR shortly 😄

Comment on lines 2654 to 2656
auto *Mask = ConstantVector::getSplat(
Ty->getElementCount(),
ConstantInt::get(Ty->getElementType(), MaskValue));
Collaborator

Suggested change:
-  auto *Mask = ConstantVector::getSplat(
-      Ty->getElementCount(),
-      ConstantInt::get(Ty->getElementType(), MaskValue));
+  auto *Mask = ConstantInt::get(Ty, MaskValue);

Contributor Author

Oops, thanks - done.

Comment on lines 6 to 14
define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> [[TMP0]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
;
%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
ret <vscale x 2 x i64> %2
}
Collaborator

Do any of the _z_ tests add value? The new combine doesn't care about the passthrough value, only whether it is undef or not.

The _z_ tests aside, it's up to you but I feel many of the other tests can also be removed. I think you only need to test a single element type for the core functionality (i.e. positive and negative tests relating to whether the combine is applied) and then a single test per element type to show the correct and_u equivalent is created.

Contributor Author

Thanks, that makes sense, I've cleaned up the tests a bit.
(I had auto-generated them from ACLE code without giving it proper thought.)

; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
;
%3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
Collaborator

You can pass <vscale x 2 x i1> directly as a parameter and then you'll not need the convert call.

Contributor Author

Thanks, sorted.

; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP5]]
;
%4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
Collaborator

As above.

@rj-jesus rj-jesus merged commit fbd9a31 into llvm:main May 6, 2025
10 of 11 checks passed
@rj-jesus rj-jesus deleted the rjj/sve-fold-ptrue-uxt-to-and branch May 6, 2025 07:48
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
This patch combines uxt[bhw] intrinsics to and_u when the governing
predicate is all-true or the passthrough is undef (e.g. in cases of
``unknown'' merging). This improves code gen as the latter can be
emitted as AND immediate instructions.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```gas
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```gas
foo:
  and     z0.d, z0.d, #0xff
  ret
```