[AArch64][SVE] Combine UXT[BHW] intrinsics to AND. #137956

Merged: 7 commits merged into llvm:main on May 6, 2025

Conversation

rj-jesus
Contributor

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in cases of ``unknown'' merging), we can
lower them into AND immediate instructions instead.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo:
  and     z0.d, z0.d, #0xff
  ret
```

I implemented this in InstCombine in case it unblocks other
simplifications, but I'm happy to move it elsewhere if needed.

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in the case of ``don't care'' merging), we
can lower them into AND immediate instructions instead.

For example:
```cpp
svuint64_t foo_z(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo_z:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo_z:
  and     z0.d, z0.d, #0xff
  ret
```

We do this early in InstCombine in case it unblocks other
simplifications.
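
For illustration, the combine also applies to the ACLE "don't care" (`_x`) forms, whose passthrough is lowered to undef, even when the governing predicate is not known to be all-true. This mirrors the `uxtb_x_*` tests in the patch; the snippet below is illustrative only:
```cpp
#include <arm_sve.h>

// "Don't care" merging: the intrinsic's passthrough lowers to undef, so the
// combine fires even though pg is not provably all-true.
svuint64_t foo_x(svbool_t pg, svuint64_t x) {
  return svextb_x(pg, x);
}
```
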
@llvmbot
Member

llvmbot commented Apr 30, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Ricardo Jesus (rj-jesus)

Changes

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT*
instruction. However, when the governing predicate is all-true or the
passthrough is undef (e.g. in cases of ``unknown'' merging), we can
lower them into AND immediate instructions instead.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```
foo:
  and     z0.d, z0.d, #0xff
  ret
```

I implemented this in InstCombine in case it unblocks other
simplifications, but I'm happy to move it elsewhere if needed.


Patch is 23.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137956.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+26)
  • (added) llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll (+336)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 594f1bff5c458..e9050d184f0f7 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2640,6 +2640,26 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
   return std::nullopt;
 }
 
+static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
+                                                      IntrinsicInst &II,
+                                                      unsigned NumBits) {
+  Value *Passthru = II.getOperand(0);
+  Value *Pg = II.getOperand(1);
+  Value *Op = II.getOperand(2);
+
+  // Convert UXT[BHW] to AND.
+  if (isa<UndefValue>(Passthru) || isAllActivePredicate(Pg)) {
+    auto *Ty = cast<VectorType>(II.getType());
+    auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
+    auto *Mask = ConstantVector::getSplat(
+        Ty->getElementCount(),
+        ConstantInt::get(Ty->getElementType(), MaskValue));
+    return IC.replaceInstUsesWith(II, IC.Builder.CreateAnd(Op, Mask));
+  }
+
+  return std::nullopt;
+}
+
 std::optional<Instruction *>
 AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
                                      IntrinsicInst &II) const {
@@ -2745,6 +2765,12 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
     return instCombineSVEInsr(IC, II);
   case Intrinsic::aarch64_sve_ptrue:
     return instCombinePTrue(IC, II);
+  case Intrinsic::aarch64_sve_uxtb:
+    return instCombineSVEUxt(IC, II, 8);
+  case Intrinsic::aarch64_sve_uxth:
+    return instCombineSVEUxt(IC, II, 16);
+  case Intrinsic::aarch64_sve_uxtw:
+    return instCombineSVEUxt(IC, II, 32);
   }
 
   return std::nullopt;
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
new file mode 100644
index 0000000000000..86986b510aa27
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
@@ -0,0 +1,336 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP2]]
+;
+  %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxtb_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP3]]
+;
+  %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxtb_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 255)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP5]]
+;
+  %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxtb_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP2]]
+;
+  %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxtb_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP3]]
+;
+  %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxtb_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 255)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP5]]
+;
+  %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 8 x i16> @uxtb_z_16(<vscale x 8 x i16> %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+;
+  %2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+  ret <vscale x 8 x i16> %2
+}
+
+define <vscale x 8 x i16> @uxtb_m_16(<vscale x 8 x i16> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP3]]
+;
+  %3 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %1, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+  ret <vscale x 8 x i16> %3
+}
+
+define <vscale x 8 x i16> @uxtb_x_16(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_x_16(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 8 x i16> [[TMP1]], splat (i16 255)
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP4]]
+;
+  %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[TMP3]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP4]]
+;
+  %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1, <vscale x 8 x i16> %2) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]], <vscale x 8 x i16> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> [[TMP2]], <vscale x 8 x i1> [[TMP4]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP5]]
+;
+  %4 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %2, <vscale x 8 x i1> %4, <vscale x 8 x i16> %1)
+  ret <vscale x 8 x i16> %5
+}
+
+define <vscale x 2 x i64> @uxth_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP2]]
+;
+  %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxth_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP3]]
+;
+  %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+  ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxth_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 65535)
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP4]]
+;
+  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 2 x i64> [[TMP5]]
+;
+  %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+  ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxth_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP2]]
+;
+  %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxth_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP3]]
+;
+  %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+  ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxth_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 65535)
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP4]]
+;
+  %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT:    [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT:    ret <vscale x 4 x i32> [[TMP5]]
+;
+  %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+  %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+  ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 2 x i64> @uxtw_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtw_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[TMP2:%.*]] = and <vs...
[truncated]


⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

The following files introduce new uses of undef:

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.


github-actions bot commented Apr 30, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index e1432d178..a1575be18 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2702,9 +2702,8 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
   return std::nullopt;
 }
 
-static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
-                                                      IntrinsicInst &II,
-                                                      unsigned NumBits) {
+static std::optional<Instruction *>
+instCombineSVEUxt(InstCombiner &IC, IntrinsicInst &II, unsigned NumBits) {
   Value *Passthru = II.getOperand(0);
   Value *Pg = II.getOperand(1);
   Value *Op = II.getOperand(2);

@rj-jesus
Contributor Author

About the bots:

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll introduces uses of undef because that's how ``don't care'' merging is lowered, e.g.: https://godbolt.org/z/6s3vY5f5x.
  • The formatting difference seems to be in line with other similar declarations.

Please let me know if I should change either.

@paulwalker-arm
Collaborator

  • llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll introduces uses of undef because that's how ``don't care'' merging is lowered, e.g.: https://godbolt.org/z/6s3vY5f5x.

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

@rj-jesus
Contributor Author

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

Thanks for the heads-up! I can imagine it won't be easy to make sure everything still plays nicely.
As things stand, I don't think there's another way of testing ACLE's ``don't care'' merging other than with undef values as in this patch, right?

@paulwalker-arm
Collaborator

FYI: I have been investigating whether we can replace these uses of undef with poison but it's not entirely straightforward. I have a plan but there are several pieces that need completing before I can change clang because otherwise we're likely to see regression in generated code.

Thanks for the heads-up! I can imagine it won't be easy to make sure everything still plays nicely. As things stand, I don't think there's another way of testing ACLE's ``don't care'' merging other than with undef values as in this patch, right?

Correct. This is one of the rare cases where we have to ignore the formatter errors.

@paulwalker-arm
Collaborator

This could be unfounded paranoia on my side but I've generally tried to avoid losing information about which lanes are active because I figure it may be useful later on. This prompts two thoughts, one easy, the other more involved:

  1. What about emitting an sve.and_u intrinsic so we maintain the predicate knowledge (a rough sketch follows below)? When this gets to ISel it'll still emit the desired AND immediate instruction.
  2. Should we canonicalise all sve.uxt intrinsics to sve.and? You'll still get the output you want, but we will need ISel patterns to emit UXT[B,H,W] when desirable (i.e. for sve.and(non-undef, non-all-active-predicate, mask)).
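
For illustration, a rough sketch of what option (1) could look like in the InstCombine hook, reusing the helper shape from the patch. The `Intrinsic::aarch64_sve_and_u` ID and the `CreateIntrinsic` call are assumptions for the sketch, not the final implementation:
```cpp
// Sketch: rewrite sve.uxt[bhw] into sve.and_u so the governing predicate is
// preserved; ISel can still fold this to an AND immediate.
static std::optional<Instruction *>
instCombineSVEUxtToAndU(InstCombiner &IC, IntrinsicInst &II,
                        unsigned NumBits) {
  Value *Passthru = II.getOperand(0);
  Value *Pg = II.getOperand(1);
  Value *Op = II.getOperand(2);

  // Only when the result is independent of the passthrough.
  if (!isa<UndefValue>(Passthru) && !isAllActivePredicate(Pg))
    return std::nullopt;

  auto *Ty = cast<VectorType>(II.getType());
  auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
  auto *Mask = ConstantInt::get(Ty, MaskValue);
  // Emit sve.and_u instead of a plain 'and' to keep the predicate around.
  auto *AndU = IC.Builder.CreateIntrinsic(Intrinsic::aarch64_sve_and_u, {Ty},
                                          {Pg, Op, Mask});
  return IC.replaceInstUsesWith(II, AndU);
}
```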

@rj-jesus
Contributor Author

Correct. This is one of the rare cases where we have to ignore the formatter errors.

That makes sense, thanks! :)

This could be unfounded paranoia on my side but I've generally tried to avoid losing information about which lanes are active because I figure it may be useful later on. This prompts two thoughts, one easy, the other more involved:

  1. What about emitting an sve.and_u intrinsic so we maintain the predicate knowledge? When this gets to ISel it'll still emit the desired AND immediate instruction.

  2. Should we canonicalise all sve.uxt intrinsics to sve.and? You'll still get the output you want, but we will need ISel patterns to emit UXT[B,H,W] when desirable (i.e. for sve.and(non-undef, non-all-active-predicate, mask)).

I see what you mean, and I'm okay with either option. Do you have a preference?

I guess option (2) sounds more scalable and gives us a chance to improve the lowering for svand_m with all-ones 8/16/32-bit masks (link), but we could equally address it separately.
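
For reference, the svand_m case meant here is along these lines (an illustrative snippet, not reproduced from the linked example):
```cpp
#include <arm_sve.h>

// A merging AND with an all-ones 8-bit mask; on active lanes this
// zero-extends the low byte of each element.
svuint64_t bar(svbool_t pg, svuint64_t x) {
  return svand_n_u64_m(pg, x, 0xff);
}
```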

@paulwalker-arm
Collaborator

paulwalker-arm commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

@rj-jesus
Contributor Author

rj-jesus commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

I agree, although thinking about it a bit more, can we really canonicalise all sve.uxt to sve.and? More specifically, how do we encode the passthrough of the UXT in the AND? The latter doesn't seem to have a real passthrough, in the sense that inactive elements are taken from one of the operands. Do you see what I mean? Or are you suggesting converting the UXT to a sequence of SEL+AND (i.e. something like uxt (pg, inactive, op) -> sel (pg, (and pg, op, mask), inactive))? (Sorry if I'm missing something obvious.)

@paulwalker-arm
Collaborator

paulwalker-arm commented Apr 30, 2025

Option (2) is my preference because once complete we can practically ignore sve.uxt from all future transformations.

I agree, although thinking about it a bit more, can we really canonicalise all sve.uxt to sve.and? More specifically, how do we encode the passthrough of the UXT in the AND? The latter doesn't seem to have a real passthrough, in the sense that inactive elements are taken from one of the operands. Do you see what I mean? Or are you suggesting converting the UXT to a sequence of SEL+AND (i.e. something like uxt (pg, inactive, op) -> sel (pg, (and pg, op, mask), inactive))? (Sorry if I'm missing something obvious.)

Darn it! I forgot the unary operations don't use the same operand for data and passthrough :( Sorry for the misdirect. Let's stick with option (1) and pretend this never happened :)

@rj-jesus
Contributor Author

Darn it! I forgot the unary operations don't use the same operand for data and passthrough :( Sorry for the misdirect. Let's stick with option (1) and pretend this never happened :)

Aah, that makes sense! Thanks for confirming, I'll update the PR shortly 😄

Comment on lines 2654 to 2656
auto *Mask = ConstantVector::getSplat(
Ty->getElementCount(),
ConstantInt::get(Ty->getElementType(), MaskValue));
Collaborator

Suggested change:
-  auto *Mask = ConstantVector::getSplat(
-      Ty->getElementCount(),
-      ConstantInt::get(Ty->getElementType(), MaskValue));
+  auto *Mask = ConstantInt::get(Ty, MaskValue);

Contributor Author

Oops, thanks - done.

Comment on lines 6 to 14
define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> [[TMP0]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
;
%2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
ret <vscale x 2 x i64> %2
}
Collaborator

Do any of the _z_ tests add value? The new combine doesn't care about the passthrough value, only whether it is undef or not.

The _z_ tests aside, it's up to you but I feel many of the other tests can also be removed. I think you only need to test a single element type for the core functionality (i.e. positive and negative tests relating to whether the combine is applied) and then a single test per element type to show the correct and_u equivalent is created.

Contributor Author

Thanks, that makes sense, I've cleaned up the tests a bit.
(I had auto-generated them from ACLE code without giving it proper thought.)

; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
;
%3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
Collaborator

You can pass <vscale x 2 x i1> directly as a parameter and then you'll not need the convert call.

Contributor Author

Thanks, sorted.

; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP5]]
;
%4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
Collaborator

As above.

@rj-jesus rj-jesus merged commit fbd9a31 into llvm:main May 6, 2025
10 of 11 checks passed
@rj-jesus rj-jesus deleted the rjj/sve-fold-ptrue-uxt-to-and branch May 6, 2025 07:48
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
This patch combines uxt[bhw] intrinsics to and_u when the governing
predicate is all-true or the passthrough is undef (e.g. in cases of
``unknown'' merging). This improves code gen as the latter can be
emitted as AND immediate instructions.

For example, given:
```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:
```gas
foo:
  ptrue   p0.d
  movi    v1.2d, #0000000000000000
  uxtb    z0.d, p0/m, z0.d
  ret
```

Becomes:
```gas
foo:
  and     z0.d, z0.d, #0xff
  ret
```