[AArch64][SVE] Combine UXT[BHW] intrinsics to AND. #137956
Conversation
Currently, we lower UXT[BHW] intrinsics into the corresponding UXT* instruction. However, when the governing predicate is all-true or the passthrough is undef (e.g. in the case of "don't care" merging), we can lower them into AND immediate instructions instead. For example:

```cpp
svuint64_t foo_z(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:

```
foo_z:
  ptrue p0.d
  movi v1.2d, #0000000000000000
  uxtb z0.d, p0/m, z0.d
  ret
```

Becomes:

```
foo_z:
  and z0.d, z0.d, #0xff
  ret
```

We do this early in InstCombine in case it unblocks other simplifications.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64

Author: Ricardo Jesus (rj-jesus)

Changes

Currently, we lower UXT[BHW] intrinsics into the corresponding UXT* instruction. However, when the governing predicate is all-true or the passthrough is undef, we can lower them into AND immediate instructions instead.

For example, given:

```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:

```
foo:
  ptrue p0.d
  movi v1.2d, #0000000000000000
  uxtb z0.d, p0/m, z0.d
  ret
```

Becomes:

```
foo:
  and z0.d, z0.d, #0xff
  ret
```

I implemented this in InstCombine in case it unblocks other simplifications, but I'm happy to move it elsewhere if needed.

Patch is 23.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137956.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 594f1bff5c458..e9050d184f0f7 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2640,6 +2640,26 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
return std::nullopt;
}
+static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
+ IntrinsicInst &II,
+ unsigned NumBits) {
+ Value *Passthru = II.getOperand(0);
+ Value *Pg = II.getOperand(1);
+ Value *Op = II.getOperand(2);
+
+ // Convert UXT[BHW] to AND.
+ if (isa<UndefValue>(Passthru) || isAllActivePredicate(Pg)) {
+ auto *Ty = cast<VectorType>(II.getType());
+ auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
+ auto *Mask = ConstantVector::getSplat(
+ Ty->getElementCount(),
+ ConstantInt::get(Ty->getElementType(), MaskValue));
+ return IC.replaceInstUsesWith(II, IC.Builder.CreateAnd(Op, Mask));
+ }
+
+ return std::nullopt;
+}
+
std::optional<Instruction *>
AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
IntrinsicInst &II) const {
@@ -2745,6 +2765,12 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
return instCombineSVEInsr(IC, II);
case Intrinsic::aarch64_sve_ptrue:
return instCombinePTrue(IC, II);
+ case Intrinsic::aarch64_sve_uxtb:
+ return instCombineSVEUxt(IC, II, 8);
+ case Intrinsic::aarch64_sve_uxth:
+ return instCombineSVEUxt(IC, II, 16);
+ case Intrinsic::aarch64_sve_uxtw:
+ return instCombineSVEUxt(IC, II, 32);
}
return std::nullopt;
diff --git a/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
new file mode 100644
index 0000000000000..86986b510aa27
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll
@@ -0,0 +1,336 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
+
+define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
+;
+ %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+ ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxtb_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 255)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP3]]
+;
+ %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+ ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxtb_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 255)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
+;
+ %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
+;
+ %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP5]]
+;
+ %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxtb_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
+;
+ %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+ ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxtb_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 255)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP3]]
+;
+ %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+ ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxtb_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 255)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP4]]
+;
+ %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP4]]
+;
+ %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxtb_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP5]]
+;
+ %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxtb.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 8 x i16> @uxtb_z_16(<vscale x 8 x i16> %0) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP2]]
+;
+ %2 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+ ret <vscale x 8 x i16> %2
+}
+
+define <vscale x 8 x i16> @uxtb_m_16(<vscale x 8 x i16> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16(
+; CHECK-SAME: <vscale x 8 x i16> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = and <vscale x 8 x i16> [[TMP0]], splat (i16 255)
+; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP3]]
+;
+ %3 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %1, <vscale x 8 x i1> splat (i1 true), <vscale x 8 x i16> %0)
+ ret <vscale x 8 x i16> %3
+}
+
+define <vscale x 8 x i16> @uxtb_x_16(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_x_16(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = and <vscale x 8 x i16> [[TMP1]], splat (i16 255)
+; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP4]]
+;
+ %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> undef, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+ ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_z_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> [[TMP3]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP4]]
+;
+ %3 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> zeroinitializer, <vscale x 8 x i1> %3, <vscale x 8 x i16> %1)
+ ret <vscale x 8 x i16> %4
+}
+
+define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(<vscale x 16 x i1> %0, <vscale x 8 x i16> %1, <vscale x 8 x i16> %2) #0 {
+; CHECK-LABEL: define <vscale x 8 x i16> @uxtb_m_16_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 8 x i16> [[TMP1:%.*]], <vscale x 8 x i16> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> [[TMP2]], <vscale x 8 x i1> [[TMP4]], <vscale x 8 x i16> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 8 x i16> [[TMP5]]
+;
+ %4 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %0)
+ %5 = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uxtb.nxv8i16(<vscale x 8 x i16> %2, <vscale x 8 x i1> %4, <vscale x 8 x i16> %1)
+ ret <vscale x 8 x i16> %5
+}
+
+define <vscale x 2 x i64> @uxth_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
+;
+ %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+ ret <vscale x 2 x i64> %2
+}
+
+define <vscale x 2 x i64> @uxth_m_64(<vscale x 2 x i64> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = and <vscale x 2 x i64> [[TMP0]], splat (i64 65535)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP3]]
+;
+ %3 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %1, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
+ ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @uxth_x_64(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_x_64(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = and <vscale x 2 x i64> [[TMP1]], splat (i64 65535)
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
+;
+ %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_z_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_z_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
+;
+ %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> %3, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %4
+}
+
+define <vscale x 2 x i64> @uxth_m_64_no_ptrue(<vscale x 16 x i1> %0, <vscale x 2 x i64> %1, <vscale x 2 x i64> %2) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxth_m_64_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 2 x i64> [[TMP1:%.*]], <vscale x 2 x i64> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP5]]
+;
+ %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
+ %5 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxth.nxv2i64(<vscale x 2 x i64> %2, <vscale x 2 x i1> %4, <vscale x 2 x i64> %1)
+ ret <vscale x 2 x i64> %5
+}
+
+define <vscale x 4 x i32> @uxth_z_32(<vscale x 4 x i32> %0) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
+;
+ %2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+ ret <vscale x 4 x i32> %2
+}
+
+define <vscale x 4 x i32> @uxth_m_32(<vscale x 4 x i32> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32(
+; CHECK-SAME: <vscale x 4 x i32> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = and <vscale x 4 x i32> [[TMP0]], splat (i32 65535)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP3]]
+;
+ %3 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %1, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> %0)
+ ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @uxth_x_32(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_x_32(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = and <vscale x 4 x i32> [[TMP1]], splat (i32 65535)
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP4]]
+;
+ %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> undef, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_z_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_z_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP3:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP4]]
+;
+ %3 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %4 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> zeroinitializer, <vscale x 4 x i1> %3, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %4
+}
+
+define <vscale x 4 x i32> @uxth_m_32_no_ptrue(<vscale x 16 x i1> %0, <vscale x 4 x i32> %1, <vscale x 4 x i32> %2) #0 {
+; CHECK-LABEL: define <vscale x 4 x i32> @uxth_m_32_no_ptrue(
+; CHECK-SAME: <vscale x 16 x i1> [[TMP0:%.*]], <vscale x 4 x i32> [[TMP1:%.*]], <vscale x 4 x i32> [[TMP2:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP4:%.*]] = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> [[TMP0]])
+; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> [[TMP2]], <vscale x 4 x i1> [[TMP4]], <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP5]]
+;
+ %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %0)
+ %5 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.uxth.nxv4i32(<vscale x 4 x i32> %2, <vscale x 4 x i1> %4, <vscale x 4 x i32> %1)
+ ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 2 x i64> @uxtw_z_64(<vscale x 2 x i64> %0) #0 {
+; CHECK-LABEL: define <vscale x 2 x i64> @uxtw_z_64(
+; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT: [[TMP2:%.*]] = and <vs...
[truncated]
You can test this locally with the following command:

```
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-uxt.ll llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
```

The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields `undef`. In tests, avoid using `undef`.

For example, this is considered a bad practice:

```llvm
define void @fn() {
  ...
  br i1 undef, ...
}
```

Please use the following instead:

```llvm
define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}
```

Please refer to the Undefined Behavior Manual for more information.
You can test this locally with the following command:

```
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
```

View the diff from clang-format here:

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index e1432d178..a1575be18 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2702,9 +2702,8 @@ static std::optional<Instruction *> instCombinePTrue(InstCombiner &IC,
return std::nullopt;
}
-static std::optional<Instruction *> instCombineSVEUxt(InstCombiner &IC,
- IntrinsicInst &II,
- unsigned NumBits) {
+static std::optional<Instruction *>
+instCombineSVEUxt(InstCombiner &IC, IntrinsicInst &II, unsigned NumBits) {
Value *Passthru = II.getOperand(0);
Value *Pg = II.getOperand(1);
Value *Op = II.getOperand(2);
About the bots: one is flagging the new uses of undef in the tests, the other is the clang-format suggestion above. Please let me know if I should change either.
FYI: I have been investigating whether we can replace these uses of undef.
Thanks for the heads-up! I can imagine it won't be easy to make sure everything still plays nicely.
Correct. This is one of the rare cases where we have to ignore the formatter errors.
This could be unfounded paranoia on my side but I've generally tried to avoid losing information about which lanes are active because I figure it may be useful later on. This prompts two thoughts, one easy the other more involved:
That makes sense, thanks! :)
I see what you mean, and I'm okay with either option. Do you have a preference? I guess option (2) sounds more scalable and gives us a chance to improve the lowering of the merging forms more generally.
Option (2) is my preference because once complete we can practically ignore the merging variants elsewhere.
I agree, although thinking about it a bit more, can we really canonicalise all the merging forms? For the unary operations the passthrough isn't one of the data operands.
Darn it! I forgot the unary operations don't use the same operand for data and passthrough :( Sorry for the misdirect. Let's stick with option (1) and pretend this never happened :)
Aah, that makes sense! Thanks for confirming, I'll update the PR shortly 😄
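For reference, here is a rough sketch of what option (1) could look like in the combine: emitting `llvm.aarch64.sve.and.u` so the original governing predicate is preserved. This is only an illustrative sketch against the usual LLVM APIs (the intrinsic ID spelling and the `CreateIntrinsic` call are my assumptions, and the actual patch may differ); it would sit in AArch64TargetTransformInfo.cpp alongside the other SVE combines:

```cpp
// Sketch: rewrite UXT[BHW] into an and_u intrinsic that keeps the governing
// predicate, when the inactive lanes are irrelevant (undef passthrough or
// all-active predicate).
static std::optional<Instruction *>
instCombineSVEUxt(InstCombiner &IC, IntrinsicInst &II, unsigned NumBits) {
  Value *Passthru = II.getOperand(0);
  Value *Pg = II.getOperand(1);
  Value *Op = II.getOperand(2);

  if (!isa<UndefValue>(Passthru) && !isAllActivePredicate(Pg))
    return std::nullopt;

  auto *Ty = cast<VectorType>(II.getType());
  auto MaskValue = APInt::getLowBitsSet(Ty->getScalarSizeInBits(), NumBits);
  auto *Mask = ConstantInt::get(Ty, MaskValue); // splat of the low-bit mask
  auto *And = IC.Builder.CreateIntrinsic(Intrinsic::aarch64_sve_and_u,
                                         {II.getType()}, {Pg, Op, Mask});
  return IC.replaceInstUsesWith(II, And);
}
```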
auto *Mask = ConstantVector::getSplat(
    Ty->getElementCount(),
    ConstantInt::get(Ty->getElementType(), MaskValue));
Suggested change:
-    auto *Mask = ConstantVector::getSplat(
-        Ty->getElementCount(),
-        ConstantInt::get(Ty->getElementType(), MaskValue));
+    auto *Mask = ConstantInt::get(Ty, MaskValue);
Oops, thanks - done.
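As an aside (not part of the patch), a minimal sketch of why the shorter form suggested above is equivalent, assuming the standard LLVM constant APIs; `makeSplatMask` is a hypothetical helper used purely for illustration and assumes `MaskValue`'s bit width matches the element width:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include <cassert>

// Hypothetical helper: both forms below should produce the same uniqued
// splat constant, because ConstantInt::get with a vector type broadcasts
// the value via ConstantVector::getSplat internally.
static llvm::Constant *makeSplatMask(llvm::VectorType *Ty,
                                     const llvm::APInt &MaskValue) {
  using namespace llvm;
  // Long form: build the scalar constant, then splat it across the vector.
  Constant *Long = ConstantVector::getSplat(
      Ty->getElementCount(),
      ConstantInt::get(Ty->getElementType(), MaskValue));
  // Short form: let ConstantInt::get handle the vector-type splat.
  Constant *Short = ConstantInt::get(Ty, MaskValue);
  assert(Long == Short && "constants are uniqued, so these should match");
  (void)Long; // silence unused-variable warnings in release builds
  return Short;
}
```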
define <vscale x 2 x i64> @uxtb_z_64(<vscale x 2 x i64> %0) #0 {
; CHECK-LABEL: define <vscale x 2 x i64> @uxtb_z_64(
; CHECK-SAME: <vscale x 2 x i64> [[TMP0:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> [[TMP0]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP2]]
;
  %2 = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> zeroinitializer, <vscale x 2 x i1> splat (i1 true), <vscale x 2 x i64> %0)
  ret <vscale x 2 x i64> %2
} |
Do any of the `_z_` tests add value? The new combine doesn't care about the passthrough value, only whether it is `undef` or not.

The `_z_` tests aside, it's up to you but I feel many of the other tests can also be removed. I think you only need to test a single element type for the core functionality (i.e. positive and negative tests relating to whether the combine is applied) and then a single test per element type to show the correct `and_u` equivalent is created.
Thanks, that makes sense, I've cleaned up the tests a bit.
(I had auto-generated them from ACLE code without giving it proper thought.)
; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 2 x i64> @llvm.aarch64.sve.and.u.nxv2i64(<vscale x 2 x i1> [[TMP3]], <vscale x 2 x i64> [[TMP1]], <vscale x 2 x i64> splat (i64 255))
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP4]]
;
  %3 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
You can pass `<vscale x 2 x i1>` directly as a parameter and then you'll not need the `convert` call.
Thanks, sorted.
; CHECK-NEXT: [[TMP5:%.*]] = tail call <vscale x 2 x i64> @llvm.aarch64.sve.uxtb.nxv2i64(<vscale x 2 x i64> [[TMP2]], <vscale x 2 x i1> [[TMP4]], <vscale x 2 x i64> [[TMP1]])
; CHECK-NEXT: ret <vscale x 2 x i64> [[TMP5]]
;
  %4 = tail call <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1> %0)
As above.
This patch combines uxt[bhw] intrinsics to and_u when the governing predicate is all-true or the passthrough is undef (e.g. in cases of "unknown" merging). This improves code gen as the latter can be emitted as AND immediate instructions.

For example, given:

```cpp
svuint64_t foo(svuint64_t x) {
  return svextb_z(svptrue_b64(), x);
}
```

Currently:

```gas
foo:
  ptrue p0.d
  movi v1.2d, #0000000000000000
  uxtb z0.d, p0/m, z0.d
  ret
```

Becomes:

```gas
foo:
  and z0.d, z0.d, #0xff
  ret
```
Currently, we lower UXT[BHW] intrinsics into the corresponding UXT* instruction. However, when the governing predicate is all-true or the passthrough is undef (e.g. in cases of "unknown" merging), we can lower them into AND immediate instructions instead, as in the example above.
I implemented this in InstCombine in case it unblocks other
simplifications, but I'm happy to move it elsewhere if needed.