[CodeGen][RISCV] Use vscale_range to handle more fixed<->scalable casts of i1 vectors. #138378


Draft

topperc wants to merge 1 commit into main

Conversation

topperc (Collaborator) commented May 3, 2025

RISC-V with -mrvv-vector-bits-min supports giving a size to our scalable vector types. To do this, we represent the vector as a fixed vector in memory and need to cast back and forth to scalable vectors.

For i1 vectors, we use an i8 vector in memory. If there are fewer than 8 bits, we use a <1 x i8> vector with some undefined bits.

The cast code previously fell back to copying through memory if the known minimum size of the scalable i1 type was not divisible by 8. This used a <vscale x X x i1> load or store from a fixed vector alloca. If X is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*X is less than or equal to 8. This means the load or store is outside the bounds of the fixed-size alloca as far as DataLayout is concerned, leading to undefined behavior.
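
For illustration, the old lowering round-tripped through stack memory roughly like this (a minimal sketch based on the description above, for a type such as vbool32_t whose in-memory form is <1 x i8>; the value names are made up, not taken from the actual Clang output):

  %tmp = alloca <1 x i8>, align 1                    ; 1 byte of fixed storage
  store <vscale x 2 x i1> %mask, ptr %tmp, align 1   ; DataLayout says this writes vscale bytes
  %fixed = load <1 x i8>, ptr %tmp, align 1          ; read back the fixed in-memory form

If vscale is, say, 2, the store writes 2 bytes into a 1-byte alloca, which is the out-of-bounds access described above.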

This patch makes use of the known value of vscale_range to avoid casting through memory.
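
As a rough sketch of the new lowering (assuming vscale_range(1,1), i.e. VLEN=64, so a vbool32_t is <vscale x 2 x i1> with exactly 2 meaningful bits; value names are illustrative):

  ; fixed <1 x i8> -> scalable <vscale x 2 x i1>
  %wide  = bitcast <1 x i8> %src to <8 x i1>
  %bits  = call <2 x i1> @llvm.vector.extract.v2i1.v8i1(<8 x i1> %wide, i64 0)
  %mask  = call <vscale x 2 x i1> @llvm.vector.insert.nxv2i1.v2i1(<vscale x 2 x i1> poison, <2 x i1> %bits, i64 0)

  ; scalable <vscale x 2 x i1> -> fixed <1 x i8>
  %bits2 = call <2 x i1> @llvm.vector.extract.v2i1.nxv2i1(<vscale x 2 x i1> %mask, i64 0)
  %wide2 = call <8 x i1> @llvm.vector.insert.v8i1.v2i1(<8 x i1> poison, <2 x i1> %bits2, i64 0)
  %res   = bitcast <8 x i1> %wide2 to <1 x i8>

Because vscale is a known constant, the number of meaningful i1 elements (vscale * 2 here) is a compile-time fixed count, so the cast can stay entirely in SSA values and never touch an alloca.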

Hopefully this allows #130973 to proceed.

…ts of i1 vectors.

RISC-V with -mrvv-vector-bits-min supports giving a size to our
scalable vector types. To do this, we represent the vector as a fixed
vector in memory and need to cast back and forth to scalable vectors.

For i1 vectors, we use an i8 vector in memory. If there are fewer
than 8 bits, we use a <1 x i8> vector with some undefined bits.

The cast code previously fell back to copying through memory if the
known minimum size of the scalable i1 type was not divisible by 8. This
used a <vscale x X x i1> load or store from a fixed vector alloca.
If X is less than 8, DataLayout indicates that the load/store reads/writes
vscale bytes even if vscale is known and vscale*X is less than or equal to 8.
This means the load or store is outside the bounds of the fixed-size
alloca as far as DataLayout is concerned, leading to undefined behavior.

This patch makes use of the known value of vscale_range to avoid
casting through memory.

Hopefully this allows llvm#130973 to proceed.
@topperc topperc requested review from nikic and paulwalker-arm May 3, 2025 05:04
@llvmbot llvmbot added the clang, backend:RISC-V, and clang:codegen labels May 3, 2025
llvmbot (Member) commented May 3, 2025

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Craig Topper (topperc)

Changes

RISC-V with -mrvv-vector-bits-min supports giving a size to our scalable vector types. To do this, we represent the vector as a fixed vector in memory and need to cast back and forth to scalable vectors.

For i1 vectors, we use an i8 vector in memory. If there are fewer than 8 bits, we use a <1 x i8> vector with some undefined bits.

The cast code previously fell back to copying through memory if the known minimum size of the scalable i1 type was not divisible by 8. This used a <vscale x X x i1> load or store from a fixed vector alloca. If X is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*X is less than or equal to 8. This means the load or store is outside the bounds of the fixed-size alloca as far as DataLayout is concerned, leading to undefined behavior.

This patch makes use of the known value of vscale_range to avoid casting through memory.

Hopefully this allows #130973 to proceed.


Patch is 41.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138378.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+55)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+58)
  • (modified) clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c (+16-88)
  • (modified) clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c (+10-46)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-bitcast-less-8.c (+14-18)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-cast.c (+6-12)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-codegen.c (+16-17)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-globals.c (+4-8)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 82a24f7c295a2..2b8f46bad2ffe 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1378,6 +1378,35 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
           Result = CGF.Builder.CreateBitCast(Result, Ty);
         return Result;
       }
+
+      // If we are casting a fixed i8 vector to a scalable i1 predicate
+      // vector, and we weren't able to handle it above, try using what we know
+      // about vscale to insert a fixed i1 vector into the scalable vector.
+      if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+          FixedSrcTy->getElementType()->isIntegerTy(8)) {
+        std::optional<std::pair<unsigned, unsigned>> VScaleRange =
+            CGF.getContext().getTargetInfo().getVScaleRange(CGF.getLangOpts(),
+                                                            false);
+        if (VScaleRange && VScaleRange->first == VScaleRange->second &&
+            VScaleRange->first <= FixedSrcTy->getNumElements() * 8) {
+          llvm::Value *Load = CGF.Builder.CreateLoad(Src);
+          unsigned VScale = VScaleRange->first;
+          llvm::Type *WideFixedTy =
+              llvm::FixedVectorType::get(ScalableDstTy->getElementType(),
+                                         FixedSrcTy->getNumElements() * 8);
+          Load = CGF.Builder.CreateBitCast(Load, WideFixedTy);
+          llvm::Type *FixedTy = llvm::FixedVectorType::get(
+              ScalableDstTy->getElementType(),
+              ScalableDstTy->getElementCount().getKnownMinValue() * VScale);
+          // If the fixed i8 vector is larger than the i1 vector, we need to
+          // extract.
+          if (FixedTy != WideFixedTy)
+            Load = CGF.Builder.CreateExtractVector(FixedTy, Load, uint64_t(0));
+          return CGF.Builder.CreateInsertVector(
+              ScalableDstTy, llvm::PoisonValue::get(ScalableDstTy), Load,
+              uint64_t(0));
+        }
+      }
     }
   }
 
@@ -1485,6 +1514,32 @@ CoerceScalableToFixed(CodeGenFunction &CGF, llvm::FixedVectorType *ToTy,
     V = CGF.Builder.CreateExtractVector(ToTy, V, uint64_t(0), "cast.fixed");
     return {V, true};
   }
+
+  // If we are casting a scalable i1 predicate vector to a fixed i8
+  // vector, and we weren't able to handle it above, try using what we know
+  // about vscale to extract a fixed i1 vector from the scalable vector.
+  if (FromTy->getElementType()->isIntegerTy(1) &&
+      ToTy->getElementType() == CGF.Builder.getInt8Ty()) {
+    std::optional<std::pair<unsigned, unsigned>> VScaleRange =
+        CGF.getContext().getTargetInfo().getVScaleRange(CGF.getLangOpts(),
+                                                        false);
+    if (VScaleRange && VScaleRange->first == VScaleRange->second &&
+        VScaleRange->first <= ToTy->getNumElements() * 8) {
+      unsigned VScale = VScaleRange->first;
+      llvm::Type *FixedTy = llvm::FixedVectorType::get(
+          FromTy->getElementType(),
+          FromTy->getElementCount().getKnownMinValue() * VScale);
+      V = CGF.Builder.CreateExtractVector(FixedTy, V, uint64_t(0));
+      llvm::Type *WideFixedTy = llvm::FixedVectorType::get(
+          FromTy->getElementType(), ToTy->getNumElements() * 8);
+      if (FixedTy != WideFixedTy)
+        V = CGF.Builder.CreateInsertVector(
+            WideFixedTy, llvm::PoisonValue::get(WideFixedTy), V, uint64_t(0));
+      V = CGF.Builder.CreateBitCast(V, ToTy);
+      return {V, true};
+    }
+  }
+
   return {V, false};
 }
 
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index 15a6177746403..5db74628a1743 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -2493,6 +2493,35 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
             Result = Builder.CreateBitCast(Result, DstTy);
           return Result;
         }
+
+        // If we are casting a fixed i8 vector to a scalable i1 predicate
+        // vector, and we weren't able to handle it above, try using what we
+        // know about vscale to insert a fixed i1 vector into the scalable
+        // vector.
+        if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+            FixedSrcTy->getElementType()->isIntegerTy(8)) {
+          std::optional<std::pair<unsigned, unsigned>> VScaleRange =
+              CGF.getContext().getTargetInfo().getVScaleRange(CGF.getLangOpts(),
+                                                              false);
+          if (VScaleRange && VScaleRange->first == VScaleRange->second &&
+              VScaleRange->first <= FixedSrcTy->getNumElements() * 8) {
+            unsigned VScale = VScaleRange->first;
+            llvm::Type *WideFixedTy =
+                llvm::FixedVectorType::get(ScalableDstTy->getElementType(),
+                                           FixedSrcTy->getNumElements() * 8);
+            Src = Builder.CreateBitCast(Src, WideFixedTy);
+            llvm::Type *FixedTy = llvm::FixedVectorType::get(
+                ScalableDstTy->getElementType(),
+                ScalableDstTy->getElementCount().getKnownMinValue() * VScale);
+            // If the fixed i8 vector is larger than the i1 vector, we need to
+            // extract.
+            if (FixedTy != WideFixedTy)
+              Src = Builder.CreateExtractVector(FixedTy, Src, uint64_t(0));
+            return Builder.CreateInsertVector(
+                ScalableDstTy, llvm::PoisonValue::get(ScalableDstTy), Src,
+                uint64_t(0));
+          }
+        }
       }
     }
 
@@ -2514,6 +2543,35 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
         if (ScalableSrcTy->getElementType() == FixedDstTy->getElementType())
           return Builder.CreateExtractVector(DstTy, Src, uint64_t(0),
                                              "cast.fixed");
+
+        // If we are casting a scalable i1 predicate vector to a fixed i8
+        // vector, and we weren't able to handle it above, try using what we
+        // know about vscale to extract a fixed i1 vector from the scalable
+        // vector.
+        if (ScalableSrcTy->getElementType()->isIntegerTy(1) &&
+            FixedDstTy->getElementType()->isIntegerTy(8)) {
+          std::optional<std::pair<unsigned, unsigned>> VScaleRange =
+              CGF.getContext().getTargetInfo().getVScaleRange(CGF.getLangOpts(),
+                                                              false);
+          if (VScaleRange && VScaleRange->first == VScaleRange->second &&
+              VScaleRange->first <= FixedDstTy->getNumElements() * 8) {
+            unsigned VScale = VScaleRange->first;
+            llvm::Type *FixedTy = llvm::FixedVectorType::get(
+                ScalableSrcTy->getElementType(),
+                ScalableSrcTy->getElementCount().getKnownMinValue() * VScale);
+            Src = Builder.CreateExtractVector(FixedTy, Src, uint64_t(0));
+            llvm::Type *WideFixedTy =
+                llvm::FixedVectorType::get(ScalableSrcTy->getElementType(),
+                                           FixedDstTy->getNumElements() * 8);
+            // If the fixed i8 vector is larger than the i1 vector, we need to
+            // widen the i1 vector.
+            if (FixedTy != WideFixedTy)
+              Src = Builder.CreateInsertVector(
+                  WideFixedTy, llvm::PoisonValue::get(WideFixedTy), Src,
+                  uint64_t(0));
+            return Builder.CreateBitCast(Src, FixedDstTy);
+          }
+        }
       }
     }
 
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
index e2f02dc64f766..3ab065d34bcfb 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
@@ -15,24 +15,12 @@ typedef vbool64_t fixed_bool64_t __attribute__((riscv_rvv_vector_bits(__riscv_v_
 
 // CHECK-64-LABEL: @call_bool32_ff(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT:    [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 2)
 // CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
 //
 // CHECK-128-LABEL: @call_bool32_ff(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT:    [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 4)
 // CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
 //
 fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
@@ -41,24 +29,12 @@ fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_ff(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT:    [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 1)
 // CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
 //
 // CHECK-128-LABEL: @call_bool64_ff(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT:    [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 2)
 // CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
 //
 fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
@@ -71,25 +47,13 @@ fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
 
 // CHECK-64-LABEL: @call_bool32_fs(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT:    [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
+// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP1]]
 //
 // CHECK-128-LABEL: @call_bool32_fs(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
+// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP1]]
 //
 fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -97,25 +61,13 @@ fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_fs(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT:    [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
+// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP1]]
 //
 // CHECK-128-LABEL: @call_bool64_fs(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
+// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP1]]
 //
 fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
@@ -127,25 +79,13 @@ fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
 
 // CHECK-64-LABEL: @call_bool32_ss(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
 // CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP0]]
 //
 // CHECK-128-LABEL: @call_bool32_ss(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
 // CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP0]]
 //
 fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -153,25 +93,13 @@ fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_ss(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
 // CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP0]]
 //
 // CHECK-128-LABEL: @call_bool64_ss(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
 // CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NE...
[truncated]

llvmbot (Member) commented May 3, 2025

@llvm/pr-subscribers-backend-risc-v


topperc (Collaborator, Author) commented May 5, 2025

I have a different idea I want to try. Moving to draft.

@topperc topperc marked this pull request as draft May 5, 2025 17:05