[RISCV] Add scheduler definitions for SpacemiT-X60 #137343
@llvm/pr-subscribers-backend-risc-v

Author: Mikhail R. Gadelha (mikhailramalho)

Changes

This patch adds an initial scheduler model for the SpacemiT-X60, including latency for scalar instructions only.

The scheduler is based on the documented characteristics of the C908, which the SpacemiT-X60 is believed to be based on, and provides the expected latency for several instructions. I ran llvm-exegesis to confirm most of these values and to get the latency of instructions not provided by the C908 documentation (e.g., double floating-point instructions).

For load and store instructions, the C908 documentation says the latency is >= 3 for loads and 1 for stores. I tried a few combinations of values until I got the current values of 5 and 3, which yield the best results.

Although the X60 does appear to support multiple issue for at least some floating-point instructions, this model assumes single issue, since increasing it reduces the gains reported below.

This patch gives a geomean improvement of ~4% on SPEC CPU 2017 for both rva22u64 and rva22u64_v, with some benchmarks improving by up to 15% (525.x264_r, 508.namd_r). No execution-time regressions were detected.

This initial scheduling model is strongly focused on providing sufficient definitions to deliver improved performance for the SpacemiT-X60. Further incremental gains may be possible through a much more detailed microarchitectural analysis, but that is left to future work. Scheduling definitions for RVV can be added in a future PR.

Patch is 68.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/137343.diff

7 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index 2c2271e486a84..6a6cec88b74a4 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -57,6 +57,7 @@ include "RISCVSchedSyntacoreSCR345.td"
include "RISCVSchedSyntacoreSCR7.td"
include "RISCVSchedTTAscalonD8.td"
include "RISCVSchedXiangShanNanHu.td"
+include "RISCVSchedSpacemitX60.td"
//===----------------------------------------------------------------------===//
// RISC-V processors supported.
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 9d48adeec5e86..6e44518cb43f2 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -559,7 +559,7 @@ def XIANGSHAN_NANHU : RISCVProcessorModel<"xiangshan-nanhu",
TuneShiftedZExtWFusion]>;
def SPACEMIT_X60 : RISCVProcessorModel<"spacemit-x60",
- NoSchedModel,
+ SpacemitX60Model,
!listconcat(RVA22S64Features,
[FeatureStdExtV,
FeatureStdExtSscofpmf,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td b/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
new file mode 100644
index 0000000000000..d1148cc2f69dc
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVSchedSpacemitX60.td
@@ -0,0 +1,332 @@
+//=- RISCVSchedSpacemitX60.td - Spacemit X60 Scheduling Defs -*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+//
+// Scheduler model for the SpacemiT-X60 processor based on documentation of the
+// C908 and experiments on real hardware (bpi-f3).
+//
+//===----------------------------------------------------------------------===//
+
+def SpacemitX60Model : SchedMachineModel {
+ let IssueWidth = 2; // dual-issue
+ let MicroOpBufferSize = 0; // in-order
+ let LoadLatency = 5; // worse case: >= 3
+ let MispredictPenalty = 9; // nine-stage
+
+ let CompleteModel = 0;
+
+ let UnsupportedFeatures = [HasStdExtZknd, HasStdExtZkne, HasStdExtZknh,
+ HasStdExtZksed, HasStdExtZksh, HasStdExtZkr];
+}
+
+let SchedModel = SpacemitX60Model in {
+
+//===----------------------------------------------------------------------===//
+// Define processor resources for Spacemit-X60
+
+// Information gathered from the C908 user manual:
+let BufferSize = 0 in {
+ // The LSU supports dual issue for scalar store/load instructions
+ def SMX60_LS : ProcResource<2>;
+
+ // An IEU can decode and issue two instructions at the same time
+ def SMX60_IEU : ProcResource<2>;
+
+ def SMX60_FP : ProcResource<1>;
+}
+
+//===----------------------------------------------------------------------===//
+
+// Branching
+def : WriteRes<WriteJmp, [SMX60_IEU]>;
+def : WriteRes<WriteJal, [SMX60_IEU]>;
+def : WriteRes<WriteJalr, [SMX60_IEU]>;
+
+// Integer arithmetic and logic
+def : WriteRes<WriteIALU32, [SMX60_IEU]>;
+def : WriteRes<WriteIALU, [SMX60_IEU]>;
+def : WriteRes<WriteShiftImm32, [SMX60_IEU]>;
+def : WriteRes<WriteShiftImm, [SMX60_IEU]>;
+def : WriteRes<WriteShiftReg32, [SMX60_IEU]>;
+def : WriteRes<WriteShiftReg, [SMX60_IEU]>;
+
+// Integer multiplication
+let Latency = 4 in {
+ def : WriteRes<WriteIMul, [SMX60_IEU]>;
+ def : WriteRes<WriteIMul32, [SMX60_IEU]>;
+}
+
+// Integer division/remainder
+// Worst case latency is used.
+def : WriteRes<WriteIDiv32, [SMX60_IEU]> { let Latency = 12; }
+def : WriteRes<WriteIDiv, [SMX60_IEU]> { let Latency = 20; }
+def : WriteRes<WriteIRem32, [SMX60_IEU]> { let Latency = 12; }
+def : WriteRes<WriteIRem, [SMX60_IEU]> { let Latency = 20; }
+
+// Bitmanip
+def : WriteRes<WriteRotateImm, [SMX60_IEU]>;
+def : WriteRes<WriteRotateImm32, [SMX60_IEU]>;
+def : WriteRes<WriteRotateReg, [SMX60_IEU]>;
+def : WriteRes<WriteRotateReg32, [SMX60_IEU]>;
+
+def : WriteRes<WriteCLZ, [SMX60_IEU]>;
+def : WriteRes<WriteCLZ32, [SMX60_IEU]>;
+def : WriteRes<WriteCTZ, [SMX60_IEU]>;
+def : WriteRes<WriteCTZ32, [SMX60_IEU]>;
+
+def : WriteRes<WriteCPOP, [SMX60_IEU]>;
+def : WriteRes<WriteCPOP32, [SMX60_IEU]>;
+
+def : WriteRes<WriteORCB, [SMX60_IEU]>;
+
+def : WriteRes<WriteIMinMax, [SMX60_IEU]>;
+
+def : WriteRes<WriteREV8, [SMX60_IEU]>;
+
+def : WriteRes<WriteSHXADD, [SMX60_IEU]>;
+def : WriteRes<WriteSHXADD32, [SMX60_IEU]>;
+
+// Single-bit instructions
+def : WriteRes<WriteSingleBit, [SMX60_IEU]>;
+def : WriteRes<WriteSingleBitImm, [SMX60_IEU]>;
+def : WriteRes<WriteBEXT, [SMX60_IEU]>;
+def : WriteRes<WriteBEXTI, [SMX60_IEU]>;
+
+// Memory/Atomic memory
+let Latency = 3 in {
+ def : WriteRes<WriteSTB, [SMX60_LS]>;
+ def : WriteRes<WriteSTH, [SMX60_LS]>;
+ def : WriteRes<WriteSTW, [SMX60_LS]>;
+ def : WriteRes<WriteSTD, [SMX60_LS]>;
+ def : WriteRes<WriteFST16, [SMX60_LS]>;
+ def : WriteRes<WriteFST32, [SMX60_LS]>;
+ def : WriteRes<WriteFST64, [SMX60_LS]>;
+ def : WriteRes<WriteAtomicSTW, [SMX60_LS]>;
+ def : WriteRes<WriteAtomicSTD, [SMX60_LS]>;
+}
+
+let Latency = 5 in {
+ def : WriteRes<WriteLDB, [SMX60_LS]>;
+ def : WriteRes<WriteLDH, [SMX60_LS]>;
+ def : WriteRes<WriteLDW, [SMX60_LS]>;
+ def : WriteRes<WriteLDD, [SMX60_LS]>;
+ def : WriteRes<WriteFLD16, [SMX60_LS]>;
+ def : WriteRes<WriteFLD32, [SMX60_LS]>;
+ def : WriteRes<WriteFLD64, [SMX60_LS]>;
+}
+
+// Atomics
+let Latency = 5 in {
+ def : WriteRes<WriteAtomicLDW, [SMX60_LS]>;
+ def : WriteRes<WriteAtomicLDD, [SMX60_LS]>;
+ def : WriteRes<WriteAtomicW, [SMX60_LS]>;
+ def : WriteRes<WriteAtomicD, [SMX60_LS]>;
+}
+
+// Floating point units Half precision
+def : WriteRes<WriteFAdd16, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMul16, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMA16, [SMX60_FP]> { let Latency = 4; }
+def : WriteRes<WriteFSGNJ16, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMinMax16, [SMX60_FP]> { let Latency = 3; }
+
+// Worst case latency is used
+let Latency = 7, ReleaseAtCycles = [7] in {
+ def : WriteRes<WriteFDiv16, [SMX60_FP]>;
+ def : WriteRes<WriteFSqrt16, [SMX60_FP]>;
+}
+
+// Single precision
+def : WriteRes<WriteFAdd32, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMul32, [SMX60_FP]> { let Latency = 4; }
+def : WriteRes<WriteFMA32, [SMX60_FP]> { let Latency = 5; }
+def : WriteRes<WriteFSGNJ32, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMinMax32, [SMX60_FP]> { let Latency = 3; }
+
+// Worst case latency is used
+let Latency = 10, ReleaseAtCycles = [10] in {
+ def : WriteRes<WriteFDiv32, [SMX60_FP]>;
+ def : WriteRes<WriteFSqrt32, [SMX60_FP]>;
+}
+
+// Double precision
+def : WriteRes<WriteFAdd64, [SMX60_FP]> { let Latency = 4; }
+def : WriteRes<WriteFMul64, [SMX60_FP]> { let Latency = 4; }
+def : WriteRes<WriteFMA64, [SMX60_FP]> { let Latency = 5; }
+def : WriteRes<WriteFSGNJ64, [SMX60_FP]> { let Latency = 3; }
+def : WriteRes<WriteFMinMax64, [SMX60_FP]> { let Latency = 3; }
+
+let Latency = 10, ReleaseAtCycles = [10] in {
+ def : WriteRes<WriteFDiv64, [SMX60_FP]>;
+ def : WriteRes<WriteFSqrt64, [SMX60_FP]>;
+}
+
+// Conversions
+let Latency = 3 in {
+ def : WriteRes<WriteFCvtI32ToF16, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtI32ToF32, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtI32ToF64, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtI64ToF16, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtI64ToF32, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtI64ToF64, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF16ToI32, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF16ToI64, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF16ToF32, [SMX60_FP]>;
+ def : WriteRes<WriteFCvtF16ToF64, [SMX60_FP]>;
+ def : WriteRes<WriteFCvtF32ToI32, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF32ToI64, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF32ToF16, [SMX60_FP]>;
+ def : WriteRes<WriteFCvtF32ToF64, [SMX60_FP]>;
+ def : WriteRes<WriteFCvtF64ToI32, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF64ToI64, [SMX60_IEU]>;
+ def : WriteRes<WriteFCvtF64ToF16, [SMX60_FP]>;
+ def : WriteRes<WriteFCvtF64ToF32, [SMX60_FP]>;
+}
+
+let Latency = 2 in {
+ def : WriteRes<WriteFClass16, [SMX60_FP]>;
+ def : WriteRes<WriteFClass32, [SMX60_FP]>;
+ def : WriteRes<WriteFClass64, [SMX60_FP]>;
+}
+
+let Latency = 4 in {
+ def : WriteRes<WriteFCmp16, [SMX60_FP]>;
+ def : WriteRes<WriteFCmp32, [SMX60_FP]>;
+ def : WriteRes<WriteFCmp64, [SMX60_FP]>;
+}
+
+let Latency = 2 in {
+ def : WriteRes<WriteFMovI16ToF16, [SMX60_IEU]>;
+ def : WriteRes<WriteFMovF16ToI16, [SMX60_IEU]>;
+ def : WriteRes<WriteFMovI32ToF32, [SMX60_IEU]>;
+ def : WriteRes<WriteFMovF32ToI32, [SMX60_IEU]>;
+ def : WriteRes<WriteFMovI64ToF64, [SMX60_IEU]>;
+ def : WriteRes<WriteFMovF64ToI64, [SMX60_IEU]>;
+}
+
+// Others
+def : WriteRes<WriteCSR, [SMX60_IEU]>;
+def : WriteRes<WriteNop, [SMX60_IEU]>;
+
+//===----------------------------------------------------------------------===//
+// Bypass and advance
+def : ReadAdvance<ReadJmp, 0>;
+def : ReadAdvance<ReadJalr, 0>;
+def : ReadAdvance<ReadCSR, 0>;
+def : ReadAdvance<ReadStoreData, 0>;
+def : ReadAdvance<ReadMemBase, 0>;
+def : ReadAdvance<ReadIALU, 0>;
+def : ReadAdvance<ReadIALU32, 0>;
+def : ReadAdvance<ReadShiftImm, 0>;
+def : ReadAdvance<ReadShiftImm32, 0>;
+def : ReadAdvance<ReadShiftReg, 0>;
+def : ReadAdvance<ReadShiftReg32, 0>;
+def : ReadAdvance<ReadIDiv, 0>;
+def : ReadAdvance<ReadIDiv32, 0>;
+def : ReadAdvance<ReadIRem, 0>;
+def : ReadAdvance<ReadIRem32, 0>;
+def : ReadAdvance<ReadIMul, 0>;
+def : ReadAdvance<ReadIMul32, 0>;
+def : ReadAdvance<ReadAtomicWA, 0>;
+def : ReadAdvance<ReadAtomicWD, 0>;
+def : ReadAdvance<ReadAtomicDA, 0>;
+def : ReadAdvance<ReadAtomicDD, 0>;
+def : ReadAdvance<ReadAtomicLDW, 0>;
+def : ReadAdvance<ReadAtomicLDD, 0>;
+def : ReadAdvance<ReadAtomicSTW, 0>;
+def : ReadAdvance<ReadAtomicSTD, 0>;
+def : ReadAdvance<ReadFStoreData, 0>;
+def : ReadAdvance<ReadFMemBase, 0>;
+def : ReadAdvance<ReadFAdd16, 0>;
+def : ReadAdvance<ReadFAdd32, 0>;
+def : ReadAdvance<ReadFAdd64, 0>;
+def : ReadAdvance<ReadFMul16, 0>;
+def : ReadAdvance<ReadFMA16, 0>;
+def : ReadAdvance<ReadFMA16Addend, 0>;
+def : ReadAdvance<ReadFMul32, 0>;
+def : ReadAdvance<ReadFMul64, 0>;
+def : ReadAdvance<ReadFMA32, 0>;
+def : ReadAdvance<ReadFMA32Addend, 0>;
+def : ReadAdvance<ReadFMA64, 0>;
+def : ReadAdvance<ReadFMA64Addend, 0>;
+def : ReadAdvance<ReadFDiv16, 0>;
+def : ReadAdvance<ReadFDiv32, 0>;
+def : ReadAdvance<ReadFDiv64, 0>;
+def : ReadAdvance<ReadFSqrt16, 0>;
+def : ReadAdvance<ReadFSqrt32, 0>;
+def : ReadAdvance<ReadFSqrt64, 0>;
+def : ReadAdvance<ReadFCmp16, 0>;
+def : ReadAdvance<ReadFCmp32, 0>;
+def : ReadAdvance<ReadFCmp64, 0>;
+def : ReadAdvance<ReadFSGNJ16, 0>;
+def : ReadAdvance<ReadFSGNJ32, 0>;
+def : ReadAdvance<ReadFSGNJ64, 0>;
+def : ReadAdvance<ReadFMinMax16, 0>;
+def : ReadAdvance<ReadFMinMax32, 0>;
+def : ReadAdvance<ReadFMinMax64, 0>;
+def : ReadAdvance<ReadFCvtF16ToI32, 0>;
+def : ReadAdvance<ReadFCvtF16ToI64, 0>;
+def : ReadAdvance<ReadFCvtF32ToI32, 0>;
+def : ReadAdvance<ReadFCvtF32ToI64, 0>;
+def : ReadAdvance<ReadFCvtF64ToI32, 0>;
+def : ReadAdvance<ReadFCvtF64ToI64, 0>;
+def : ReadAdvance<ReadFCvtI32ToF16, 0>;
+def : ReadAdvance<ReadFCvtI32ToF32, 0>;
+def : ReadAdvance<ReadFCvtI32ToF64, 0>;
+def : ReadAdvance<ReadFCvtI64ToF16, 0>;
+def : ReadAdvance<ReadFCvtI64ToF32, 0>;
+def : ReadAdvance<ReadFCvtI64ToF64, 0>;
+def : ReadAdvance<ReadFCvtF32ToF64, 0>;
+def : ReadAdvance<ReadFCvtF64ToF32, 0>;
+def : ReadAdvance<ReadFCvtF16ToF32, 0>;
+def : ReadAdvance<ReadFCvtF32ToF16, 0>;
+def : ReadAdvance<ReadFCvtF16ToF64, 0>;
+def : ReadAdvance<ReadFCvtF64ToF16, 0>;
+def : ReadAdvance<ReadFMovF16ToI16, 0>;
+def : ReadAdvance<ReadFMovI16ToF16, 0>;
+def : ReadAdvance<ReadFMovF32ToI32, 0>;
+def : ReadAdvance<ReadFMovI32ToF32, 0>;
+def : ReadAdvance<ReadFMovF64ToI64, 0>;
+def : ReadAdvance<ReadFMovI64ToF64, 0>;
+def : ReadAdvance<ReadFClass16, 0>;
+def : ReadAdvance<ReadFClass32, 0>;
+def : ReadAdvance<ReadFClass64, 0>;
+
+// Bitmanip
+def : ReadAdvance<ReadRotateImm, 0>;
+def : ReadAdvance<ReadRotateImm32, 0>;
+def : ReadAdvance<ReadRotateReg, 0>;
+def : ReadAdvance<ReadRotateReg32, 0>;
+def : ReadAdvance<ReadCLZ, 0>;
+def : ReadAdvance<ReadCLZ32, 0>;
+def : ReadAdvance<ReadCTZ, 0>;
+def : ReadAdvance<ReadCTZ32, 0>;
+def : ReadAdvance<ReadCPOP, 0>;
+def : ReadAdvance<ReadCPOP32, 0>;
+def : ReadAdvance<ReadORCB, 0>;
+def : ReadAdvance<ReadIMinMax, 0>;
+def : ReadAdvance<ReadREV8, 0>;
+def : ReadAdvance<ReadSHXADD, 0>;
+def : ReadAdvance<ReadSHXADD32, 0>;
+// Single-bit instructions
+def : ReadAdvance<ReadSingleBit, 0>;
+def : ReadAdvance<ReadSingleBitImm, 0>;
+
+//===----------------------------------------------------------------------===//
+// Unsupported extensions
+defm : UnsupportedSchedV;
+defm : UnsupportedSchedXsfvcp;
+defm : UnsupportedSchedZabha;
+defm : UnsupportedSchedZbc;
+defm : UnsupportedSchedZbkb;
+defm : UnsupportedSchedZbkx;
+defm : UnsupportedSchedZfa;
+defm : UnsupportedSchedZvk;
+defm : UnsupportedSchedSFB;
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
index 75f4b977a98b0..b384a0187a1ce 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vxrm-insert-out-of-loop.ll
@@ -302,32 +302,32 @@ define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_
; RV64X60-NEXT: .cfi_offset s4, -40
; RV64X60-NEXT: li t0, 0
; RV64X60-NEXT: li t1, 0
-; RV64X60-NEXT: addi t2, a7, -1
-; RV64X60-NEXT: add t4, a0, a6
-; RV64X60-NEXT: add t5, a2, a6
-; RV64X60-NEXT: add t3, a4, a6
-; RV64X60-NEXT: zext.w s0, t2
-; RV64X60-NEXT: mul s1, a1, s0
-; RV64X60-NEXT: add t4, t4, s1
-; RV64X60-NEXT: mul s1, a3, s0
-; RV64X60-NEXT: add t5, t5, s1
+; RV64X60-NEXT: addi s1, a7, -1
+; RV64X60-NEXT: zext.w s1, s1
+; RV64X60-NEXT: mul t2, a1, s1
+; RV64X60-NEXT: mul t3, a3, s1
+; RV64X60-NEXT: mul t4, a5, s1
+; RV64X60-NEXT: add s1, a0, a6
+; RV64X60-NEXT: add s0, a2, a6
+; RV64X60-NEXT: add t5, a4, a6
+; RV64X60-NEXT: add s2, s1, t2
; RV64X60-NEXT: csrr t2, vlenb
-; RV64X60-NEXT: mul s1, a5, s0
-; RV64X60-NEXT: add t3, t3, s1
-; RV64X60-NEXT: sltu s1, a0, t5
-; RV64X60-NEXT: sltu s0, a2, t4
-; RV64X60-NEXT: and t6, s1, s0
+; RV64X60-NEXT: add t3, t3, s0
+; RV64X60-NEXT: or t6, a1, a3
+; RV64X60-NEXT: add t4, t4, t5
+; RV64X60-NEXT: sltu s0, a0, t3
+; RV64X60-NEXT: sltu s1, a2, s2
+; RV64X60-NEXT: and t5, s0, s1
+; RV64X60-NEXT: slli t3, t2, 1
+; RV64X60-NEXT: slti s1, t6, 0
+; RV64X60-NEXT: sltu s0, a0, t4
+; RV64X60-NEXT: or t4, t5, s1
+; RV64X60-NEXT: sltu s1, a4, s2
+; RV64X60-NEXT: and s0, s0, s1
+; RV64X60-NEXT: or s1, a1, a5
; RV64X60-NEXT: li t5, 32
-; RV64X60-NEXT: sltu s1, a0, t3
-; RV64X60-NEXT: sltu s0, a4, t4
-; RV64X60-NEXT: and t3, s1, s0
-; RV64X60-NEXT: or s1, a1, a3
; RV64X60-NEXT: slti s1, s1, 0
-; RV64X60-NEXT: or t4, t6, s1
-; RV64X60-NEXT: or s0, a1, a5
-; RV64X60-NEXT: slti s0, s0, 0
-; RV64X60-NEXT: or s0, t3, s0
-; RV64X60-NEXT: slli t3, t2, 1
+; RV64X60-NEXT: or s0, s0, s1
; RV64X60-NEXT: maxu s1, t3, t5
; RV64X60-NEXT: or s0, t4, s0
; RV64X60-NEXT: sltu s1, a6, s1
@@ -339,8 +339,8 @@ define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_
; RV64X60-NEXT: # in Loop: Header=BB0_4 Depth=1
; RV64X60-NEXT: add t5, t5, a1
; RV64X60-NEXT: add a2, a2, a3
-; RV64X60-NEXT: add a4, a4, a5
; RV64X60-NEXT: addiw t1, t1, 1
+; RV64X60-NEXT: add a4, a4, a5
; RV64X60-NEXT: addi t0, t0, 1
; RV64X60-NEXT: beq t1, a7, .LBB0_11
; RV64X60-NEXT: .LBB0_4: # %for.cond1.preheader.us
@@ -367,10 +367,10 @@ define void @test1(ptr nocapture noundef writeonly %dst, i32 noundef signext %i_
; RV64X60-NEXT: vl2r.v v8, (s2)
; RV64X60-NEXT: vl2r.v v10, (s3)
; RV64X60-NEXT: sub s1, s1, t3
-; RV64X60-NEXT: add s3, s3, t3
; RV64X60-NEXT: vaaddu.vv v8, v8, v10
; RV64X60-NEXT: vs2r.v v8, (s4)
; RV64X60-NEXT: add s4, s4, t3
+; RV64X60-NEXT: add s3, s3, t3
; RV64X60-NEXT: add s2, s2, t3
; RV64X60-NEXT: bnez s1, .LBB0_7
; RV64X60-NEXT: # %bb.8: # %middle.block
diff --git a/llvm/test/tools/llvm-mca/RISCV/SpacemitX60/atomic.s b/llvm/test/tools/llvm-mca/RISCV/SpacemitX60/atomic.s
new file mode 100644
index 0000000000000..73109a78cd4b9
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/RISCV/SpacemitX60/atomic.s
@@ -0,0 +1,312 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=riscv64 -mattr=+rva22u64 -mcpu=spacemit-x60 -iterations=1 < %s | FileCheck %s
+
+# Zalrsc
+lr.w t0, (t1)
+lr.w.aq t1, (t2)
+lr.w.rl t2, (t3)
+lr.w.aqrl t3, (t4)
+sc.w t6, t5, (t4)
+sc.w.aq t5, t4, (t3)
+sc.w.rl t4, t3, (t2)
+sc.w.aqrl t3, t2, (t1)
+
+lr.d t0, (t1)
+lr.d.aq t1, (t2)
+lr.d.rl t2, (t3)
+lr.d.aqrl t3, (t4)
+sc.d t6, t5, (t4)
+sc.d.aq t5, t4, (t3)
+sc.d.rl t4, t3, (t2)
+sc.d.aqrl t3, t2, (t1)
+
+# Zaamo
+amoswap.w a4, ra, (s0)
+amoadd.w a1, a2, (a3)
+amoxor.w a2, a3, (a4)
+amoand.w a3, a4, (a5)
+amoor.w a4, a5, (a6)
+amomin.w a5, a6, (a7)
+amomax.w s7, s6, (s5)
+amominu.w s6, s5, (s4)
+amomaxu.w s5, s4, (s3)
+
+amoswap.w.aq a4, ra, (s0)
+amoadd.w.aq a1, a2, (a3)
+amoxor.w.aq a2, a3, (a4)
+amoand.w.aq a3, a4, (a5)
+amoor.w.aq a4, a5, (a6)
+amomin.w.aq a5, a6, (a7)
+amomax.w.aq s7, s6, (s5)
+amominu.w.aq s6, s5, (s4)
+amomaxu.w.aq s5, s4, (s3)
+
+amoswap.w.rl a4, ra, (s0)
+amoadd.w.rl a1, a2, (a3)
+amoxor.w.rl a2, a3, (a4)
+amoand.w.rl a3, a4, (a5)
+amoor.w.rl a4, a5, (a6)
+amomin.w.rl a5, a6, (a7)
+amomax.w.rl s7, s6, (s5)
+amominu.w.rl s6, s5, (s4)
+amomaxu.w.rl s5, s4, (s3)
+
+amoswap.w.aqrl a4, ra, (s0)
+amoadd.w.aqrl a1, a2, (a3)
+amoxor.w.aqrl a2, a3, (a4)
+amoand.w.aqrl a3, a4, (a5)
+amoor.w.aqrl a4, a5, (a6)
+amomin.w.aqrl a5, a6, (a7)
+amomax.w.aqrl s7, s6, (s5)
+amominu.w.aqrl s6, s5, (s4)
+amomaxu.w.aqrl s5, s4, (s3)
+
+amoswap.d a4, ra, (s0)
+amoadd.d a1, a2, (a3)
+amoxor.d a2, a3, (a4)
+amoand.d a3, a4, (a5)
+amoor.d a4, a5, (a6)
+amomin.d a5, a6, (a7)
+amomax.d s7, s6, (s5)
+amominu.d s6, s5, (s4)
+amomaxu.d s5, s4, (s3)
+
+amoswap.d.aq a4, ra, (s0)
+amoadd.d.aq a1, a2, (a3)
+amoxor.d.aq a2, a3, (a4)
+amoand.d.aq a3, a4, (a5)
+amoor.d.aq a4, a5, (a6)
+amomin.d.aq a5, a6, (a7)
+amomax.d.aq s7, s6, (s5)
+amominu.d.aq s6, s5, (s4)
+amomaxu.d.aq s5, s4, (s3)
+
+amoswap.d.rl a4, ra, (s0)
+amoadd.d.rl a1, a2, (a3)
+amoxor.d.rl a2, a3, (a4)
+amoand.d.rl a3, a4, (a5)
+amoor.d.rl a4, a5, (a6)
+amomin.d.rl a5, a6, (a7)
+amomax.d.rl s7, s6, (s5)
+amominu.d.rl s6, s5, (s4)
+amomaxu.d.rl s5, s4, (s3)
+
+amoswap.d.aqrl a4, ra, (s0)
+amoadd.d.aqrl a1, a2, (a3)
+amoxor.d.aqrl a2, a3, (a4)
+amoand.d.aqrl a3, a4, (a5)
+amoor.d.aqrl a4, a5, (a6)
+amomin.d.aqrl a5, a6, (a7)
+amomax.d.aqrl s7, s6, (s5)
+amominu.d.aqrl s6, s5, (s4)
+amomaxu.d.aqrl s5, s4, (s3)
+
+# CHECK: Iterations: 1
+# CHECK-NEXT: Instructions: 88
+# CHECK-NEXT: Total Cycles: 86
+# CHECK-NEXT: Total uOps: 88
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.02
+# CHECK-NEXT: IPC: 1.02
+# CHECK-NEXT: Block RThroughput: 44.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 5 0.50 * lr.w t0, (t1)
+# CHECK-NEXT: 1 5 0.50 * lr.w.aq t1, (t2)
+# CHECK-NEXT: 1 5 0.50 * lr.w.rl t2, (t3)
+# CHECK-NEXT: 1 5 0.50 * lr.w.aqrl t3, (t4)
+# CHECK-NEXT: 1 3 0.50 * sc.w t6, t5, (t4)
+# CHECK-NEXT: 1 3 0.50 * sc.w.aq t5, t4, (t3)
+# CHECK-NEXT: 1 3 0.50 * sc.w.rl t4, t3, (t2)
+# CHECK-NEXT: 1 3 0.50 * s...
[truncated]
I've discussed earlier versions of this patch with Mikhail a fair bit so I'd appreciate a review from outside our org, but I will just share my thoughts on criteria for whether this makes sense to merge at this point: For me the key thing is that per Mikhail's benchmarking it's at the point where there are pretty much across-the-board improvements and, importantly, there's no evidence of regressions due to introducing scalar but not having vector scheduling.

As for the fine details of the model, we're not going to get the very high fidelity of the 7-series model without much more microarchitectural information or a lot more reverse engineering. What is here seems a reasonable starting point. I'd appreciate comments on anything that seems anomalous vs other scheduling models - we should aim to basically match the pattern of others unless there's data or documented information to differ.

With that in mind, the store latency of 3 is a slight oddity vs other similar models. But the A55 had latency=4 up to f73334c and the commit comment indicates it has very limited impact on scheduling anyway. So unless people feel strongly it's worth more investigation right now, I propose sticking with what Mikhail suggests. (Incidentally, should any of our models be setting RetireOOO like the A55 does?)
This looks generally reasonable, and I agree that for a black box schedule model the right threshold is to use observed performance.
I'm going to run through (in a second pass) the available data for this core, and cross check the model. Forthcoming shortly.
// An IEU can decode and issue two instructions at the same time
def SMX60_IEU : ProcResource<2>;

def SMX60_FP : ProcResource<1>;
Add a comment here including the bit from your review description about why dual issue isn't used here.
Some floating-point instructions can double issue, such as those using FALU and FMAU, but not, for example, FCVT.
Mikhail mentioned that a value of 1 would give better performance, so we can start with 1. We will continue to improve this model in the future.
Let's start with 1 here, and then see if we can split this in a follow up patch.
As a follow up (as in, not in this patch), it would be good to explore this further.
I just wonder why we didn't ask the Spacemit guys to provide the schedule model. They have a compiler team but are not so active in upstream.
Thanks Mikhail for bringing the initial schedule model support to x60. We will take a look at this patch and work with the upstream to improve the performance of x60.
def SpacemitX60Model : SchedMachineModel {
  let IssueWidth = 2; // dual-issue
  let MicroOpBufferSize = 0; // in-order
  let LoadLatency = 5; // worse case: >= 3
Load latency is 3 or 4 in the case of a cache hit, but since load=5 actually performs the best in tests, we can keep this until another configuration beats it in test performance.
As a follow up (as in, not in this patch), please run another sweep of this parameter with the final model, and post a follow on if it needs to be tweaked slightly.
  let IssueWidth = 2; // dual-issue
  let MicroOpBufferSize = 0; // in-order
  let LoadLatency = 5; // worse case: >= 3
  let MispredictPenalty = 9; // nine-stage
In the case of an L1 cache hit, the penalty is about 3-6 cycles.
However, we didn't test the performance impact of tuning this parameter. If a different value is better for the test results, then just use it :)
As a follow up (as in, not in this patch), please run another sweep of this parameter with the final model, and post a follow on if it needs to be tweaked slightly.
Folks, we wrote a probe to double-check all latencies in the scheduler and updated the latencies accordingly. We tested it on two different boards to confirm the numbers. The only outlier was idiv/irem, which was reported to have a latency of 3-4 in our experiments, so we went with the worst-case value shown in the C908 manual. I added a TODO to revisit this later. I also included the latencies for clmul, clmulr, and clmulh, which were missing from the first version of this PR.
I think this is reasonable for integer divisions.
Looks like this is converging with the feedback from @zqb-all (Thanks!). Minor comment only.
// An IEU can decode and issue two instructions at the same time
def SMX60_IEU : ProcResource<2>;

def SMX60_FP : ProcResource<1>;
Let's start with 1 here, and then see if we can split this in a follow up patch.
LGTM
def SpacemitX60Model : SchedMachineModel {
  let IssueWidth = 2; // dual-issue
  let MicroOpBufferSize = 0; // in-order
  let LoadLatency = 5; // worse case: >= 3
As a follow up (as in, not in this patch), please run another sweep of this parameter with the final model, and post a follow on if it needs to be tweaked slightly.
  let IssueWidth = 2; // dual-issue
  let MicroOpBufferSize = 0; // in-order
  let LoadLatency = 5; // worse case: >= 3
  let MispredictPenalty = 9; // nine-stage
As a follow up (as in, not in this patch), please run another sweep of this parameter with the final model, and post a follow on if it needs to be tweaked slightly.
// An IEU can decode and issue two instructions at the same time
def SMX60_IEU : ProcResource<2>;

def SMX60_FP : ProcResource<1>;
As a follow up (as in, not in this patch), it would be good to explore this further.
def : WriteRes<WriteIMinMax, [SMX60_IEU]>;
def : WriteRes<WriteREV8, [SMX60_IEU]>;

let Latency = 2 in {
As a follow up (as in, not in this patch), it would be interesting to explore if this is actually two cycle latency, or if this is micro-coded as two uops, each with latency one. You could maybe see this in perf counters.
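For reference, one way to tell these cases apart without uop-level performance counters is to contrast a serially dependent sequence with an independent one: in a dependency chain, a single 2-cycle uop and two chained 1-cycle uops look identical, but an independent stream exposes how many issue slots each instruction consumes. The snippet below is only an illustrative sketch of that idea (labels, register choices, and the timing harness are assumptions, not part of this patch):

# Hypothetical sketch: separating latency from issue cost for fmv.w.x/fmv.x.w.
# Time each block in a counted loop with rdcycle (see the probe sketch further
# below) and compare cycles per instruction.

# (a) Dependent round trip: measures end-to-end latency of the pair.
#     One 2-cycle uop per move and two chained 1-cycle uops both give ~4
#     cycles per iteration, so this alone cannot distinguish them.
dep_chain:
    fmv.w.x fa0, a0
    fmv.x.w a0, fa0

# (b) Independent stream: measures issue bandwidth. Single-uop moves should
#     approach IssueWidth per cycle; a 2-uop encoding would roughly halve that.
indep_stream:
    fmv.w.x fa0, a0
    fmv.w.x fa1, a0
    fmv.w.x fa2, a0
    fmv.w.x fa3, a0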
Hi @mikhailramalho, is "probe" referring to "llvm-exegesis", or does it refer to another independent program?
From context in offline discussion, this was an ad-hoc mix of llvm-exegesis where it seemed to produce sane results, and custom assembly snippets. There's definitely room for error here; this type of micro-architectural exploration is hard and error prone. I approved this mostly based on the net perf results, not any expectation that every number for every instruction was correct. I'd encourage you to make suggestions for improvements. Ideally in the form of pull requests, but if you want to drop comments here, Mikhail or I can follow up.
Thanks, I don't mean to question any particular number in this configuration; this patch is good. I just want to learn this way of probing, so that it will also be helpful when improving the configuration in the future.
Hi @zqb-all, it's a custom probing tool that we plan to share soon.
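The tool itself has not been published yet, but the usual shape of such a probe is a long chain of serially dependent instructions timed with rdcycle, with total cycles divided by the chain length. The sketch below is purely illustrative and is not the SpacemiT tool; the symbol name, chain length, instruction under test, and the assumption that the cycle CSR is readable from user mode are all editorial:

# Minimal sketch of a dependent-chain latency probe (illustrative only).
# Assumes rdcycle is accessible and the D extension is available.
    .text
    .globl  probe_fadd_d_latency
probe_fadd_d_latency:
    li      t0, 1048576          # loop iterations; 4 chained ops per iteration
    rdcycle t1                   # start cycle count
1:
    fadd.d  fa0, fa0, fa1        # each fadd.d waits on the previous result
    fadd.d  fa0, fa0, fa1
    fadd.d  fa0, fa0, fa1
    fadd.d  fa0, fa0, fa1
    addi    t0, t0, -1
    bnez    t0, 1b
    rdcycle t2                   # end cycle count
    sub     a0, t2, t1           # total cycles; dividing by 4 * 1048576
    ret                          # gives cycles per fadd.d, i.e. its latency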
This patch adds an initial scheduler model for the SpacemiT-X60, including latency for scalar instructions only.

The scheduler is based on the documented characteristics of the C908, which the SpacemiT-X60 is believed to be based on, and provides the expected latency for several instructions. I ran a probe to confirm all of these values and to get the latency of instructions not provided by the C908 documentation (e.g., double floating-point instructions).

For load and store instructions, the C908 documentation says the latency is >= 3 for load and 1 for store. I tried a few combinations of values until I got the current values of 5 and 3, which yield the best results.

Although the X60 does appear to support multiple issue for at least some floating point instructions, this model assumes single issue as increasing it reduces the gains below.

This patch gives a geomean improvement of ~4% on SPEC CPU 2017 for both rva22u64 and rva22u64_v, with some benchmarks improving up to 18% (508.namd_r). There were a couple of execution time regressions, but only in noisy benchmarks (523.xalancbmk_r and 510.parest_r).

* rva22u64: https://lnt.lukelau.me/db_default/v4/nts/507?compare_to=405 (compares a55f727 to the baseline 8286b80)
* rva22u64_v: https://lnt.lukelau.me/db_default/v4/nts/474?compare_to=404 (compares a55f727 to the baseline 8286b80)

This initial scheduling model is strongly focused on providing sufficient definitions to provide improved performance for the SpacemiT-X60. Further incremental gains may be possible through a much more detailed microarchitectural analysis, but that is left to future work.

Further scheduling definitions for RVV can be added in a future PR.
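As a quick way to see what the new model reports, the in-tree llvm-mca tests added by this patch can be mirrored on an arbitrary instruction mix; the snippet below is only an example (the instructions are chosen for illustration), with the expected latencies taken directly from RISCVSchedSpacemitX60.td:

# RUN: llvm-mca -mtriple=riscv64 -mattr=+rva22u64 -mcpu=spacemit-x60 -iterations=1 < %s
fadd.d  fa0, fa1, fa2      # WriteFAdd64: latency 4
fmul.s  fa3, fa4, fa5      # WriteFMul32: latency 4
fdiv.s  fa6, fa7, ft0      # WriteFDiv32: latency 10, occupies the FP unit for 10 cycles
lw      a0, 0(a1)          # WriteLDW: latency 5
sw      a0, 0(a1)          # WriteSTW: latency 3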