[NVPTX] Pull invariant load identification into IR pass #138015

AlexMaclean · 2025-04-30T19:02:15Z

Pull invariant load identification, which was previously part of DAGToDAG ISel, into a new IR pass NVPTXTagInvariantLoads. This makes it possible to disable this optimization at O0 and reduces the complexity of the SelectionDAG pass. Moving this logic to an IR pass also allows for implementing a more powerful traversal in the future.

Fixes #138138

llvmbot · 2025-04-30T19:02:46Z

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Pull invariant load identification, which was previously part of DAGToDAG ISel, into a new IR pass NVPTXTagInvariantLoads. This makes it possible to disable this optimization at O0 and reduces the complexity of the SelectionDAG pass. Moving this logic to an IR pass also allows for implementing a more powerful traversal in the future.

Full diff: https://github.com/llvm/llvm-project/pull/138015.diff

7 Files Affected:

(modified) llvm/lib/Target/NVPTX/CMakeLists.txt (+8-7)
(modified) llvm/lib/Target/NVPTX/NVPTX.h (+6)
(modified) llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp (+8-44)
(modified) llvm/lib/Target/NVPTX/NVPTXPassRegistry.def (+2-1)
(added) llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp (+102)
(modified) llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp (+2)
(added) llvm/test/CodeGen/NVPTX/tag-invariant-loads.ll (+138)

diff --git a/llvm/lib/Target/NVPTX/CMakeLists.txt b/llvm/lib/Target/NVPTX/CMakeLists.txt
index 1cffde138eab7..693f0d0b35edc 100644
--- a/llvm/lib/Target/NVPTX/CMakeLists.txt
+++ b/llvm/lib/Target/NVPTX/CMakeLists.txt
@@ -13,34 +13,35 @@ add_public_tablegen_target(NVPTXCommonTableGen)
 set(NVPTXCodeGen_sources
   NVPTXAliasAnalysis.cpp
   NVPTXAllocaHoisting.cpp
-  NVPTXAtomicLower.cpp
   NVPTXAsmPrinter.cpp
   NVPTXAssignValidGlobalNames.cpp
+  NVPTXAtomicLower.cpp
+  NVPTXCtorDtorLowering.cpp
   NVPTXForwardParams.cpp
   NVPTXFrameLowering.cpp
   NVPTXGenericToNVVM.cpp
-  NVPTXISelDAGToDAG.cpp
-  NVPTXISelLowering.cpp
   NVPTXImageOptimizer.cpp
   NVPTXInstrInfo.cpp
+  NVPTXISelDAGToDAG.cpp
+  NVPTXISelLowering.cpp
   NVPTXLowerAggrCopies.cpp
-  NVPTXLowerArgs.cpp
   NVPTXLowerAlloca.cpp
+  NVPTXLowerArgs.cpp
   NVPTXLowerUnreachable.cpp
-  NVPTXPeephole.cpp
   NVPTXMCExpr.cpp
+  NVPTXPeephole.cpp
   NVPTXPrologEpilogPass.cpp
+  NVPTXProxyRegErasure.cpp
   NVPTXRegisterInfo.cpp
   NVPTXReplaceImageHandles.cpp
   NVPTXSelectionDAGInfo.cpp
   NVPTXSubtarget.cpp
+  NVPTXTagInvariantLoads.cpp
   NVPTXTargetMachine.cpp
   NVPTXTargetTransformInfo.cpp
   NVPTXUtilities.cpp
   NVVMIntrRange.cpp
   NVVMReflect.cpp
-  NVPTXProxyRegErasure.cpp
-  NVPTXCtorDtorLowering.cpp
   )
 
 add_llvm_target(NVPTXCodeGen
diff --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index cf21ad991ccdf..1da979d023b42 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -51,6 +51,7 @@ FunctionPass *createNVPTXLowerArgsPass();
 FunctionPass *createNVPTXLowerAllocaPass();
 FunctionPass *createNVPTXLowerUnreachablePass(bool TrapUnreachable,
                                               bool NoTrapAfterNoreturn);
+FunctionPass *createNVPTXTagInvariantLoadsPass();
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
 MachineFunctionPass *createNVPTXForwardParamsPass();
@@ -73,6 +74,7 @@ void initializeNVVMReflectPass(PassRegistry &);
 void initializeNVPTXAAWrapperPassPass(PassRegistry &);
 void initializeNVPTXExternalAAWrapperPass(PassRegistry &);
 void initializeNVPTXPeepholePass(PassRegistry &);
+void initializeNVPTXTagInvariantLoadLegacyPassPass(PassRegistry &);
 
 struct NVVMIntrRangePass : PassInfoMixin<NVVMIntrRangePass> {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
@@ -104,6 +106,10 @@ struct NVPTXLowerArgsPass : PassInfoMixin<NVPTXLowerArgsPass> {
   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
 };
 
+struct NVPTXTagInvariantLoadsPass : PassInfoMixin<NVPTXTagInvariantLoadsPass> {
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+};
+
 namespace NVPTX {
 enum DrvInterface {
   NVCL,
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
index 295ed666a1902..0eb350bc4cc97 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
@@ -766,46 +766,12 @@ NVPTX::Scope NVPTXDAGToDAGISel::getOperationScope(MemSDNode *N,
   llvm_unreachable("unhandled ordering");
 }
 
-static bool canLowerToLDG(MemSDNode *N, const NVPTXSubtarget &Subtarget,
-                          unsigned CodeAddrSpace, MachineFunction *F) {
+static bool canLowerToLDG(const MemSDNode *N, const NVPTXSubtarget &Subtarget,
+                          unsigned CodeAddrSpace) {
   // We use ldg (i.e. ld.global.nc) for invariant loads from the global address
   // space.
-  //
-  // We have two ways of identifying invariant loads: Loads may be explicitly
-  // marked as invariant, or we may infer them to be invariant.
-  //
-  // We currently infer invariance for loads from
-  //  - constant global variables, and
-  //  - kernel function pointer params that are noalias (i.e. __restrict) and
-  //    never written to.
-  //
-  // TODO: Perform a more powerful invariance analysis (ideally IPO, and ideally
-  // not during the SelectionDAG phase).
-  //
-  // TODO: Infer invariance only at -O2.  We still want to use ldg at -O0 for
-  // explicitly invariant loads because these are how clang tells us to use ldg
-  // when the user uses a builtin.
-  if (!Subtarget.hasLDG() || CodeAddrSpace != NVPTX::AddressSpace::Global)
-    return false;
-
-  if (N->isInvariant())
-    return true;
-
-  bool IsKernelFn = isKernelFunction(F->getFunction());
-
-  // We use getUnderlyingObjects() here instead of getUnderlyingObject() mainly
-  // because the former looks through phi nodes while the latter does not. We
-  // need to look through phi nodes to handle pointer induction variables.
-  SmallVector<const Value *, 8> Objs;
-  getUnderlyingObjects(N->getMemOperand()->getValue(), Objs);
-
-  return all_of(Objs, [&](const Value *V) {
-    if (auto *A = dyn_cast<const Argument>(V))
-      return IsKernelFn && A->onlyReadsMemory() && A->hasNoAliasAttr();
-    if (auto *GV = dyn_cast<const GlobalVariable>(V))
-      return GV->isConstant();
-    return false;
-  });
+  return Subtarget.hasLDG() && CodeAddrSpace == NVPTX::AddressSpace::Global &&
+         N->isInvariant();
 }
 
 static unsigned int getFenceOp(NVPTX::Ordering O, NVPTX::Scope S,
@@ -1106,10 +1072,9 @@ bool NVPTXDAGToDAGISel::tryLoad(SDNode *N) {
     return false;
 
   // Address Space Setting
-  unsigned int CodeAddrSpace = getCodeAddrSpace(LD);
-  if (canLowerToLDG(LD, *Subtarget, CodeAddrSpace, MF)) {
+  const unsigned CodeAddrSpace = getCodeAddrSpace(LD);
+  if (canLowerToLDG(LD, *Subtarget, CodeAddrSpace))
     return tryLDGLDU(N);
-  }
 
   SDLoc DL(N);
   SDValue Chain = N->getOperand(0);
@@ -1192,10 +1157,9 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
   const MVT MemVT = MemEVT.getSimpleVT();
 
   // Address Space Setting
-  unsigned int CodeAddrSpace = getCodeAddrSpace(MemSD);
-  if (canLowerToLDG(MemSD, *Subtarget, CodeAddrSpace, MF)) {
+  const unsigned CodeAddrSpace = getCodeAddrSpace(MemSD);
+  if (canLowerToLDG(MemSD, *Subtarget, CodeAddrSpace))
     return tryLDGLDU(N);
-  }
 
   EVT EltVT = N->getValueType(0);
   SDLoc DL(N);
diff --git a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
index 1c813c2c51f70..ee37c9826012c 100644
--- a/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
+++ b/llvm/lib/Target/NVPTX/NVPTXPassRegistry.def
@@ -38,5 +38,6 @@ FUNCTION_ALIAS_ANALYSIS("nvptx-aa", NVPTXAA())
 #endif
 FUNCTION_PASS("nvvm-intr-range", NVVMIntrRangePass())
 FUNCTION_PASS("nvptx-copy-byval-args", NVPTXCopyByValArgsPass())
-FUNCTION_PASS("nvptx-lower-args", NVPTXLowerArgsPass(*this));
+FUNCTION_PASS("nvptx-lower-args", NVPTXLowerArgsPass(*this))
+FUNCTION_PASS("nvptx-tag-invariant-loads", NVPTXTagInvariantLoadsPass())
 #undef FUNCTION_PASS
diff --git a/llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp b/llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp
new file mode 100644
index 0000000000000..997bb1880bdf0
--- /dev/null
+++ b/llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp
@@ -0,0 +1,102 @@
+//===------ NVPTXTagInvariantLoads.cpp - Tag invariant loads --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements invaraint load tagging. It traverses load instructions
+// in a function, and determines if each load can be tagged as invariant.
+//
+// We currently infer invariance for loads from
+//  - constant global variables, and
+//  - kernel function pointer params that are noalias (i.e. __restrict) and
+//    never written to.
+//
+// TODO: Perform a more powerful invariance analysis (ideally IPO).
+//
+//===----------------------------------------------------------------------===//
+
+#include "NVPTXUtilities.h"
+#include "llvm/Analysis/ValueTracking.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/Support/NVPTXAddrSpace.h"
+
+using namespace llvm;
+
+static void markLoadsAsInvariant(LoadInst *LI) {
+  LI->setMetadata(LLVMContext::MD_invariant_load,
+                  MDNode::get(LI->getContext(), {}));
+}
+
+static bool tagInvariantLoads(Function &F) {
+  const bool IsKernelFn = isKernelFunction(F);
+
+  bool Changed = false;
+  for (auto &I : instructions(F)) {
+    if (auto *LI = dyn_cast<LoadInst>(&I)) {
+
+      // Don't bother with non-global loads
+      if (LI->getPointerAddressSpace() != NVPTXAS::ADDRESS_SPACE_GLOBAL)
+        continue;
+
+      if (LI->getMetadata(LLVMContext::MD_invariant_load))
+        continue;
+
+      SmallVector<const Value *, 8> Objs;
+
+      // We use getUnderlyingObjects() here instead of getUnderlyingObject()
+      // mainly because the former looks through phi nodes while the latter does
+      // not. We need to look through phi nodes to handle pointer induction
+      // variables.
+
+      getUnderlyingObjects(LI->getPointerOperand(), Objs);
+
+      const bool IsInvariant = all_of(Objs, [&](const Value *V) {
+        if (const auto *A = dyn_cast<const Argument>(V))
+          return IsKernelFn && A->onlyReadsMemory() && A->hasNoAliasAttr();
+        if (const auto *GV = dyn_cast<const GlobalVariable>(V))
+          return GV->isConstant();
+        return false;
+      });
+
+      if (IsInvariant) {
+        markLoadsAsInvariant(LI);
+        Changed = true;
+      }
+    }
+  }
+
+  return Changed;
+}
+
+namespace {
+
+struct NVPTXTagInvariantLoadLegacyPass : public FunctionPass {
+  static char ID;
+
+  NVPTXTagInvariantLoadLegacyPass() : FunctionPass(ID) {}
+  bool runOnFunction(Function &F) override;
+};
+
+} // namespace
+
+INITIALIZE_PASS(NVPTXTagInvariantLoadLegacyPass, "nvptx-tag-invariant-loads",
+                "NVPTX Tag Invariant Loads", false, false)
+
+bool NVPTXTagInvariantLoadLegacyPass::runOnFunction(Function &F) {
+  return tagInvariantLoads(F);
+}
+
+char NVPTXTagInvariantLoadLegacyPass::ID = 0;
+
+FunctionPass *llvm::createNVPTXTagInvariantLoadsPass() {
+  return new NVPTXTagInvariantLoadLegacyPass();
+}
+
+PreservedAnalyses NVPTXTagInvariantLoadsPass::run(Function &F, FunctionAnalysisManager &) {
+  return tagInvariantLoads(F) ? PreservedAnalyses::none() : PreservedAnalyses::all();
+}
diff --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index f78d4585bbe98..dc3afc1f4a17d 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -112,6 +112,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNVPTXTarget() {
   initializeNVPTXAAWrapperPassPass(PR);
   initializeNVPTXExternalAAWrapperPass(PR);
   initializeNVPTXPeepholePass(PR);
+  initializeNVPTXTagInvariantLoadLegacyPassPass(PR);
 }
 
 static std::string computeDataLayout(bool is64Bit, bool UseShortPointers) {
@@ -395,6 +396,7 @@ void NVPTXPassConfig::addIRPasses() {
     if (!DisableLoadStoreVectorizer)
       addPass(createLoadStoreVectorizerPass());
     addPass(createSROAPass());
+    addPass(createNVPTXTagInvariantLoadsPass());
   }
 
   if (ST.hasPTXASUnreachableBug()) {
diff --git a/llvm/test/CodeGen/NVPTX/tag-invariant-loads.ll b/llvm/test/CodeGen/NVPTX/tag-invariant-loads.ll
new file mode 100644
index 0000000000000..26967faa01a1b
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/tag-invariant-loads.ll
@@ -0,0 +1,138 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=nvptx-tag-invariant-loads < %s -mcpu=sm_80 | FileCheck %s --check-prefix=OPT
+; RUN: llc -o - < %s -mcpu=sm_80 | FileCheck %s --check-prefix=PTX
+
+target triple = "nvptx-unknown-cuda"
+
+define ptx_kernel void @basic(ptr noalias readonly %a, ptr %out) {
+; OPT-LABEL: define ptx_kernel void @basic(
+; OPT-SAME: ptr noalias readonly [[A:%.*]], ptr [[OUT:%.*]]) #[[ATTR0:[0-9]+]] {
+; OPT-NEXT:    [[A_GLOBAL:%.*]] = addrspacecast ptr [[A]] to ptr addrspace(1)
+; OPT-NEXT:    [[VAL:%.*]] = load float, ptr addrspace(1) [[A_GLOBAL]], align 4, !invariant.load [[META0:![0-9]+]]
+; OPT-NEXT:    store float [[VAL]], ptr [[OUT]], align 4
+; OPT-NEXT:    ret void
+;
+; PTX-LABEL: basic(
+; PTX:       {
+; PTX-NEXT:    .reg .b32 %r<5>;
+; PTX-NEXT:    .reg .b32 %f<2>;
+; PTX-EMPTY:
+; PTX-NEXT:  // %bb.0:
+; PTX-NEXT:    ld.param.u32 %r1, [basic_param_0];
+; PTX-NEXT:    cvta.to.global.u32 %r2, %r1;
+; PTX-NEXT:    ld.param.u32 %r3, [basic_param_1];
+; PTX-NEXT:    cvta.to.global.u32 %r4, %r3;
+; PTX-NEXT:    ld.global.nc.f32 %f1, [%r2];
+; PTX-NEXT:    st.global.f32 [%r4], %f1;
+; PTX-NEXT:    ret;
+  %a_global = addrspacecast ptr %a to ptr addrspace(1)
+  %val = load float, ptr addrspace(1) %a_global
+  store float %val, ptr %out
+  ret void
+}
+
+define ptx_kernel void @select(ptr noalias readonly %a, ptr noalias readonly %b, i1 %c, ptr %out) {
+; OPT-LABEL: define ptx_kernel void @select(
+; OPT-SAME: ptr noalias readonly [[A:%.*]], ptr noalias readonly [[B:%.*]], i1 [[C:%.*]], ptr [[OUT:%.*]]) #[[ATTR0]] {
+; OPT-NEXT:    [[SELECT:%.*]] = select i1 [[C]], ptr [[A]], ptr [[B]]
+; OPT-NEXT:    [[SELECT_GLOBAL:%.*]] = addrspacecast ptr [[SELECT]] to ptr addrspace(1)
+; OPT-NEXT:    [[VAL:%.*]] = load i32, ptr addrspace(1) [[SELECT_GLOBAL]], align 4, !invariant.load [[META0]]
+; OPT-NEXT:    store i32 [[VAL]], ptr [[OUT]], align 4
+; OPT-NEXT:    ret void
+;
+; PTX-LABEL: select(
+; PTX:       {
+; PTX-NEXT:    .reg .pred %p<2>;
+; PTX-NEXT:    .reg .b16 %rs<3>;
+; PTX-NEXT:    .reg .b32 %r<9>;
+; PTX-EMPTY:
+; PTX-NEXT:  // %bb.0:
+; PTX-NEXT:    ld.param.u8 %rs1, [select_param_2];
+; PTX-NEXT:    and.b16 %rs2, %rs1, 1;
+; PTX-NEXT:    setp.ne.b16 %p1, %rs2, 0;
+; PTX-NEXT:    ld.param.u32 %r1, [select_param_0];
+; PTX-NEXT:    cvta.to.global.u32 %r2, %r1;
+; PTX-NEXT:    ld.param.u32 %r3, [select_param_1];
+; PTX-NEXT:    cvta.to.global.u32 %r4, %r3;
+; PTX-NEXT:    ld.param.u32 %r5, [select_param_3];
+; PTX-NEXT:    cvta.to.global.u32 %r6, %r5;
+; PTX-NEXT:    selp.b32 %r7, %r2, %r4, %p1;
+; PTX-NEXT:    ld.global.nc.u32 %r8, [%r7];
+; PTX-NEXT:    st.global.u32 [%r6], %r8;
+; PTX-NEXT:    ret;
+  %select = select i1 %c, ptr %a, ptr %b
+  %select_global = addrspacecast ptr %select to ptr addrspace(1)
+  %val = load i32, ptr addrspace(1) %select_global
+  store i32 %val, ptr %out
+  ret void
+}
+
+define void @not_kernel(ptr noalias readonly %a, ptr %out) {
+; OPT-LABEL: define void @not_kernel(
+; OPT-SAME: ptr noalias readonly [[A:%.*]], ptr [[OUT:%.*]]) #[[ATTR0]] {
+; OPT-NEXT:    [[A_GLOBAL:%.*]] = addrspacecast ptr [[A]] to ptr addrspace(1)
+; OPT-NEXT:    [[VAL:%.*]] = load float, ptr addrspace(1) [[A_GLOBAL]], align 4
+; OPT-NEXT:    store float [[VAL]], ptr [[OUT]], align 4
+; OPT-NEXT:    ret void
+;
+; PTX-LABEL: not_kernel(
+; PTX:       {
+; PTX-NEXT:    .reg .b32 %r<4>;
+; PTX-NEXT:    .reg .b32 %f<2>;
+; PTX-EMPTY:
+; PTX-NEXT:  // %bb.0:
+; PTX-NEXT:    ld.param.u32 %r1, [not_kernel_param_0];
+; PTX-NEXT:    cvta.to.global.u32 %r2, %r1;
+; PTX-NEXT:    ld.param.u32 %r3, [not_kernel_param_1];
+; PTX-NEXT:    ld.global.f32 %f1, [%r2];
+; PTX-NEXT:    st.f32 [%r3], %f1;
+; PTX-NEXT:    ret;
+  %a_global = addrspacecast ptr %a to ptr addrspace(1)
+  %val = load float, ptr addrspace(1) %a_global
+  store float %val, ptr %out
+  ret void
+}
+
+%struct.S2 = type { i64, i64 }
+@G = private unnamed_addr constant %struct.S2 { i64 1, i64 1 }, align 8
+
+define ptx_kernel void @global_load(ptr noalias readonly %a, i1 %c, ptr %out) {
+; OPT-LABEL: define ptx_kernel void @global_load(
+; OPT-SAME: ptr noalias readonly [[A:%.*]], i1 [[C:%.*]], ptr [[OUT:%.*]]) #[[ATTR0]] {
+; OPT-NEXT:    [[G_GLOBAL:%.*]] = addrspacecast ptr @G to ptr addrspace(1)
+; OPT-NEXT:    [[A_GLOBAL:%.*]] = addrspacecast ptr [[A]] to ptr addrspace(1)
+; OPT-NEXT:    [[SELECT:%.*]] = select i1 [[C]], ptr addrspace(1) [[G_GLOBAL]], ptr addrspace(1) [[A_GLOBAL]]
+; OPT-NEXT:    [[VAL:%.*]] = load i64, ptr addrspace(1) [[SELECT]], align 8, !invariant.load [[META0]]
+; OPT-NEXT:    store i64 [[VAL]], ptr [[OUT]], align 8
+; OPT-NEXT:    ret void
+;
+; PTX-LABEL: global_load(
+; PTX:       {
+; PTX-NEXT:    .reg .pred %p<2>;
+; PTX-NEXT:    .reg .b16 %rs<3>;
+; PTX-NEXT:    .reg .b32 %r<7>;
+; PTX-NEXT:    .reg .b64 %rd<2>;
+; PTX-EMPTY:
+; PTX-NEXT:  // %bb.0:
+; PTX-NEXT:    ld.param.u8 %rs1, [global_load_param_1];
+; PTX-NEXT:    and.b16 %rs2, %rs1, 1;
+; PTX-NEXT:    setp.ne.b16 %p1, %rs2, 0;
+; PTX-NEXT:    ld.param.u32 %r1, [global_load_param_0];
+; PTX-NEXT:    cvta.to.global.u32 %r2, %r1;
+; PTX-NEXT:    ld.param.u32 %r3, [global_load_param_2];
+; PTX-NEXT:    cvta.to.global.u32 %r4, %r3;
+; PTX-NEXT:    mov.b32 %r5, G;
+; PTX-NEXT:    selp.b32 %r6, %r5, %r2, %p1;
+; PTX-NEXT:    ld.global.nc.u64 %rd1, [%r6];
+; PTX-NEXT:    st.global.u64 [%r4], %rd1;
+; PTX-NEXT:    ret;
+  %g_global = addrspacecast ptr @G to ptr addrspace(1)
+  %a_global = addrspacecast ptr %a to ptr addrspace(1)
+  %select = select i1 %c, ptr addrspace(1) %g_global, ptr addrspace(1) %a_global
+  %val = load i64, ptr addrspace(1) %select
+  store i64 %val, ptr %out
+  ret void
+}
+;.
+; OPT: [[META0]] = !{}
+;.

github-actions · 2025-04-30T19:04:38Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Artem-B

LGTM

Artem-B · 2025-04-30T21:51:41Z

llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp

@@ -766,46 +766,12 @@ NVPTX::Scope NVPTXDAGToDAGISel::getOperationScope(MemSDNode *N,
  llvm_unreachable("unhandled ordering");
 }

-static bool canLowerToLDG(MemSDNode *N, const NVPTXSubtarget &Subtarget,
-                          unsigned CodeAddrSpace, MachineFunction *F) {
+static bool canLowerToLDG(const MemSDNode *N, const NVPTXSubtarget &Subtarget,


May as well make N a constref.

Artem-B · 2025-04-30T21:53:06Z

llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp

+//  - kernel function pointer params that are noalias (i.e. __restrict) and
+//    never written to.


We can also consider grid constant arguments.

jhuber6 · 2025-05-01T15:22:14Z

llvm/test/CodeGen/NVPTX/byval-const-global.ll

@@ -0,0 +1,33 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mcpu=sm_20 | FileCheck %s


FYI, the original issue only reproduced at higher sm values https://godbolt.org/z/654zYKo4o. So I'd recommend increasing this.

llvm-ci · 2025-05-01T17:51:59Z

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/15354

Here is the relevant piece of the build log for the reference

Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py (1177 of 3009)
PASS: lldb-api :: tools/lldb-dap/io/TestDAP_io.py (1178 of 3009)
PASS: lldb-api :: tools/lldb-dap/disconnect/TestDAP_disconnect.py (1179 of 3009)
UNSUPPORTED: lldb-api :: tools/lldb-dap/memory/TestDAP_memory.py (1180 of 3009)
PASS: lldb-api :: tools/lldb-dap/locations/TestDAP_locations.py (1181 of 3009)
PASS: lldb-api :: tools/lldb-dap/optimized/TestDAP_optimized.py (1182 of 3009)
PASS: lldb-api :: terminal/TestEditlineCompletions.py (1183 of 3009)
PASS: lldb-api :: tools/lldb-dap/output/TestDAP_output.py (1184 of 3009)
PASS: lldb-api :: tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py (1185 of 3009)
UNRESOLVED: lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py (1186 of 3009)
******************** TEST 'lldb-api :: tools/lldb-dap/launch/TestDAP_launch.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/launch -p TestDAP_launch.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision a88d580860b88bbb02797bae95032b6eb0c4579c)
  clang revision a88d580860b88bbb02797bae95032b6eb0c4579c
  llvm revision a88d580860b88bbb02797bae95032b6eb0c4579c
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
1746121664.214599371 --> (stdin/stdout) {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}
1746121664.218562126 <-- (stdin/stdout) {"body":{"$__lldb_version":"lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision a88d580860b88bbb02797bae95032b6eb0c4579c)\n  clang revision a88d580860b88bbb02797bae95032b6eb0c4579c\n  llvm revision a88d580860b88bbb02797bae95032b6eb0c4579c","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"default":false,"filter":"cpp_catch","label":"C++ Catch"},{"default":false,"filter":"cpp_throw","label":"C++ Throw"},{"default":false,"filter":"objc_catch","label":"Objective-C Catch"},{"default":false,"filter":"objc_throw","label":"Objective-C Throw"}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionInfoRequest":true,"supportsExceptionOptions":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsRestartRequest":true,"supportsSetVariable":true,"supportsStepInTargetsRequest":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1746121664.219090223 --> (stdin/stdout) {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/launch/TestDAP_launch.test_args/a.out","args":["one","with space","'with single quotes'","\"with double quotes\""],"initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":2}
1746121664.219664574 <-- (stdin/stdout) {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}
1746121664.219730139 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":0,"type":"event"}
1746121664.219744921 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}
1746121664.219756842 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}
1746121664.219768524 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":0,"type":"event"}
1746121664.219779491 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":0,"type":"event"}
1746121664.219790697 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":0,"type":"event"}
1746121664.219801903 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":0,"type":"event"}
1746121664.219853163 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":0,"type":"event"}
1746121664.219864607 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":0,"type":"event"}
1746121664.219875813 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":0,"type":"event"}
1746121664.388539314 <-- (stdin/stdout) {"command":"launch","request_seq":2,"seq":0,"success":true,"type":"response"}
1746121664.388684273 <-- (stdin/stdout) {"body":{"module":{"addressRange":"4156817408","debugInfoSize":"983.3KB","id":"0D794E6C-AF7E-D8CB-B9BA-E385B4F8753F-5A793D65","name":"ld-linux-armhf.so.3","path":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolFilePath":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1746121664.388838291 <-- (stdin/stdout) {"body":{"isLocalProcess":true,"name":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/launch/TestDAP_launch.test_args/a.out","startMethod":"launch","systemProcessId":4006802},"event":"process","seq":0,"type":"event"}
1746121664.388894320 <-- (stdin/stdout) {"event":"initialized","seq":0,"type":"event"}
1746121664.388998032 <-- (stdin/stdout) {"body":{"module":{"addressRange":"8519680","debugInfoSize":"1.4KB","id":"1EF971EC","name":"a.out","path":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/launch/TestDAP_launch.test_args/a.out","symbolFilePath":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/launch/TestDAP_launch.test_args/a.out","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1746121664.389397144 --> (stdin/stdout) {"command":"configurationDone","type":"request","arguments":{},"seq":3}
1746121664.389551878 <-- (stdin/stdout) {"command":"configurationDone","request_seq":3,"seq":0,"success":true,"type":"response"}

fmayer · 2025-05-01T19:40:55Z

FYI (NO ACTION REQUIRED IF YOU DON'T WANT TO) this broke the GN build

ld.lld: error: undefined symbol: llvm::initializeNVPTXTagInvariantLoadLegacyPassPass(llvm::PassRegistry&)
>>> referenced by NVPTXTargetMachine.cpp
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:(LLVMInitializeNVPTXTarget) in archive lib/libLLVMNVPTXCod
eGen.a                      
                                                                       
ld.lld: error: undefined symbol: llvm::createNVPTXTagInvariantLoadsPass()
>>> referenced by NVPTXTargetMachine.cpp                                                                                                       
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:((anonymous namespace)::NVPTXPassConfig::addIRPasses()) in archive lib/libLLVMNVPTXCodeGen.a

ld.lld: error: undefined symbol: llvm::NVPTXTagInvariantLoadsPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
>>> referenced by NVPTXTargetMachine.cpp
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:(llvm::detail::PassModel<llvm::Function, llvm::NVPTXTagInv
ariantLoadsPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)) in archive lib/libLLVMNV
PTXCodeGen.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

jhuber6 · 2025-05-01T19:44:16Z

FYI this broke the GN build

ld.lld: error: undefined symbol: llvm::initializeNVPTXTagInvariantLoadLegacyPassPass(llvm::PassRegistry&)
>>> referenced by NVPTXTargetMachine.cpp
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:(LLVMInitializeNVPTXTarget) in archive lib/libLLVMNVPTXCod
eGen.a                      
                                                                       
ld.lld: error: undefined symbol: llvm::createNVPTXTagInvariantLoadsPass()
>>> referenced by NVPTXTargetMachine.cpp                                                                                                       
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:((anonymous namespace)::NVPTXPassConfig::addIRPasses()) in archive lib/libLLVMNVPTXCodeGen.a

ld.lld: error: undefined symbol: llvm::NVPTXTagInvariantLoadsPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
>>> referenced by NVPTXTargetMachine.cpp
>>>               ../obj/llvm/lib/Target/NVPTX/LLVMNVPTXCodeGen.NVPTXTargetMachine.o:(llvm::detail::PassModel<llvm::Function, llvm::NVPTXTagInv
ariantLoadsPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)) in archive lib/libLLVMNV
PTXCodeGen.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Missing dependency for a static build maybe?

AlexMaclean · 2025-05-01T20:02:40Z

@fmayer sorry about that! I'm very unfamiliar with the GN build. I suppose that this file may need to be updated to add the new source file? It looks like a bot normally does this automatically?

llvm-project/llvm/utils/gn/secondary/llvm/lib/Target/NVPTX/BUILD.gn

Lines 30 to 34 in b2e2ae8

    
           sources = [ 
        
             "NVPTXAliasAnalysis.cpp", 
        
             "NVPTXAllocaHoisting.cpp", 
        
             "NVPTXAsmPrinter.cpp", 
        
             "NVPTXAssignValidGlobalNames.cpp",

llvm-ci · 2025-05-02T11:03:27Z

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2829

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/90/95' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-21004-90-95.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=95 GTEST_SHARD_INDEX=90 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************

Pull invariant load identification, which was previously part of DAGToDAG ISel, into a new IR pass NVPTXTagInvariantLoads. This makes it possible to disable this optimization at O0 and reduces the complexity of the SelectionDAG pass. Moving this logic to an IR pass also allows for implementing a more powerful traversal in the future. Fixes llvm#138138

AlexMaclean requested review from Artem-B, akshayrdeodhar and justinfargnoli April 30, 2025 19:02

AlexMaclean self-assigned this Apr 30, 2025

llvmbot added the backend:NVPTX label Apr 30, 2025

[NVPTX] Pull invariant load identification into IR pass

af03c15

AlexMaclean force-pushed the dev/amaclean/upstream/ldg-xform branch from 94a02b2 to af03c15 Compare April 30, 2025 19:06

Artem-B approved these changes Apr 30, 2025

View reviewed changes

address comments

7fd4f03

AlexMaclean mentioned this pull request May 1, 2025

[NVPTX] Backend crash on byval reference to external constant #138138

Closed

add test for llvm#138138

0469aae

jhuber6 reviewed May 1, 2025

View reviewed changes

address comments

2994569

jhuber6 approved these changes May 1, 2025

View reviewed changes

AlexMaclean merged commit a88d580 into llvm:main May 1, 2025
11 checks passed

AlexMaclean mentioned this pull request May 5, 2025

[NVPTX] Improve kernel byval parameter lowering #136008

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NVPTX] Pull invariant load identification into IR pass #138015

[NVPTX] Pull invariant load identification into IR pass #138015

Uh oh!

AlexMaclean commented Apr 30, 2025 •

edited by jhuber6

Loading

Uh oh!

llvmbot commented Apr 30, 2025

Uh oh!

github-actions bot commented Apr 30, 2025 •

edited

Loading

Uh oh!

Artem-B left a comment

Uh oh!

Artem-B Apr 30, 2025

Uh oh!

Artem-B Apr 30, 2025

Uh oh!

jhuber6 May 1, 2025

Uh oh!

AlexMaclean May 1, 2025

Uh oh!

Uh oh!

llvm-ci commented May 1, 2025

Uh oh!

fmayer commented May 1, 2025 •

edited

Loading

Uh oh!

jhuber6 commented May 1, 2025

Uh oh!

AlexMaclean commented May 1, 2025

Uh oh!

llvm-ci commented May 2, 2025

Uh oh!

Uh oh!

		// - kernel function pointer params that are noalias (i.e. __restrict) and
		// never written to.

		@@ -0,0 +1,33 @@
		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
		; RUN: llc < %s -mcpu=sm_20 \| FileCheck %s

[NVPTX] Pull invariant load identification into IR pass #138015

[NVPTX] Pull invariant load identification into IR pass #138015

Uh oh!

Conversation

AlexMaclean commented Apr 30, 2025 • edited by jhuber6 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Apr 30, 2025

Uh oh!

github-actions bot commented Apr 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Artem-B left a comment

Choose a reason for hiding this comment

Uh oh!

Artem-B Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

Artem-B Apr 30, 2025

Choose a reason for hiding this comment

Uh oh!

jhuber6 May 1, 2025

Choose a reason for hiding this comment

Uh oh!

AlexMaclean May 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented May 1, 2025

Uh oh!

fmayer commented May 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jhuber6 commented May 1, 2025

Uh oh!

AlexMaclean commented May 1, 2025

Uh oh!

llvm-ci commented May 2, 2025

Uh oh!

Uh oh!

AlexMaclean commented Apr 30, 2025 •

edited by jhuber6

Loading

github-actions bot commented Apr 30, 2025 •

edited

Loading

fmayer commented May 1, 2025 •

edited

Loading