[NVPTX] Add errors for incorrect CUDA addrpaces #138706

LewisCrawford · 2025-05-06T15:18:43Z

The CUDA API only accepts kernel params in the global and generic address spaces, so display an error message when attempting to emit pointers outside those address-spaces from CUDA (but still allow them for OpenCL).

llvmbot · 2025-05-06T15:19:27Z

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-nvptx

Author: Lewis Crawford (LewisCrawford)

Changes

The CUDA API only accepts kernel params in the global and generic address spaces, so display an error message when attempting to emit pointers outside those address-spaces from CUDA (but still allow them for OpenCL).

Full diff: https://github.com/llvm/llvm-project/pull/138706.diff

5 Files Affected:

(modified) llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp (+8)
(modified) llvm/test/CodeGen/NVPTX/kernel-param-align.ll (+3-2)
(added) llvm/test/CodeGen/NVPTX/lower-args-cuda.ll (+13)
(added) llvm/test/CodeGen/NVPTX/lower-args-nvcl.ll (+17)
(modified) llvm/test/CodeGen/NVPTX/lower-args.ll (-23)

diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index 2f4b109e8e9e9..9e17adb6ac1ae 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -1399,6 +1399,8 @@ void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) {
       if (PTy) {
         O << "\t.param .u" << PTySizeInBits << " .ptr";
 
+        bool IsCUDA = static_cast<NVPTXTargetMachine &>(TM).getDrvInterface() ==
+                      NVPTX::CUDA;
         switch (PTy->getAddressSpace()) {
         default:
           break;
@@ -1406,12 +1408,18 @@ void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) {
           O << " .global";
           break;
         case ADDRESS_SPACE_SHARED:
+          if (IsCUDA)
+            report_fatal_error(".shared ptr kernel args unsupported in CUDA.");
           O << " .shared";
           break;
         case ADDRESS_SPACE_CONST:
+          if (IsCUDA)
+            report_fatal_error(".const ptr kernel args unsupported in CUDA.");
           O << " .const";
           break;
         case ADDRESS_SPACE_LOCAL:
+          if (IsCUDA)
+            report_fatal_error(".local ptr kernel args unsupported in CUDA.");
           O << " .local";
           break;
         }
diff --git a/llvm/test/CodeGen/NVPTX/kernel-param-align.ll b/llvm/test/CodeGen/NVPTX/kernel-param-align.ll
index a56b85de80143..e85ccf34bb6ac 100644
--- a/llvm/test/CodeGen/NVPTX/kernel-param-align.ll
+++ b/llvm/test/CodeGen/NVPTX/kernel-param-align.ll
@@ -1,5 +1,6 @@
-; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_60 | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -mtriple=nvptx64 -mcpu=sm_60 | %ptxas -arch=sm_60 - %}
+; RUN: llc < %s -mcpu=sm_60 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -mcpu=sm_60 | %ptxas -arch=sm_60 - %}
+target triple = "nvptx64-nvidia-nvcl"
 
 %struct.Large = type { [16 x double] }
 
diff --git a/llvm/test/CodeGen/NVPTX/lower-args-cuda.ll b/llvm/test/CodeGen/NVPTX/lower-args-cuda.ll
new file mode 100644
index 0000000000000..7361ab28badb9
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/lower-args-cuda.ll
@@ -0,0 +1,13 @@
+; RUN: not --crash llc < %s -mcpu=sm_75  -o /dev/null 2>&1 | FileCheck %s
+
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+; Make sure we exit with an error message for this input, as pointers to the
+; shared address-space are only supported as kernel args in NVCL, not CUDA.
+; CHECK:  .shared ptr kernel args unsupported in CUDA.
+define ptx_kernel void @ptr_nongeneric(ptr addrspace(1) %out, ptr addrspace(3) %in) {
+  %v = load i32, ptr addrspace(3) %in, align 4
+  store i32 %v, ptr addrspace(1) %out, align 4
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/lower-args-nvcl.ll b/llvm/test/CodeGen/NVPTX/lower-args-nvcl.ll
new file mode 100644
index 0000000000000..44b44e0c17626
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/lower-args-nvcl.ll
@@ -0,0 +1,17 @@
+; RUN: opt < %s -S -nvptx-lower-args | FileCheck %s --check-prefixes COMMON,IR
+; RUN: llc < %s -mcpu=sm_20 | FileCheck %s --check-prefixes COMMON,PTX
+; RUN: %if ptxas %{ llc < %s -mcpu=sm_20 | %ptxas-verify %}
+
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-nvcl"
+
+; COMMON-LABEL: ptr_nongeneric
+define ptx_kernel void @ptr_nongeneric(ptr addrspace(1) %out, ptr addrspace(3) %in) {
+; IR-NOT: addrspacecast
+; PTX-NOT: cvta.to.global
+; PTX:  ld.shared.u32
+; PTX:  st.global.u32
+  %v = load i32, ptr addrspace(3) %in, align 4
+  store i32 %v, ptr addrspace(1) %out, align 4
+  ret void
+}
diff --git a/llvm/test/CodeGen/NVPTX/lower-args.ll b/llvm/test/CodeGen/NVPTX/lower-args.ll
index 8e879871e295b..44445a17d1eb3 100644
--- a/llvm/test/CodeGen/NVPTX/lower-args.ll
+++ b/llvm/test/CodeGen/NVPTX/lower-args.ll
@@ -140,29 +140,6 @@ define ptx_kernel void @ptr_generic(ptr %out, ptr %in) {
   ret void
 }
 
-define ptx_kernel void @ptr_nongeneric(ptr addrspace(1) %out, ptr addrspace(3) %in) {
-; IR-LABEL: define ptx_kernel void @ptr_nongeneric(
-; IR-SAME: ptr addrspace(1) [[OUT:%.*]], ptr addrspace(3) [[IN:%.*]]) {
-; IR-NEXT:    [[V:%.*]] = load i32, ptr addrspace(3) [[IN]], align 4
-; IR-NEXT:    store i32 [[V]], ptr addrspace(1) [[OUT]], align 4
-; IR-NEXT:    ret void
-;
-; PTX-LABEL: ptr_nongeneric(
-; PTX:       {
-; PTX-NEXT:    .reg .b32 %r<2>;
-; PTX-NEXT:    .reg .b64 %rd<3>;
-; PTX-EMPTY:
-; PTX-NEXT:  // %bb.0:
-; PTX-NEXT:    ld.param.u64 %rd1, [ptr_nongeneric_param_0];
-; PTX-NEXT:    ld.param.u64 %rd2, [ptr_nongeneric_param_1];
-; PTX-NEXT:    ld.shared.u32 %r1, [%rd2];
-; PTX-NEXT:    st.global.u32 [%rd1], %r1;
-; PTX-NEXT:    ret;
-  %v = load i32, ptr addrspace(3) %in, align 4
-  store i32 %v, ptr addrspace(1) %out, align 4
-  ret void
-}
-
 define ptx_kernel void @ptr_as_int(i64 noundef %i, i32 noundef %v) {
 ; IRC-LABEL: define ptx_kernel void @ptr_as_int(
 ; IRC-SAME: i64 noundef [[I:%.*]], i32 noundef [[V:%.*]]) {

gonzalobg

LGTM, thank you Lewis.

AlexMaclean · 2025-05-06T15:57:19Z

One thing I wonder is whether this sort of logic really belongs in assembly printing. It seems like this verification could happen at any point during compilation, such as during an IR pass. Are there examples from other targets of target-specific IR verification? Do you have any thoughts about alternative locations for this verification?

Artem-B · 2025-05-19T22:12:56Z

llvm/lib/IR/Verifier.cpp

+          Check(AS != NVPTXAS::AddressSpace::ADDRESS_SPACE_SHARED,
+                ".shared ptr kernel args unsupported in CUDA.", &Arg, &F);


I think the check should be rephrased to only allow generic and global AS, and error out on anything else, so we don't have to update it when a new AS is added, or if/when someone uses a nonsensical AS.
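The allow-list shape suggested here could look roughly like the following. This is a standalone sketch, not the patch's code: `AddrSpace`, `DrvInterface`, and `checkKernelPtrArg` are hypothetical stand-ins rather than LLVM APIs, and the numeric values assume the usual NVPTX numbering (1 = global and 3 = shared, as in the tests above).

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for the NVPTX address-space numbers (assumed
// numbering: 0 generic, 1 global, 3 shared, 4 const, 5 local).
enum class AddrSpace : unsigned {
  Generic = 0,
  Global = 1,
  Shared = 3,
  Const = 4,
  Local = 5,
};

// Hypothetical stand-in for the driver interface selected by the triple.
enum class DrvInterface { NVCL, CUDA };

// Allow-list check for a pointer kernel argument. Returns an error message
// when the argument is rejected, or std::nullopt when it is accepted.
std::optional<std::string> checkKernelPtrArg(DrvInterface Drv, unsigned AS) {
  // Only CUDA is restricted; NVCL keeps accepting other address spaces.
  if (Drv != DrvInterface::CUDA)
    return std::nullopt;

  // Accept exactly generic and global. Anything else, including address
  // spaces added later or nonsensical values, falls through to the error.
  if (AS == static_cast<unsigned>(AddrSpace::Generic) ||
      AS == static_cast<unsigned>(AddrSpace::Global))
    return std::nullopt;

  return "ptr kernel args in address space " + std::to_string(AS) +
         " unsupported in CUDA.";
}
```

Compared with one check per rejected address space, this form does not need updating when a new address space appears, at the cost of a less specific error message.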

LewisCrawford · 2025-07-07T13:43:49Z

I think it might be best to just close this and leave validation to the front-ends instead of adding IR-level validation, as it would make certain IR-level tests a lot more inconvenient to write. The chances of someone hitting problems from invalid address spaces like this in a real kernel, in a way that bypasses any error-checking in the front-end, are a lot lower than the chances of someone hitting the new error message from this patch while writing an IR-level test and having to find an inconvenient workaround for it.

llvmbot added the backend:NVPTX label May 6, 2025

LewisCrawford requested a review from Artem-B May 6, 2025 15:19

LewisCrawford requested a review from AlexMaclean May 6, 2025 15:23

LewisCrawford mentioned this pull request May 6, 2025

Enable .ptr .global .align attributes for kernel attributes for CUDA #114874

Merged

gonzalobg approved these changes May 6, 2025

Artem-B reviewed May 6, 2025

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp (outdated)

LewisCrawford added 2 commits May 12, 2025 11:12

[NVPTX] Add errors for incorrect CUDA addrpaces

8bbb3f8

The CUDA API only accepts kernel params in the global and generic address spaces, so display an error message when attempting to emit pointers outside those address-spaces from CUDA (but still allow them for OpenCL).

Update clang CodeGenCUDA/memcpy-libcall.cu test

7853795

Fix a clang test which failed with these new error messages, as it illegally passed in kernel args in the .local address space while using the -cuda triple.

LewisCrawford force-pushed the cuda_addrspace_error_msg branch from c67fc1c to 7853795 Compare May 12, 2025 11:22

llvmbot added the clang Clang issues not falling into any other category label May 12, 2025

LewisCrawford added 2 commits May 12, 2025 15:12

Fix NVPTX/lower-args-nvcl.ll test

02fcf51

Update the lower-args-nvcl.ll test to use the newly preferred untyped load/stores to b64 instead of the u64 ones the original version of the test used.

Move checks into IR Verifier

1fadeec

Move the CUDA address-space checks out of ASMPrinter and into the IR Verifier.

llvmbot added the llvm:ir label May 19, 2025

Artem-B reviewed May 19, 2025

LewisCrawford closed this Jul 7, 2025
