
Commit 209f4c1

[LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction
This commit adds NVPTX codegen support for the clusterlaunchcontrol instructions, with tests in clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel).
1 parent 611d81b commit 209f4c1

7 files changed: +444 -0 lines changed

llvm/docs/NVPTXUsage.rst

Lines changed: 96 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1558,6 +1558,102 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/
15581558

15591559
For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.
15601560

1561+
1562+
clusterlaunchcontrol Intrinsics
1563+
-------------------------------
1564+
1565+
'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
1566+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1567+
1568+
Syntax:
1569+
"""""""
1570+
1571+
.. code-block:: llvm
1572+
1573+
declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
1574+
declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
1575+
1576+
Overview:
1577+
"""""""""
1578+
1579+
The ``clusterlaunchcontrol.try_cancel`` intrinsics request the atomic cancellation of
1580+
the launch of a cluster that has not started running yet. The operation asynchronously and non-atomically writes
1581+
a 16-byte opaque response to the shared memory location pointed to by the 16-byte-aligned ``addr``, indicating whether the
1582+
operation succeeded or failed. Both ``addr`` and the 8-byte-aligned ``mbar`` must refer to ``shared::cta``;
1583+
otherwise the behavior is undefined. The completion of the asynchronous operation
1584+
is tracked using the mbarrier completion mechanism at ``.cluster`` scope referenced
1585+
by the shared memory pointer, ``mbar``. On success, the opaque response contains
1586+
the CTA id of the first CTA of the canceled cluster; no other successful response
1587+
from other ``clusterlaunchcontrol.try_cancel`` operations from the same grid will
1588+
contain that id.
1589+
1590+
The ``multicast`` variant specifies that the response is asynchronously and non-atomically written to
1591+
the corresponding shared memory location of each CTA in the requesting cluster.
1592+
The completion of the write of each local response is tracked by independent
1593+
mbarriers at the corresponding shared memory location of each CTA in the
1594+
cluster.
1595+
1596+
For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
1597+
1598+
'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
1599+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1600+
1601+
Syntax:
1602+
"""""""
1603+
1604+
.. code-block:: llvm
1605+
1606+
declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
1607+
1608+
Overview:
1609+
"""""""""
1610+
1611+
The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic decodes the opaque response written by the
1612+
``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
1613+
1614+
The intrinsic returns ``0`` (false) if the request failed. If the request succeeded,
1615+
it returns ``1`` (true). A true result indicates that:
1616+
1617+
- the thread block cluster whose first CTA id matches that of the response
1618+
handle will not run, and
1619+
- no other successful response of another ``try_cancel`` request in the grid will contain
1620+
the first CTA id of that cluster.
1621+
1622+
For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
1623+
1624+
1625+
'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
1626+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1627+
1628+
Syntax:
1629+
"""""""
1630+
1631+
.. code-block:: llvm
1632+
1633+
declare {i32, i32, i32} @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid(i128 %try_cancel_response)
1634+
declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
1635+
declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
1636+
declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
1637+
1638+
Overview:
1639+
"""""""""
1640+
1641+
The ``clusterlaunchcontrol.query_cancel.get_first_ctaid`` intrinsic can be
1642+
used to decode the successful opaque response written by the
1643+
``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
1644+
1645+
If the request succeeded:
1646+
1647+
- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
1648+
the x, y, or z coordinate, respectively, of the first CTA in the canceled cluster.
1649+
1650+
- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid`` returns a struct
1651+
of three elements which correspond to the x, y, z coordinates of the first CTA.
1652+
1653+
If the request failed, the behavior of these intrinsics is undefined.
1654+
1655+
For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
1656+
15611657
Other Intrinsics
15621658
----------------
15631659

llvm/include/llvm/IR/IntrinsicsNVVM.td

Lines changed: 34 additions & 0 deletions
@@ -5472,4 +5472,38 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
54725472
[IntrArgMemOnly, IntrWriteMem,
54735473
WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;
54745474

5475+
//
5476+
// clusterlaunchcontrol Intrinsics
5477+
//
5478+
5479+
// clusterlaunchcontrol.try_cancel
5480+
5481+
def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
5482+
: DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
5483+
[IntrHasSideEffects, IntrArgMemOnly],
5484+
"llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
5485+
5486+
def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
5487+
: DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
5488+
[IntrHasSideEffects, IntrArgMemOnly],
5489+
"llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
5490+
5491+
// clusterlaunchcontrol.query_cancel.is_canceled
5492+
5493+
def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
5494+
: DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
5495+
"llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
5496+
5497+
// clusterlaunchcontrol.query_cancel.get_first_ctaid*
5498+
5499+
def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid
5500+
: DefaultAttrsIntrinsic<[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
5501+
"llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid">;
5502+
5503+
foreach dim = ["x", "y", "z"] in {
5504+
def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
5505+
: DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
5506+
"llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
5507+
}
5508+
54755509
} // let TargetPrefix = "nvvm"

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Lines changed: 54 additions & 0 deletions
@@ -1015,6 +1015,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
10151015
Custom);
10161016

10171017
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
1018+
// Enable custom lowering for the 128-bit (i128) operand of the clusterlaunchcontrol intrinsics
1019+
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i128, Custom);
10181020
}
10191021

10201022
const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1091,6 +1093,11 @@ const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
10911093
MAKE_CASE(NVPTXISD::BrxEnd)
10921094
MAKE_CASE(NVPTXISD::BrxItem)
10931095
MAKE_CASE(NVPTXISD::BrxStart)
1096+
MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED)
1097+
MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID)
1098+
MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X)
1099+
MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y)
1100+
MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z)
10941101
}
10951102
return nullptr;
10961103

@@ -1163,6 +1170,47 @@ NVPTXTargetLowering::LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const {
11631170
return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);
11641171
}
11651172

1173+
static SDValue LowerClusterLaunchControl(SDValue Op, SelectionDAG &DAG) {
1174+
1175+
SDNode *N = Op.getNode();
1176+
if (N->getOperand(1).getValueType() != MVT::i128) {
1177+
// Nothing to do if the operand has already been lowered.
1178+
return SDValue();
1179+
}
1180+
1181+
unsigned IID =
1182+
cast<ConstantSDNode>(N->getOperand(0).getNode())->getZExtValue();
1183+
auto Opcode = [&]() {
1184+
switch (IID) {
1185+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
1186+
return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
1187+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
1188+
return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID;
1189+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
1190+
return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X;
1191+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
1192+
return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y;
1193+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
1194+
return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z;
1195+
default:
1196+
llvm_unreachable("unsupported/unhandled intrinsic");
1197+
}
1198+
}();
1199+
1200+
SDLoc DL(N);
1201+
SDValue TryCancelResponse = N->getOperand(1);
1202+
SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
1203+
SDValue TryCancelResponse0 =
1204+
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
1205+
DAG.getIntPtrConstant(0, DL));
1206+
SDValue TryCancelResponse1 =
1207+
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
1208+
DAG.getIntPtrConstant(1, DL));
1209+
1210+
return DAG.getNode(Opcode, DL, N->getVTList(),
1211+
{TryCancelResponse0, TryCancelResponse1});
1212+
}
1213+
11661214
std::string NVPTXTargetLowering::getPrototype(
11671215
const DataLayout &DL, Type *retTy, const ArgListTy &Args,
11681216
const SmallVectorImpl<ISD::OutputArg> &Outs, MaybeAlign retAlignment,
@@ -2763,6 +2811,12 @@ static SDValue lowerIntrinsicWOChain(SDValue Op, SelectionDAG &DAG) {
27632811
return Op;
27642812
case Intrinsic::nvvm_internal_addrspace_wrap:
27652813
return Op.getOperand(1);
2814+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
2815+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid:
2816+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
2817+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
2818+
case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
2819+
return LowerClusterLaunchControl(Op, DAG);
27662820
}
27672821
}
27682822
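Note on the lowering in `LowerClusterLaunchControl`: the i128 response is bitcast to `v2i64`, and its two elements are passed to the target node as separate i64 operands, which the instruction patterns later reassemble with `mov.b128`. The standalone C++ sketch below models that split and re-join. It is not LLVM code: `u128`, `splitResponse`, and `joinResponse` are hypothetical names, `unsigned __int128` is a GCC/Clang extension standing in for `i128`, and the sketch assumes a little-endian layout in which element 0 holds the low 64 bits.

```cpp
#include <cstdint>
#include <utility>

// Assumption: unsigned __int128 (GCC/Clang extension) stands in for LLVM's i128.
using u128 = unsigned __int128;

// Models "bitcast i128 -> v2i64, then extract elements 0 and 1".
// Assumes element 0 is the low half, as on a little-endian target.
std::pair<std::uint64_t, std::uint64_t> splitResponse(u128 response) {
  auto lo = static_cast<std::uint64_t>(response);        // element 0
  auto hi = static_cast<std::uint64_t>(response >> 64);  // element 1
  return {lo, hi};
}

// Models repacking two 64-bit halves into one 128-bit handle,
// assuming the first half supplies the low bits.
u128 joinResponse(std::uint64_t lo, std::uint64_t hi) {
  return (static_cast<u128>(hi) << 64) | static_cast<u128>(lo);
}
```

Under these assumptions, splitting a value and joining the halves again yields the original 128-bit value, which is why the query instructions can operate on a handle rebuilt from two 64-bit registers.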

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

Lines changed: 5 additions & 0 deletions
@@ -79,6 +79,11 @@ enum NodeType : unsigned {
7979
BrxStart,
8080
BrxItem,
8181
BrxEnd,
82+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED,
83+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID,
84+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X,
85+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y,
86+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z,
8287
Dummy,
8388

8489
FIRST_MEMORY_OPCODE,

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

Lines changed: 81 additions & 0 deletions
@@ -7354,3 +7354,84 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
73547354
"st.bulk.shared::cta [$dest_addr], $size, 0;",
73557355
[(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
73567356
Requires<[hasSM<100>, hasPTX<86>]>;
7357+
7358+
//
7359+
// clusterlaunchcontrol Instructions
7360+
//
7361+
7362+
def CLUSTERLAUNCHCONTRL_TRY_CANCEL:
7363+
NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
7364+
"clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 " #
7365+
"[$addr], [$mbar];",
7366+
[(int_nvvm_clusterlaunchcontrol_try_cancel_async_shared addr:$addr, addr:$mbar)]>,
7367+
Requires<[hasSM<100>, hasPTX<86>]>;
7368+
7369+
def CLUSTERLAUNCHCONTRL_TRY_CANCEL_MULTICAST:
7370+
NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
7371+
"clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
7372+
".multicast::cluster::all.b128 " #
7373+
"[$addr], [$mbar];",
7374+
[(int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared addr:$addr, addr:$mbar)]>,
7375+
Requires<[hasSM<100>, hasArchAccelFeatures, hasPTX<86>]>;
7376+
7377+
def SDTClusterLaunchControlQueryCancelIsCanceled: SDTypeProfile<1, 2, []>;
7378+
def clusterlaunchcontrol_query_cancel_is_canceled:
7379+
SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED",
7380+
SDTClusterLaunchControlQueryCancelIsCanceled, []>;
7381+
7382+
def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED:
7383+
NVPTXInst<(outs Int1Regs:$pred), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
7384+
"{{\n\t" #
7385+
".reg .b128 %handle;\n\t" #
7386+
"mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
7387+
"clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 $pred, %handle;\n\t" #
7388+
"}}", [(set i1:$pred,
7389+
(clusterlaunchcontrol_query_cancel_is_canceled i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
7390+
Requires<[hasSM<100>, hasPTX<86>]>;
7391+
7392+
def SDTClusterLaunchControlQueryCancelGetFirstCtaId: SDTypeProfile<3, 2, []>;
7393+
def clusterlaunchcontrol_query_cancel_first_cta_id:
7394+
SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID",
7395+
SDTClusterLaunchControlQueryCancelGetFirstCtaId, []>;
7396+
7397+
def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID:
7398+
NVPTXInst<(outs Int32Regs:$r1, Int32Regs:$r2, Int32Regs:$r3),
7399+
(ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
7400+
"{{\n\t" #
7401+
".reg .b128 %handle;\n\t" #
7402+
"mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
7403+
"clusterlaunchcontrol.query_cancel.get_first_ctaid.v4.b32.b128 {$r1, $r2, $r3, _}, %handle;\n\t" #
7404+
"}}", [(set i32:$r1, i32:$r2, i32:$r3,
7405+
(clusterlaunchcontrol_query_cancel_first_cta_id i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
7406+
Requires<[hasSM<100>, hasPTX<86>]>;
7407+
7408+
def SDTClusterLaunchControlQueryCancelGetFirstCtaIdX: SDTypeProfile<1, 2, []>;
7409+
def clusterlaunchcontrol_query_cancel_first_cta_id_x :
7410+
SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X",
7411+
SDTClusterLaunchControlQueryCancelGetFirstCtaIdX, []>;
7412+
7413+
def SDTClusterLaunchControlQueryCancelGetFirstCtaIdY: SDTypeProfile<1, 2, []>;
7414+
def clusterlaunchcontrol_query_cancel_first_cta_id_y:
7415+
SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y",
7416+
SDTClusterLaunchControlQueryCancelGetFirstCtaIdY, []>;
7417+
7418+
def SDTClusterLaunchControlQueryCancelGetFirstCtaIdZ: SDTypeProfile<1, 2, []>;
7419+
def clusterlaunchcontrol_query_cancel_first_cta_id_z:
7420+
SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z",
7421+
SDTClusterLaunchControlQueryCancelGetFirstCtaIdZ, []>;
7422+
7423+
class CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<string Dim>:
7424+
NVPTXInst<(outs Int32Regs:$reg), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
7425+
"{{\n\t" #
7426+
".reg .b128 %handle;\n\t" #
7427+
"mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
7428+
"clusterlaunchcontrol.query_cancel.get_first_ctaid::" # Dim # ".b32.b128 $reg, %handle;\n\t" #
7429+
"}}", [(set i32:$reg,
7430+
(!cast<SDNode>("clusterlaunchcontrol_query_cancel_first_cta_id_" # Dim)
7431+
i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
7432+
Requires<[hasSM<100>, hasPTX<86>]>;
7433+
7434+
foreach dim = ["x", "y", "z"] in {
7435+
def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_ # dim:
7436+
CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<dim>;
7437+
}
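The per-dimension instruction class above builds its PTX mnemonic by pasting `Dim` into the assembly string with TableGen's `#` operator, and the `foreach` instantiates it for x, y, and z. As a rough illustration only, the C++ sketch below reproduces that string concatenation. `queryCancelAsm` and `allQueryCancelAsm` are hypothetical helpers, not part of LLVM, and `$reg` and `%handle` are kept as the literal placeholders used in the pattern.

```cpp
#include <array>
#include <string>

// Hypothetical helper: mirrors the TableGen paste
//   "...get_first_ctaid::" # Dim # ".b32.b128 $reg, %handle;"
std::string queryCancelAsm(const std::string &dim) {
  return "clusterlaunchcontrol.query_cancel.get_first_ctaid::" + dim +
         ".b32.b128 $reg, %handle;";
}

// Expands the template for the same three dimensions as the foreach loop.
std::array<std::string, 3> allQueryCancelAsm() {
  return {queryCancelAsm("x"), queryCancelAsm("y"), queryCancelAsm("z")};
}
```

Each of the three resulting strings differs only in the `::x`, `::y`, or `::z` suffix, matching the three instruction definitions produced by the loop.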
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
2+
; RUN: llc -o - -mcpu=sm_100a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
3+
; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
4+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 | %ptxas-verify -arch=sm_100a %}
5+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100a %}
6+
; RUN: llc -o - -mcpu=sm_101a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
7+
; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
8+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 | %ptxas-verify -arch=sm_101a %}
9+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_101a %}
10+
; RUN: llc -o - -mcpu=sm_120a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
11+
; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
12+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 | %ptxas-verify -arch=sm_120a %}
13+
; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_120a %}
14+
15+
define void @nvvm_clusterlaunchcontrol_try_cancel_multicast(ptr %addr, ptr %mbar,
16+
; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
17+
; CHECK-PTX-SHARED64: {
18+
; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<3>;
19+
; CHECK-PTX-SHARED64-EMPTY:
20+
; CHECK-PTX-SHARED64-NEXT: // %bb.0:
21+
; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
22+
; CHECK-PTX-SHARED64-NEXT: ld.param.u64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
23+
; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
24+
; CHECK-PTX-SHARED64-NEXT: ret;
25+
;
26+
; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
27+
; CHECK-PTX-SHARED32: {
28+
; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
29+
; CHECK-PTX-SHARED32-EMPTY:
30+
; CHECK-PTX-SHARED32-NEXT: // %bb.0:
31+
; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
32+
; CHECK-PTX-SHARED32-NEXT: ld.param.u32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
33+
; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%r1], [%r2];
34+
; CHECK-PTX-SHARED32-NEXT: ret;
35+
ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
36+
i128 %try_cancel_response) {
37+
38+
tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
39+
ret void;
40+
}
41+
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
42+
; CHECK: {{.*}}
