
API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ #150536


Open
wants to merge 7 commits into main from cusparselt-blackwell

Conversation

Collaborator

@tinglvv tinglvv commented Apr 2, 2025

Changing the bool to an int to express split_k_mode. Before 0.7.0 there were only two cusparseLtSplitKMode_t enum values, ONE_KERNEL and TWO_KERNELS, so a boolean was enough, but since 0.7.0 there are more.

For Blackwell, there has to be a minor change to the parameter split_k_one_kernel (https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp#L103): new values were introduced to the enum cusparseLtSplitKMode_t, and a bool can no longer represent them, so it has to be replaced with an integer. See https://docs.nvidia.com/cuda/cusparselt/types.html#cusparseltsplitkmode-t
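To illustrate why the bool no longer suffices, here is a minimal Python sketch; the names mirror the cusparseLtSplitKMode_t values discussed in this thread, but the numeric encodings are placeholders, not the real cuSPARSELt ones:

```python
# Placeholder encodings standing in for cusparseLtSplitKMode_t; the real
# numeric values come from the cuSPARSELt headers, not from this sketch.
ONE_KERNEL, TWO_KERNELS, STREAMK = 0, 1, 2  # 0.7.0+ adds modes beyond the first two

def mode_from_bool(split_k_one_kernel: bool) -> int:
    # The legacy bool can only select between the first two modes...
    return ONE_KERNEL if split_k_one_kernel else TWO_KERNELS

# ...so a mode like STREAMK, which the 0.7.0+ tuning search may return,
# cannot be round-tripped through a bool parameter; an int can carry any
# enum value.
```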

Error we see without the change:

RuntimeError: CUDA error: invalid value when calling `cusparseLtMatmulAlgSetAttribute( &handle, &alg_sel, CUSPARSELT_MATMUL_SPLIT_K_MODE, &splitKMode, sizeof(splitKMode))`

To execute this test, run the following from the base repo dir:
    python test/test_sparse_semi_structured.py TestSparseSemiStructuredCUSPARSELTCUDA.test_csrc_cslt_sparse_mm_search_cuda_int8

cc @ezyang @gchanan @eqy @ptrblck @malfet @atalman @nWEIdia

@tinglvv tinglvv requested review from eqy and syed-ahmed as code owners April 2, 2025 13:28
@pytorch-bot pytorch-bot bot added the release notes: sparse release notes category label Apr 2, 2025

pytorch-bot bot commented Apr 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150536

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit f4531f0 with merge base 6e8602b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

github-actions bot commented Apr 2, 2025

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.



@drisspg drisspg added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 3, 2025
Collaborator Author

tinglvv commented Apr 7, 2025

Splitting this PR to land the C++ functionality first, per the comment: "Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart."

@tinglvv tinglvv changed the title Blackwell support for cusparseLT [backend] Blackwell support for cusparseLT Apr 7, 2025

j4yan commented Apr 7, 2025

add myself


j4yan commented Apr 10, 2025

Hi @tinglvv @eqy @nWEIdia,
After the split, do you expect all the tests to pass?
I doubt the unchanged Python wrappers will still work with the new C++ APIs.

@tinglvv tinglvv changed the title [backend] Blackwell support for cusparseLT blackwell support for cusparseLT Apr 11, 2025
@malfet malfet added the module: bc-breaking Related to a BC-breaking change label Apr 11, 2025
@pytorch-bot pytorch-bot bot added the topic: bc breaking topic category label Apr 11, 2025
Collaborator Author

tinglvv commented Apr 11, 2025

Hi @j4yan, upon discussion with @malfet, could you provide more justification for the change? I was also wondering if we could add someone from the cusparseLt team for review.

Would the change be compatible with older versions < 0.7.0 (e.g. cusparseLt 0.6.3)?


j4yan commented Apr 15, 2025

@tinglvv Yes, the change is compatible with older versions.
I am not sure how to better justify the change. Without it, there is no way to denote the newly added cusparseLtSplitKMode_t values such as CUSPARSELT_STREAMK, which is a perf parameter returned by the tuning routine _cslt_sparse_mm_search.


j4yan commented Apr 15, 2025

@tinglvv you can add me as a reviewer.

Skylion007 previously approved these changes Apr 18, 2025
@malfet malfet requested review from supriyar and jcaip April 18, 2025 19:26
Contributor

@malfet malfet left a comment


This sounds like a BC breaking change, asking Jesse to have a look at it

@malfet malfet dismissed Skylion007’s stale review April 18, 2025 19:31

This is a BC-breaking change, let's understand.

Contributor

@malfet malfet left a comment


I've found the usage in:

sparse_result = torch._cslt_sparse_mm(

@tinglvv tinglvv changed the title blackwell support for cusparseLT API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.1 Apr 18, 2025
@tinglvv tinglvv changed the title API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.1 API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ Apr 18, 2025
Collaborator

nWEIdia commented Apr 18, 2025

I am noticing that our CI (and binaries) are still using cusparseLt 0.6.3.2.
https://github.com/pytorch/pytorch/blob/main/.ci/docker/common/install_cuda.sh#L241

Would this PR depend on a PR to bump cusparseLt to v0.7.0+ (e.g. v0.7.1)?

Contributor

@jcaip jcaip left a comment


This is a BC-breaking change. However, I believe it's relatively low risk: _cslt_sparse_mm is a private API and split_k_one_kernel is not a commonly used param.

I think the better way to make this change would be to add a new kwarg split_k_mode and throw a deprecation warning when split_k_one_kernel is used. That way we can maintain BC for any use cases I am not aware of that do use this flag. We can deprecate it in a subsequent release (breaking BC).

This also has the added benefit that it makes the upgrade to cuSPARSELt 0.7.0 safer: as currently written, the version bump would depend on a BC-breaking change. If we add a new kwarg instead, then the BC-breaking change happens after the version bump.
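A minimal Python sketch of that approach (hypothetical names and defaults, not the actual PR code), assuming split_k_mode uses the placeholder integer encodings from the sketch earlier in this thread:

```python
import warnings

def _cslt_sparse_mm_compat(compressed_A, dense_B, *, split_k=1,
                           split_k_one_kernel=None, split_k_mode=None):
    """Hypothetical BC shim: accept both kwargs, warn on the deprecated one."""
    if split_k_one_kernel is not None:
        warnings.warn(
            "split_k_one_kernel is deprecated; pass split_k_mode instead",
            DeprecationWarning,
        )
        if split_k_mode is None:
            # Map the legacy bool onto the first two enum values
            # (0/1 are placeholder encodings, as in the earlier sketch).
            split_k_mode = 0 if split_k_one_kernel else 1
    if split_k_mode is None:
        split_k_mode = 0  # default: ONE_KERNEL
    # A real implementation would dispatch to the underlying op here;
    # this sketch just returns the resolved mode.
    return split_k_mode
```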

@@ -3371,7 +3371,7 @@
dispatch:
CUDA: _cslt_compress

- func: _cslt_sparse_mm(Tensor compressed_A, Tensor dense_B, Tensor? bias=None, Tensor? alpha=None, ScalarType? out_dtype=None, bool transpose_result=False, int alg_id=0, int split_k=1, bool split_k_one_kernel=True) -> Tensor
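The added line of this hunk is not captured in the excerpt above. As a loudly hypothetical reconstruction (the exact parameter name and default belong to the final PR, not this sketch), the replacement presumably swaps the trailing bool for an integer:

```
+ func: _cslt_sparse_mm(Tensor compressed_A, Tensor dense_B, Tensor? bias=None, Tensor? alpha=None, ScalarType? out_dtype=None, bool transpose_result=False, int alg_id=0, int split_k=1, int split_k_mode=0) -> Tensor
```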
Contributor


Can we add this as a new kwarg instead of renaming the existing one? Then we can throw a warning when split_k_one_kernel is used. That way we can maintain BC and deprecate split_k_one_kernel in a subsequent version.



Hi @jcaip, could you elaborate on the BC change? Is _cslt_sparse_mm a public API?
If new kwargs have to be added, I'd suggest also adding split_k_buffers (see https://docs.nvidia.com/cuda/cusparselt/types.html#cusparseltmatmulalgattribute-t), because it's also part of the split-k parameters tuned by the search routine.

Contributor

@jcaip jcaip Apr 25, 2025


@j4yan It's a private API; I'm just suggesting a way that this change could be made safer.

Anything that calls torch._cslt_sparse_mm(... , split_k_one_kernel=True) will break with this change.

EDIT: I actually read our BC policy a bit more closely and realized that private ops are specifically excluded, so this isn't technically BC-breaking, in the sense that we don't have any guarantees for private ops.

I still have a preference for making the change in a manner that doesn't break existing code, but as I said above, I feel like this is pretty low risk anyway, so I'm approving to unblock.
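For illustration, a sketch of the breakage (A_compressed and B are stand-in tensors, and split_k_mode is the kwarg name proposed above, not necessarily the final API):

```python
import torch  # requires a CUDA build with cuSPARSELt support

# Before this PR, per the current schema:
out = torch._cslt_sparse_mm(A_compressed, B, split_k=2, split_k_one_kernel=True)

# After the rename, the call above fails with an unknown-kwarg error;
# callers would pass an integer mode instead, e.g.:
out = torch._cslt_sparse_mm(A_compressed, B, split_k=2, split_k_mode=1)
```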

Collaborator Author

tinglvv commented Apr 25, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cusparselt-blackwell onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cusparselt-blackwell && git pull --rebase)

Collaborator Author

tinglvv commented Apr 30, 2025

Hi @jcaip, thanks for reviewing. The change is currently failing one test with the error below. Does this mean we need to adjust the changes, or can we ignore this test warning? Thanks again.

[WARNING 2025-04-25 20:49:21,601 check_forward_backward_compatibility.py:301] The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 

Broken ops: [
	aten::_cslt_sparse_mm(Tensor compressed_A, Tensor dense_B, Tensor? bias=None, Tensor? alpha=None, ScalarType? out_dtype=None, bool transpose_result=False, int alg_id=0, int split_k=1, bool split_k_one_kernel=True) -> Tensor
]

Collaborator

Aidyn-A commented May 5, 2025

@malfet @jcaip if I am not mistaken, we can add aten::_cslt_sparse_mm to the ALLOW_LIST to make CI pass:

ALLOW_LIST = [
("c10_experimental", datetime.date(9999, 1, 1)),
# Internal
("static", datetime.date(9999, 1, 1)),
("prim::ModuleDictIndex", datetime.date(9999, 1, 1)),
("prim::MKLDNNRelu6", datetime.date(9999, 1, 1)),
("prim::MKLDNNRelu6_", datetime.date(9999, 1, 1)),
("prim::is_ort", datetime.date(9999, 1, 1)),
("prim::Concat", datetime.date(9999, 1, 1)),
("aten::_NestedTensor_GeneralizedBMM", datetime.date(9999, 1, 1)),
# Internal, profiler-specific ops
("profiler::_call_end_callbacks_on_jit_fut*", datetime.date(9999, 1, 1)),
("profiler::_record_function_enter", datetime.date(9999, 1, 1)),
("aten::_cholesky_helper", datetime.date(9999, 1, 1)),
("aten::_lstsq_helper", datetime.date(9999, 1, 1)),
("aten::_syevd_helper", datetime.date(9999, 1, 1)),
("aten::_linalg_solve_out_helper_", datetime.date(9999, 1, 1)),
("aten::select_backward", datetime.date(9999, 1, 1)),
("aten::lstsq", datetime.date(9999, 1, 1)),
("aten::lstsq.X", datetime.date(9999, 1, 1)),
("aten::slice_backward", datetime.date(9999, 1, 1)),
("aten::diagonal_backward", datetime.date(9999, 1, 1)),
("aten::rowwise_prune", datetime.date(9999, 1, 1)),
("aten::eig", datetime.date(9999, 1, 1)),
("aten::eig.e", datetime.date(9999, 1, 1)),
("aten::adaptive_avg_pool3d_backward", datetime.date(9999, 1, 1)),
("aten::_embedding_bag_dense_backward", datetime.date(9999, 1, 1)),
("aten::matrix_rank", datetime.date(9999, 1, 1)),
("aten::matrix_rank.tol", datetime.date(9999, 1, 1)),
("aten::randperm", datetime.date(9999, 1, 1)),
("aten::solve", datetime.date(9999, 1, 1)),
("aten::solve.solution", datetime.date(9999, 1, 1)),
("aten::_solve_helper", datetime.date(9999, 1, 1)),
("aten::_convolution_nogroup", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_bias", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_weight", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward_weight", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward_weight", datetime.date(9999, 1, 1)),
("aten::_nested_tensor", datetime.date(9999, 1, 1)),
("prepacked::unpack_prepacked_sizes_conv2d", datetime.date(9999, 1, 1)),
("prepacked::unpack_prepacked_sizes_linear", datetime.date(9999, 1, 1)),
("aten::_symeig_helper", datetime.date(9999, 1, 1)),
("aten::symeig", datetime.date(9999, 1, 1)),
("aten::symeig.e", datetime.date(9999, 1, 1)),
("aten::native_multi_head_self_attention", datetime.date(9999, 1, 1)),
("aten::_native_multi_head_self_attention", datetime.date(9999, 1, 1)),
("aten::grid_sampler_3d_backward", datetime.date(9999, 1, 1)),
("aten::_transform_bias_rescale_qkv", datetime.date(9999, 1, 1)),
("prim::infer_squeeze_size.dim", datetime.date(9999, 1, 1)),
("prim::infer_squeeze_size", datetime.date(9999, 1, 1)),
("aten::_weight_norm_cuda_interface", datetime.date(9999, 1, 1)),
("aten::_weight_norm_cuda_interface_backward", datetime.date(9999, 1, 1)),
("aten::empty.SymInt", datetime.date(9999, 1, 1)),
# nested tensor temporary auxiliary ops
("aten::_reshape_nested", datetime.date(9999, 1, 1)),
("aten::_reshape_nested_backward", datetime.date(9999, 1, 1)),
("aten::mps_linear", datetime.date(9999, 1, 1)),
("aten::_mps_linear", datetime.date(9999, 1, 1)),
("aten::_mps_max_pool2d", datetime.date(9999, 1, 1)),
("aten::_mps_max_pool2d.out", datetime.date(9999, 1, 1)),
("aten::mps_max_pool2d_backward", datetime.date(9999, 1, 1)),
("aten::mps_max_pool2d_backward.out", datetime.date(9999, 1, 1)),
# TODO: FIXME: prims shouldn't be checked
("prims::.*", datetime.date(9999, 1, 1)),
("aten::_scaled_dot_product_cudnn_attention", datetime.date(9999, 1, 1)),
# BetterTransformer 1.0 internal operators
("aten::_transformer_decoder_only_layer_fwd", datetime.date(9999, 1, 1)),
("aten::_native_decoder_only_multi_head_attention", datetime.date(9999, 1, 1)),
# These ops were moved to python under the c10d_functional namespace
("aten::wait_tensor", datetime.date(9999, 1, 30)),
("aten::reduce_scatter_tensor", datetime.date(9999, 1, 30)),
("aten::all_gather_into_tensor", datetime.date(9999, 1, 30)),
("aten::all_reduce", datetime.date(9999, 1, 30)),
# These ops are defined in torch/csrc/distributed/c10d/Ops.cpp
# TODO: add back restriction when c10d ops can be exported
("c10d::.*", datetime.date(9999, 1, 1)),
]
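For concreteness, the new entry would presumably follow the same pattern as the other internal-op entries (the far-future date is this list's convention for never-expiring exceptions):

```python
("aten::_cslt_sparse_mm", datetime.date(9999, 1, 1)),
```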

Would that be enough to settle the bc-breaking change?
