

API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ #150536


Closed
wants to merge 8 commits into from

Conversation

tinglvv
Collaborator

@tinglvv tinglvv commented Apr 2, 2025

Changing the bool to an int to express split_k_mode. Before 0.7.0 there were only two cusparseLtSplitKMode_t enum values, ONE_KERNEL and TWO_KERNELS, so a boolean was enough, but since 0.7.0 there are more.

For Blackwell, a minor change is needed to the parameter split_k_one_kernel (https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp#L103): new values were introduced to the enum cusparseLtSplitKMode_t (https://docs.nvidia.com/cuda/cusparselt/types.html#cusparseltsplitkmode-t), so a bool type is no longer sufficient and has to be replaced with an integer.

Error seen without the change:

RuntimeError: CUDA error: invalid value when calling `cusparseLtMatmulAlgSetAttribute( &handle, &alg_sel, CUSPARSELT_MATMUL_SPLIT_K_MODE, &splitKMode, sizeof(splitKMode))`

To execute this test, run the following from the base repo dir:
    python test/test_sparse_semi_structured.py TestSparseSemiStructuredCUSPARSELTCUDA.test_csrc_cslt_sparse_mm_search_cuda_int8
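To illustrate the capacity problem, here is a minimal Python sketch of the bool-to-int migration. The enum names mirror cusparseLtSplitKMode_t, but the exact enumerator names and integer values for 0.7.0+ are assumptions for illustration, not taken from the cuSPARSELt headers.

```python
from enum import IntEnum

# Illustrative only: names modeled on cusparseLtSplitKMode_t; the actual
# enumerator names/values in cuSPARSELt 0.7.0+ are assumptions here.
class SplitKMode(IntEnum):
    ONE_KERNEL = 0   # expressible as split_k_one_kernel=True
    TWO_KERNELS = 1  # expressible as split_k_one_kernel=False
    STREAMK = 2      # new in 0.7.0+: no bool encoding exists for this

def legacy_bool_to_mode(split_k_one_kernel: bool) -> SplitKMode:
    """The old bool parameter can only express the first two modes."""
    return SplitKMode.ONE_KERNEL if split_k_one_kernel else SplitKMode.TWO_KERNELS

# A plain int parameter covers every current and future mode value.
assert legacy_bool_to_mode(True) == SplitKMode.ONE_KERNEL
assert int(SplitKMode.STREAMK) == 2
```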

cc @ezyang @gchanan @eqy @ptrblck @malfet @atalman @nWEIdia

@tinglvv tinglvv requested review from eqy and syed-ahmed as code owners April 2, 2025 13:28
@pytorch-bot pytorch-bot bot added the release notes: sparse release notes category label Apr 2, 2025

pytorch-bot bot commented Apr 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150536

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 5e71e2a with merge base a13c8f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

github-actions bot commented Apr 2, 2025

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
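The FC-window mechanics above can be sketched with a hypothetical toy operator (nothing below is PyTorch's real API): the C++ side lands first with the new argument defaulted, so pre-existing call sites keep working; only after the two-week window does the Python side start passing the new argument.

```python
# Hypothetical new C++ operator: the new argument is defaulted, so callers
# built against the old schema (PR 1's world) keep working unchanged.
def new_cpp_op(a, b, split_k_mode=0):
    return (a + b, split_k_mode)

# During the FC window, the Python frontend still uses the old call shape:
assert new_cpp_op(1, 2) == (3, 0)

# After the window passes (PR 2, two weeks later), Python may pass it:
assert new_cpp_op(1, 2, split_k_mode=3) == (3, 3)
```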



@drisspg drisspg added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 3, 2025
@tinglvv
Collaborator Author

tinglvv commented Apr 7, 2025

Splitting this PR to land the C++ functionality first, per the comment above: "Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart."

@tinglvv tinglvv changed the title Blackwell support for cusparseLT [backend] Blackwell support for cusparseLT Apr 7, 2025
@j4yan

j4yan commented Apr 7, 2025

add myself

@j4yan

j4yan commented Apr 10, 2025

Hi @tinglvv @eqy @nWEIdia
After the splitting, do you expect all the tests to pass?
I doubt the unchanged Python wrappers will still work with the new C++ APIs.

@tinglvv tinglvv changed the title [backend] Blackwell support for cusparseLT blackwell support for cusparseLT Apr 11, 2025
@malfet malfet added the module: bc-breaking Related to a BC-breaking change label Apr 11, 2025
@pytorch-bot pytorch-bot bot added the topic: bc breaking topic category label Apr 11, 2025
@tinglvv
Collaborator Author

tinglvv commented Apr 11, 2025

Hi @j4yan, upon discussion with @malfet, could you provide more justification for the change? I was also wondering if we could add someone from the cuSPARSELt team for review.

Would the change be compatible with older versions < 0.7.0 (e.g. cuSPARSELt 0.6.3)?

@j4yan

j4yan commented Apr 15, 2025

@tinglvv Yes, the change is compatible with older versions.
I am not sure how to better justify the change. Without it, there is no way to denote the newly added cusparseLtSplitKMode_t values such as CUSPARSELT_STREAMK, which is a perf parameter returned by the tuning routine _cslt_sparse_mm_search.

@j4yan

j4yan commented Apr 15, 2025

@tinglvv you can add me as a reviewer.

Skylion007
Skylion007 previously approved these changes Apr 18, 2025
@malfet malfet requested review from supriyar and jcaip April 18, 2025 19:26
malfet
malfet previously requested changes Apr 18, 2025
Contributor

@malfet malfet left a comment


This sounds like a BC breaking change, asking Jesse to have a look at it

@malfet malfet dismissed Skylion007’s stale review April 18, 2025 19:31

This is a BC breaking change, let's understand

Contributor

@malfet malfet left a comment


I've found the usage in

sparse_result = torch._cslt_sparse_mm(

@tinglvv tinglvv changed the title blackwell support for cusparseLT API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.1 Apr 18, 2025
@tinglvv tinglvv changed the title API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.1 API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+ Apr 18, 2025
@nWEIdia
Collaborator

nWEIdia commented Apr 18, 2025

I am noticing that our CI (and binary) are still using cusparselt 0.6.3.2.
https://github.com/pytorch/pytorch/blob/main/.ci/docker/common/install_cuda.sh#L241

Would this PR depend on a PR to bump cusparseLt to v0.7.0+ (e.g. v0.7.1)?

Contributor

@jcaip jcaip left a comment


This is a BC-breaking change. However, I believe it's relatively low risk: _cslt_sparse_mm is a private API and split_k_one_kernel is not a commonly used param.

I think the better way to make this change would be to add a new kwarg split_k_mode and throw a deprecation warning when split_k_one_kernel is used. That way we can maintain BC for any use cases I am not aware of that do use this flag. We can then remove the old flag in a subsequent release (breaking BC).

This also has the added benefit of making the upgrade to cuSPARSELt 0.7.0 safer: as currently written, the version bump would depend on a BC-breaking change. If we add a new kwarg instead, the BC-breaking change happens after the version bump.
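The kwarg-migration pattern suggested above could look roughly like this minimal sketch; the wrapper and the mode-to-int mapping are assumptions for illustration, not PyTorch's actual implementation.

```python
import warnings

# Hypothetical wrapper: accept a new split_k_mode kwarg, warn when the
# legacy split_k_one_kernel bool is used, and map the bool onto the first
# two enum values (0 = ONE_KERNEL, 1 = TWO_KERNELS, assumed mapping).
def cslt_sparse_mm(a, b, split_k_one_kernel=None, split_k_mode=None):
    if split_k_one_kernel is not None:
        warnings.warn(
            "split_k_one_kernel is deprecated; pass split_k_mode instead",
            FutureWarning,
        )
        if split_k_mode is None:
            split_k_mode = 0 if split_k_one_kernel else 1
    if split_k_mode is None:
        split_k_mode = 0  # assumed default
    # Stand-in for the real kernel dispatch; return the resolved mode.
    return split_k_mode
```

Old callers keep working (with a warning) during the deprecation window, while new callers can pass any integer mode, including values added in 0.7.0+.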

@tinglvv
Collaborator Author

tinglvv commented Apr 25, 2025

@pytorchbot rebase

@Aidyn-A
Collaborator

Aidyn-A commented May 5, 2025

@malfet @jcaip if I am not mistaken, we can add aten::_cslt_sparse_mm to the ALLOW_LIST to make CI pass:

ALLOW_LIST = [
("c10_experimental", datetime.date(9999, 1, 1)),
# Internal
("static", datetime.date(9999, 1, 1)),
("prim::ModuleDictIndex", datetime.date(9999, 1, 1)),
("prim::MKLDNNRelu6", datetime.date(9999, 1, 1)),
("prim::MKLDNNRelu6_", datetime.date(9999, 1, 1)),
("prim::is_ort", datetime.date(9999, 1, 1)),
("prim::Concat", datetime.date(9999, 1, 1)),
("aten::_NestedTensor_GeneralizedBMM", datetime.date(9999, 1, 1)),
# Internal, profiler-specific ops
("profiler::_call_end_callbacks_on_jit_fut*", datetime.date(9999, 1, 1)),
("profiler::_record_function_enter", datetime.date(9999, 1, 1)),
("aten::_cholesky_helper", datetime.date(9999, 1, 1)),
("aten::_lstsq_helper", datetime.date(9999, 1, 1)),
("aten::_syevd_helper", datetime.date(9999, 1, 1)),
("aten::_linalg_solve_out_helper_", datetime.date(9999, 1, 1)),
("aten::select_backward", datetime.date(9999, 1, 1)),
("aten::lstsq", datetime.date(9999, 1, 1)),
("aten::lstsq.X", datetime.date(9999, 1, 1)),
("aten::slice_backward", datetime.date(9999, 1, 1)),
("aten::diagonal_backward", datetime.date(9999, 1, 1)),
("aten::rowwise_prune", datetime.date(9999, 1, 1)),
("aten::eig", datetime.date(9999, 1, 1)),
("aten::eig.e", datetime.date(9999, 1, 1)),
("aten::adaptive_avg_pool3d_backward", datetime.date(9999, 1, 1)),
("aten::_embedding_bag_dense_backward", datetime.date(9999, 1, 1)),
("aten::matrix_rank", datetime.date(9999, 1, 1)),
("aten::matrix_rank.tol", datetime.date(9999, 1, 1)),
("aten::randperm", datetime.date(9999, 1, 1)),
("aten::solve", datetime.date(9999, 1, 1)),
("aten::solve.solution", datetime.date(9999, 1, 1)),
("aten::_solve_helper", datetime.date(9999, 1, 1)),
("aten::_convolution_nogroup", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_bias", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_backward_weight", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_convolution_transpose_backward_weight", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward_input", datetime.date(9999, 1, 1)),
("aten::miopen_depthwise_convolution_backward_weight", datetime.date(9999, 1, 1)),
("aten::_nested_tensor", datetime.date(9999, 1, 1)),
("prepacked::unpack_prepacked_sizes_conv2d", datetime.date(9999, 1, 1)),
("prepacked::unpack_prepacked_sizes_linear", datetime.date(9999, 1, 1)),
("aten::_symeig_helper", datetime.date(9999, 1, 1)),
("aten::symeig", datetime.date(9999, 1, 1)),
("aten::symeig.e", datetime.date(9999, 1, 1)),
("aten::native_multi_head_self_attention", datetime.date(9999, 1, 1)),
("aten::_native_multi_head_self_attention", datetime.date(9999, 1, 1)),
("aten::grid_sampler_3d_backward", datetime.date(9999, 1, 1)),
("aten::_transform_bias_rescale_qkv", datetime.date(9999, 1, 1)),
("prim::infer_squeeze_size.dim", datetime.date(9999, 1, 1)),
("prim::infer_squeeze_size", datetime.date(9999, 1, 1)),
("aten::_weight_norm_cuda_interface", datetime.date(9999, 1, 1)),
("aten::_weight_norm_cuda_interface_backward", datetime.date(9999, 1, 1)),
("aten::empty.SymInt", datetime.date(9999, 1, 1)),
# nested tensor temporary auxiliary ops
("aten::_reshape_nested", datetime.date(9999, 1, 1)),
("aten::_reshape_nested_backward", datetime.date(9999, 1, 1)),
("aten::mps_linear", datetime.date(9999, 1, 1)),
("aten::_mps_linear", datetime.date(9999, 1, 1)),
("aten::_mps_max_pool2d", datetime.date(9999, 1, 1)),
("aten::_mps_max_pool2d.out", datetime.date(9999, 1, 1)),
("aten::mps_max_pool2d_backward", datetime.date(9999, 1, 1)),
("aten::mps_max_pool2d_backward.out", datetime.date(9999, 1, 1)),
# TODO: FIXME: prims shouldn't be checked
("prims::.*", datetime.date(9999, 1, 1)),
("aten::_scaled_dot_product_cudnn_attention", datetime.date(9999, 1, 1)),
# BetterTransformer 1.0 internal operators
("aten::_transformer_decoder_only_layer_fwd", datetime.date(9999, 1, 1)),
("aten::_native_decoder_only_multi_head_attention", datetime.date(9999, 1, 1)),
# These ops were moved to python under the c10d_functional namespace
("aten::wait_tensor", datetime.date(9999, 1, 30)),
("aten::reduce_scatter_tensor", datetime.date(9999, 1, 30)),
("aten::all_gather_into_tensor", datetime.date(9999, 1, 30)),
("aten::all_reduce", datetime.date(9999, 1, 30)),
# These ops are defined in torch/csrc/distributed/c10d/Ops.cpp
# TODO: add back restriction when c10d ops can be exported
("c10d::.*", datetime.date(9999, 1, 1)),
]

Would that be enough to settle the bc-breaking change?

@jcaip
Contributor

jcaip commented May 13, 2025

cc @Aidyn-A @tinglvv

@malfet @jcaip if I am not mistaken, we can add aten::_cslt_sparse_mm to the ALLOW_LIST to make CI pass:

Sorry for the late response, I was traveling for ICLR and then on PTO the last week. Yes, I think it should be fine to add it here. From the comment above, I think you can just put datetime.date(9999, 1, 1):

#   - If we NEVER give BC guarantee for an operator, you can put the
#     date arbitrarily far in the future.
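Putting the two suggestions together, the proposed allow-list addition would be a single tuple along these lines (its placement in the actual compatibility-check test file is assumed):

```python
import datetime

# Proposed entry: exempt aten::_cslt_sparse_mm from the BC schema check
# indefinitely, using the far-future date convention quoted above.
ALLOW_LIST_ADDITION = ("aten::_cslt_sparse_mm", datetime.date(9999, 1, 1))
```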

@tinglvv tinglvv requested a review from larryliu0820 as a code owner May 13, 2025 18:01
@tinglvv
Collaborator Author

tinglvv commented May 13, 2025

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cusparselt-blackwell onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cusparselt-blackwell && git pull --rebase)

@tinglvv
Collaborator Author

tinglvv commented May 14, 2025

@pytorchbot merge


pytorch-bot bot commented May 14, 2025

This PR has pending changes requested. Please address the comments and update the PR before merging.

@tinglvv tinglvv dismissed malfet’s stale review May 14, 2025 16:30

Resetting the review per Jesse's approval. Merging for now.

@tinglvv
Collaborator Author

tinglvv commented May 14, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 14, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Contributor

@atalman atalman left a comment


lgtm

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@tinglvv
Collaborator Author

tinglvv commented May 14, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here
