[TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+ #152814
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152814
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6aba225 with merge base 5796212.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Is this still needed after #148421? I'm seeing that on a fresh source build,
test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda,
test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, and
test_scaled_mm_vs_emulated_row_wise
all pass.
My bad, it is needed for
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased; force-pushed from dec9eab to c02fe40.
Hmm, is this because CUTLASS is missing those specializations? Or because we are missing something minor on our side that would unblock support? Like a missing template specialization, overly restrictive dispatch logic, or a missing cmake arch to build those kernels for SM120?
Doesn't it support it here? Are we missing dispatch logic?
Does just adding SM120 here fix it, or is SM100 not compatible with SM120?
No, it is not as trivial as adding SM120.
test/test_matmul_cuda.py (Outdated)

@@ -1013,7 +1015,7 @@ def test_float8_scale_fast_accum(self, device) -> None:
         self.assertEqual(out_fp8, out_fp8_s)

     @unittest.skipIf(not PLATFORM_SUPPORTS_FP8 or IS_WINDOWS, f8_msg)
-    @unittest.skipIf(not SM89OrLater, "rowwise implementation is currently sm89+ specific")
+    @unittest.skipIf(not SM89OrLater or _IS_SM12X, "rowwise implementation is currently sm89-sm90 specific")
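For context, a guard like `_IS_SM12X` could be written roughly as below. This is a minimal sketch assuming the standard `torch.cuda.get_device_capability()` API; the actual helper used by the test suite may be defined elsewhere and differ in detail.

```python
import torch

def _is_sm12x() -> bool:
    # Hypothetical sketch: True on devices with compute capability 12.x
    # (e.g. sm_120 Blackwell parts), False otherwise or when CUDA is absent.
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major == 12

_IS_SM12X = _is_sm12x()
```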
Should we skip or XFAIL?
If you could post the trace, that could be helpful in figuring out how to enable it.
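For illustration, an XFAIL-style alternative to the skip could look like the sketch below. This is not the pattern used in the PR; `xfail_if` is a hypothetical helper built on the standard `unittest.expectedFailure` decorator, and `_IS_SM12X` is assumed to be the same guard as in the diff above.

```python
import unittest

def xfail_if(condition: bool, reason: str = ""):
    """Hypothetical helper: mark a test as an expected failure when `condition`
    holds, so it still runs (and reports an unexpected success) instead of
    being skipped. `reason` is informational only; unittest.expectedFailure
    takes no message."""
    def decorator(fn):
        return unittest.expectedFailure(fn) if condition else fn
    return decorator

# Usage sketch on one of the affected tests:
# @xfail_if(_IS_SM12X, "rowwise scaled mm not yet supported on sm_120+")
# def test_scaled_mm_vs_emulated_row_wise(self, ...):
#     ...
```

The trade-off is that an expected failure keeps exercising the code path and surfaces an unexpected success once support lands, whereas a skip stays silent.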
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The float8 row-wise scaled matmuls are not supported on Blackwell yet. This PR adds skips to those tests to decrease the noise on sm_120+ machines.

cc @ptrblck @msaroufim @eqy @jerryzh168
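For readers unfamiliar with the affected path, the skipped tests exercise a float8 row-wise scaled matmul along the lines of the sketch below. This is only an illustrative sketch of the pattern, not the test code itself; the private `torch._scaled_mm` API has shifted between releases, so argument names and return conventions may differ in your build.

```python
import torch

# Illustrative sketch (assumes a CUDA device with float8 support).
M, K, N = 64, 128, 32
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
# mat2 is expected to be column-major, hence the transpose.
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()

# Row-wise scaling: one scale per row of `a`, one per column of `b`.
scale_a = torch.ones(M, 1, device="cuda", dtype=torch.float32)
scale_b = torch.ones(1, N, device="cuda", dtype=torch.float32)

# On supported archs (e.g. sm_90) this returns a bf16 tensor; per this PR,
# on sm_120+ the row-wise scaled path is not yet supported.
out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)
print(out.shape)  # expected: torch.Size([64, 32])
```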