[TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+ #152814
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152814
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6aba225 with merge base 5796212.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Is this still needed after #148421? I'm seeing that on a fresh source build,
test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda,
test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, and
test_scaled_mm_vs_emulated_row_wise
all pass.
My bad, it is needed for
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased; force-pushed from dec9eab to c02fe40.
Hmm, is this because CUTLASS is missing those specializations? Or because we are missing something minor on our side that would unblock support? Like a missing template specialization, overly restrictive dispatch logic, or a missing cmake arch to build those kernels for SM120?
Doesn't it support it here? Are we missing dispatch logic?
Does just adding SM120 here fix it, or is SM100 not compatible with SM120?
No, it is not as trivial as adding SM120.
test/test_matmul_cuda.py (Outdated)

@@ -1013,7 +1015,7 @@ def test_float8_scale_fast_accum(self, device) -> None:
         self.assertEqual(out_fp8, out_fp8_s)

     @unittest.skipIf(not PLATFORM_SUPPORTS_FP8 or IS_WINDOWS, f8_msg)
-    @unittest.skipIf(not SM89OrLater, "rowwise implementation is currently sm89+ specific")
+    @unittest.skipIf(not SM89OrLater or _IS_SM12X, "rowwise implementation is currently sm89-sm90 specific")
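For context, a guard like `_IS_SM12X` could be written roughly as below. This is a minimal sketch assuming the standard `torch.cuda.get_device_capability()` API; the actual helper used by the test suite may be defined elsewhere and differ in detail.

```python
import torch

def _is_sm12x() -> bool:
    # Hypothetical sketch: True on devices with compute capability 12.x
    # (e.g. sm_120 Blackwell parts), False otherwise or when CUDA is absent.
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major == 12

_IS_SM12X = _is_sm12x()
```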
Should we skip or XFAIL?
If you could post the trace, that could be helpful in figuring out how to enable it.
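For illustration, an XFAIL-style alternative to the skip could look like the sketch below. This is not the pattern used in the PR; `xfail_if` is a hypothetical helper built on the standard `unittest.expectedFailure` decorator, and `_IS_SM12X` is assumed to be the same guard as in the diff above.

```python
import unittest

def xfail_if(condition: bool, reason: str = ""):
    """Hypothetical helper: mark a test as an expected failure when `condition`
    holds, so it still runs (and reports an unexpected success) instead of
    being skipped. `reason` is informational only; unittest.expectedFailure
    takes no message."""
    def decorator(fn):
        return unittest.expectedFailure(fn) if condition else fn
    return decorator

# Usage sketch on one of the affected tests:
# @xfail_if(_IS_SM12X, "rowwise scaled mm not yet supported on sm_120+")
# def test_scaled_mm_vs_emulated_row_wise(self, ...):
#     ...
```

The trade-off is that an expected failure keeps exercising the code path and surfaces an unexpected success once support lands, whereas a skip stays silent.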
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The float8 row-wise scaled matmuls are not supported on Blackwell yet. This PR adds skips to those tests to decrease the noise on sm_120+ machines.

cc @ptrblck @msaroufim @eqy @jerryzh168
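For readers unfamiliar with the affected path, the skipped tests exercise a float8 row-wise scaled matmul along the lines of the sketch below. This is only an illustrative sketch of the pattern, not the test code itself; the private `torch._scaled_mm` API has shifted between releases, so argument names and return conventions may differ in your build.

```python
import torch

# Illustrative sketch (assumes a CUDA device with float8 support).
M, K, N = 64, 128, 32
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
# mat2 is expected to be column-major, hence the transpose.
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()

# Row-wise scaling: one scale per row of `a`, one per column of `b`.
scale_a = torch.ones(M, 1, device="cuda", dtype=torch.float32)
scale_b = torch.ones(1, N, device="cuda", dtype=torch.float32)

# On supported archs (e.g. sm_90) this returns a bf16 tensor; per this PR,
# on sm_120+ the row-wise scaled path is not yet supported.
out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)
print(out.shape)  # expected: torch.Size([64, 32])
```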