[TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+ #152814
Aidyn-A wants to merge 6 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152814
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6aba225 with merge base 5796212.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
eqy left a comment
Is this still needed after #148421? I'm seeing that on a fresh source build,
test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda
test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda
test_scaled_mm_vs_emulated_row_wise
all pass
My bad, it is needed for
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased dec9eab to c02fe40.
Hmm, is this because CUTLASS is missing those specializations? Or are we missing something minor on our side that would unblock support, like a missing template specialization, overly restrictive dispatch logic, or a missing cmake arch to build those kernels for SM120?
Doesn't it support it here? Are we missing dispatch logic?
Does just adding SM120 here fix it or is SM100 not compatible with SM120? |
No, it is not as trivial as adding SM120 there.
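For context, here is a minimal repro sketch, assuming the failing path is the row-wise `torch._scaled_mm` call that these tests exercise (the shapes, dtypes, and capability check below are illustrative, not taken from the test suite):

```python
# Illustrative sketch only: exercises the row-wise scaled fp8 matmul path that,
# per this PR, currently has no kernel on sm_120+.
import torch

M, K, N = 64, 128, 32
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()  # column-major second operand
scale_a = torch.rand(M, 1, device="cuda", dtype=torch.float32)    # one scale per row of a
scale_b = torch.rand(1, N, device="cuda", dtype=torch.float32)    # one scale per column of b

if torch.cuda.get_device_capability() >= (12, 0):
    print("sm_120+: the row-wise scaled mm kernel is expected to be unavailable here")
else:
    out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)
    print(out.shape)  # (M, N)
```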
  @unittest.skipIf(not PLATFORM_SUPPORTS_FP8 or IS_WINDOWS, f8_msg)
- @unittest.skipIf(not SM89OrLater, "rowwise implementation is currently sm89+ specific")
+ @unittest.skipIf(not SM89OrLater or _IS_SM12X, "rowwise implementation is currently sm89-sm90 specific")
Should we skip or XFAIL?
If you could post the trace, that would be helpful in figuring out how to enable it.
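To illustrate the skip-vs-XFAIL trade-off being discussed, a rough sketch (the flag and test names here are hypothetical, not the actual PyTorch test code): `unittest.expectedFailure` cannot be conditioned on the GPU architecture, so on sm_89/sm_90 machines the already-passing tests would show up as unexpected successes, which is presumably why a conditional skip was chosen.

```python
# Rough sketch of the two options; _IS_SM12X and the test names are hypothetical.
import unittest
import torch

_IS_SM12X = torch.cuda.is_available() and torch.cuda.get_device_capability() >= (12, 0)

class TestFP8RowwiseExample(unittest.TestCase):
    # Option 1: conditional skip -- not run at all on sm_120+, runs normally elsewhere.
    @unittest.skipIf(_IS_SM12X, "rowwise scaled mm not yet supported on sm_120+")
    def test_rowwise_skipped(self):
        ...

    # Option 2: XFAIL -- still runs and flags an "unexpected success" once support
    # lands, but unittest.expectedFailure is unconditional, so it would also mark
    # the test on architectures where it already passes.
    @unittest.expectedFailure
    def test_rowwise_xfailed(self):
        ...
```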
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The float8 row-wise scaled matmuls are not supported on Blackwell yet. This PR adds skips to those tests to decrease the noise on sm_120+ machines.

cc @ptrblck @msaroufim @eqy @jerryzh168