[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100` #149282

eqy · 2025-03-16T21:09:20Z

cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward

pytorch-bot · 2025-03-16T21:09:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149282

❌ 15 New Failures, 1 Unrelated Failure

As of commit 7b6fd2d with merge base b027cb8:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner-clang / linux-job (gh)
>>> Lint for aten/src/ATen/native/cudnn/MHA.cpp:
Lint / lintrunner-noclang / linux-job (gh)
>>> Lint for aten/src/ATen/native/transformers/cuda/attention_backward.cu:
pull / cuda12.4-py3.10-gcc9-sm75 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/MHA.cpp:1117:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
pull / linux-focal-cuda11.8-py3.10-gcc9 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/MHA.cpp:1117:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
pull / linux-focal-cuda12.6-py3.10-gcc11 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/MHA.cpp:1117:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/MHA.cpp:1117:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
pull / linux-focal-py3.13-clang10 / test (crossref, 1, 2, ephemeral.linux.2xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-focal-py3.13-clang10 / test (default, 1, 5, ephemeral.linux.4xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-focal-py3.13-clang10 / test (dynamo_wrapped, 2, 3, ephemeral.linux.2xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-focal-py3.9-clang10 / test (crossref, 1, 2, ephemeral.linux.2xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-focal-py3.9-clang10 / test (default, 2, 5, ephemeral.linux.4xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 3, 3, ephemeral.linux.2xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-jammy-cuda11.8-cudnn9-py3.9-clang12 / build (gh)
../aten/src/ATen/native/cudnn/MHA.cpp:1117:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]
pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6, ephemeral.linux.4xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition
pull / linux-jammy-py3.9-gcc11 / test (default, 4, 5, ephemeral.linux.2xlarge) (gh)
test_decomp.py::HasDecompTest::test_has_decomposition

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build (gh)
/var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/MHA.cpp:1117:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]

This comment was automatically generated by Dr. CI and updates every 15 minutes.
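
For context on the `-Wpessimizing-move` build failures listed above: the code at MHA.cpp:1117 is not shown in this thread, so the following is only a minimal sketch of the general pattern, using a hypothetical `Tracked` type and `make_plain` function that do not appear in the PR. Wrapping a local in `std::move` in a `return` statement disables copy elision; returning the local by name lets the compiler elide the copy or fall back to an implicit move.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type (not from the PR) that counts move constructions.
struct Tracked {
  static int moves;
  std::vector<int> data;
  Tracked() = default;
  Tracked(const Tracked&) = default;
  Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++moves; }
};
int Tracked::moves = 0;

// Flagged form, kept as a comment so this sketch builds under -Werror:
//   Tracked t; ...; return std::move(t);  // -Wpessimizing-move
//
// Preferred form: return the local by name. The compiler may elide the
// copy entirely (NRVO) and otherwise treats `t` as an rvalue implicitly.
Tracked make_plain() {
  Tracked t;
  t.data = {1, 2, 3};
  return t;
}
```

With the by-name return, `Tracked::moves` is at most 1 after a call, and typically 0 when the compiler applies NRVO.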

eqy · 2025-03-16T21:55:38Z

@pytorchmergebot rebase

pytorchmergebot · 2025-03-16T21:57:06Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-03-16T21:57:08Z

Tried to rebase and push PR #149282, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

eqy · 2025-03-17T16:55:32Z

@pytorchmergebot rebase

pytorchmergebot · 2025-03-17T16:56:59Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-03-17T16:57:03Z

Successfully rebased cudnnsdparefactor onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnnsdparefactor && git pull --rebase)

linux-foundation-easycla · 2025-03-17T23:42:38Z

✅ login: eqy (f9dc21b, 8e8a43e, ae3f221, 6ecf88b, 546fc90, 551e58e, a18a415, abd3912, 7b6fd2d, 0baac8a, 6f690ba, 8109104, 7a47cbb, d215caf, 0168ff5, a1107e2)
❌ The email address for the commit (501eea1) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please submit a support request ticket.

Skylion007 · 2025-03-18T16:38:01Z

aten/src/ATen/native/cudnn/MHA.cpp

  }
  auto workspace_size = mha_graph->get_workspace_size();
  auto workspace_ptr =
      c10::cuda::CUDACachingAllocator::get()->allocate(workspace_size);
  TORCH_CHECK(
      mha_graph->execute(handle, variant_pack, workspace_ptr.get()).is_good());
-  mhagraphcache.update(key, graph_and_tensors_values);
+  mhagraphcache.update(key, mha_graph);


The update method for mhagraphcache should probably use perfect forwarding where it is defined, instead of taking an lvalue reference. The same should be done throughout the file to remove extra copies.

Suggested change

-  mhagraphcache.update(key, mha_graph);
+  mhagraphcache.update(key, std::move(mha_graph));
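
A rough sketch of what a perfectly forwarding `update` could look like. `SketchCache` is a hypothetical stand-in, not the actual `mhagraphcache` type, and it assumes the cached value is something cheap to move, such as a `std::shared_ptr`. The forwarding reference copies lvalue arguments and moves rvalue arguments, so a call site that passes `std::move(...)` avoids the extra copy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the graph cache; not the real mhagraphcache.
template <typename Key, typename Value>
class SketchCache {
 public:
  // Forwarding reference: an lvalue argument is copied into the map,
  // an rvalue argument is moved into it.
  template <typename V>
  void update(const Key& key, V&& value) {
    map_.insert_or_assign(key, std::forward<V>(value));
  }

  // Returns a pointer to the cached value, or nullptr if the key is absent.
  Value* find(const Key& key) {
    auto it = map_.find(key);
    return it == map_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<Key, Value> map_;
};
```

With a `shared_ptr` value, `cache.update(key, graph)` bumps the reference count, while `cache.update(key, std::move(graph))` transfers ownership and leaves `graph` empty.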

eqy added the open source, topic: not user facing (topic category), and module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention) labels Mar 16, 2025

eqy requested a review from syed-ahmed as a code owner March 16, 2025 21:09

eqy changed the title ~~[cuDNN][SDPA] cuDNN SDPA refactor/cleanup~~ [WIP][cuDNN][SDPA] cuDNN SDPA refactor/cleanup Mar 17, 2025

pytorch deleted a comment from pytorch-bot bot Mar 17, 2025

pytorchmergebot force-pushed the cudnnsdparefactor branch from ac66884 to f7c76b8 Compare March 17, 2025 16:57

Skylion007 reviewed Mar 18, 2025

eqy force-pushed the cudnnsdparefactor branch from 3848e20 to bd4432a Compare April 7, 2025 23:31

eqy requested review from albanD and soulitzer as code owners April 15, 2025 00:44

eqy changed the title ~~[WIP][cuDNN][SDPA] cuDNN SDPA refactor/cleanup~~ [cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 Apr 28, 2025

eqy requested review from drisspg and jbschlosser April 28, 2025 21:58

jerryzh168 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 29, 2025

eqy and others added 7 commits April 30, 2025 18:02

check in

6ecf88b

add cudnn_attention_backward native function

ae3f221

update

8e8a43e

check in

0168ff5

lint

501eea1

wip

546fc90

wip

8109104

eqy added 9 commits April 30, 2025 18:03

wip

f9dc21b

check in

a18a415

check in

6f690ba

temp check in

d215caf

wip

a1107e2

add missing RAGGED O offset

551e58e

fix ragged offset tensor and cleanup

abd3912

update test

7a47cbb

lint

0baac8a

eqy force-pushed the cudnnsdparefactor branch from b6a75a5 to 0baac8a Compare April 30, 2025 18:42

try dropout cond

7b6fd2d
