
cpp_wrapper: persist autotune example tensors until last use #146706


Closed

Conversation

benjaminglass1
Collaborator

@benjaminglass1 benjaminglass1 commented Feb 7, 2025

Stack from ghstack (oldest at bottom):

Patches over an issue where randomly generated example tensors can cause kernel autotuning to fail when those tensors would not be possible outputs from previous kernels in the sequence. This fixes a failure in `test_torchinductor_opinfo.py` when run with compile-time autotuning (`test_comprehensive_nanquantile_cuda_float64`).

For clarity, the situation triggering this PR looks like kernels `A -> BCDE -> F` (`BCDE` is fused), where one of the outputs from `A` is a boolean tensor describing some of the input data. Previously, we randomly regenerated that boolean tensor and the input data before passing them to `BCDE`, so that they no longer matched. This caused a `tl.device_assert` call in `BCDE` to fail. With this PR, we reuse the random data input to `A` and the boolean tensor it outputs, so that they match and pass the device assertion in `BCDE`.
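To illustrate the failure mode, here is a hypothetical, heavily simplified Triton sketch (not the actual Inductor-generated code; the kernel name, tensor names, and the NaN-mask invariant are made up for illustration) of a fused kernel guarded by a data-dependent `tl.device_assert`:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def fused_bcde_kernel(data_ptr, mask_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    in_bounds = offs < n
    x = tl.load(data_ptr + offs, mask=in_bounds, other=0.0)
    is_nan = tl.load(mask_ptr + offs, mask=in_bounds, other=0)
    # The upstream kernel ("A") is assumed to have produced `mask_ptr` as
    # torch.isnan(data); this fused kernel relies on that invariant.
    tl.device_assert((is_nan != 0) == (x != x), "mask inconsistent with data")
    tl.store(out_ptr + offs, tl.where(is_nan != 0, 0.0, x), mask=in_bounds)


n = 1024
x = torch.randn(n, device="cuda")
x[::7] = float("nan")
out = torch.empty_like(x)

# Consistent inputs, as the upstream kernel would actually produce them:
good_mask = torch.isnan(x).to(torch.uint8)
fused_bcde_kernel[(n // 256,)](x, good_mask, out, n, BLOCK=256)

# Independently regenerated random example inputs, as before this PR:
bad_mask = (torch.rand(n, device="cuda") > 0.5).to(torch.uint8)
# May trip the assertion (when device-side assertions are enabled):
# fused_bcde_kernel[(n // 256,)](x, bad_mask, out, n, BLOCK=256)
```

When the mask is regenerated at random instead of being taken from the upstream kernel's output, it no longer describes the data, and the assertion can fire during benchmarking even though the real model would never produce such inputs.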

Fixes #147799.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @desertfire @chauhang @aakhundov


pytorch-bot bot commented Feb 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146706

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 3 Unrelated Failures

As of commit 266b699 with merge base a89bdc0:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@benjaminglass1
Collaborator Author

Notes to reviewers:

  1. It is currently unclear to me whether this raises the average memory usage of autotuning, and what effects that may have on the output kernels.
  2. This effectively makes autotuning order-dependent, so tuning could not be done in parallel in the general case. We don't currently tune in parallel, but before this PR we theoretically could have.
  3. This doesn't handle situations where the inputs to the first kernel are a priori invalid, or where fallback kernels running between the compiled kernels are responsible for making the inputs to `BCDE` valid.

This solution feels a bit hacky to me, but it solves the test breakage prompting this PR and seems strictly more correct than what we had before, although it doesn't solve every case.
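For readers skimming the diff, here is a minimal sketch of the persistence idea, assuming a simple last-use bookkeeping scheme. The class and method names below are hypothetical illustrations, not Inductor's actual API:

```python
from typing import Callable, Dict

import torch


class ExampleTensorCache:
    """Keep autotuning example tensors alive until their last consumer runs.

    Kernels are benchmarked in program order, so an output of kernel A can be
    reused as an input to a later fused kernel instead of being regenerated
    randomly (which could violate data-dependent assertions).
    """

    def __init__(self, last_use: Dict[str, int]):
        # Buffer name -> index of the last kernel that reads it.
        self.last_use = last_use
        self.tensors: Dict[str, torch.Tensor] = {}

    def get(self, name: str, kernel_idx: int,
            make_random: Callable[[], torch.Tensor]) -> torch.Tensor:
        # Prefer a tensor produced by an earlier kernel; otherwise fall back
        # to a freshly generated random example.
        t = self.tensors.get(name)
        if t is None:
            t = make_random()
        if kernel_idx >= self.last_use.get(name, -1):
            self.tensors.pop(name, None)  # free memory after the last use
        return t

    def put(self, name: str, tensor: torch.Tensor, kernel_idx: int) -> None:
        # Persist a kernel's output only if a later kernel still needs it.
        if kernel_idx < self.last_use.get(name, -1):
            self.tensors[name] = tensor
```

Evicting each tensor after its last consumer bounds the extra memory held during tuning by roughly what the compiled model itself would keep live, at the cost of the order dependence noted in point 2.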

@benjaminglass1 benjaminglass1 requested review from desertfire and eellison and removed request for amjames February 7, 2025 16:44
@benjaminglass1 benjaminglass1 marked this pull request as ready for review February 7, 2025 16:45
@benjaminglass1 benjaminglass1 marked this pull request as ready for review March 15, 2025 18:30
@benjaminglass1
Collaborator Author

@eellison @desertfire I finally had time to revisit this, and used the delayed line resolution idea.

Regarding the concerns about memory usage when keeping tensors: I've spent some time thinking about it and concluded that we should be fine. Since the tensors we're persisting are the same size as the tensors that would be kept alive when running the final model, autotuning shouldn't fail for lack of memory if the model itself is capable of running. I think this is ready to merge, pending approval.

@benjaminglass1 benjaminglass1 added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 15, 2025
NaderAlAwar pushed a commit to NaderAlAwar/PyTorch that referenced this pull request Mar 21, 2025
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #149350

1 similar comment
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #149350

pytorchmergebot pushed a commit that referenced this pull request Mar 25, 2025
Pull Request resolved: #147225
Approved by: https://github.com/desertfire
ghstack dependencies: #146706
pytorchmergebot pushed a commit that referenced this pull request Mar 25, 2025
… RAIIPyObject interface (#149350)

Add includes for torch.device, torch.dtype, torch.layout, and torch.memory_format to the cpp_wrapper common header, so that they get precompiled. Additionally, add move constructors and operator bool to RAIIPyObject.

Closes #142005.

Pull Request resolved: #149350
Approved by: https://github.com/desertfire
ghstack dependencies: #146706, #147225
@desertfire
Contributor

xref #150522

@benjaminglass1, I remember you said this one fixed a unit test. Is that test included in your next PR in this stack?

@benjaminglass1
Collaborator Author

@desertfire No, but I will add that. More generally, we need to find a way to finish enabling all the cpp_wrapper tests anyway. We're close to being able to, and possibly all it would take now is a longer timeout.

amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025

Pull Request resolved: pytorch#146706
Approved by: https://github.com/desertfire
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
@github-actions github-actions bot deleted the gh/benjaminglass1/67/head branch May 12, 2025 02:19