[fx graph cache] FxGraphPickler: Remove hack to stabilize device string hashes #138681
Conversation
Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" pickle differently depending on whether they are the same object or not.

Test Plan: The new test fails with fast mode commented out, but succeeds when it is enabled: `python test/inductor/test_codecache.py -k test_stable_strings`

[ghstack-poisoned]
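For context, here is a minimal, stdlib-only sketch of the behavior described in the summary. It assumes Inductor's "fast pickling mode" maps onto the standard library's `pickle.Pickler.fast` flag, which disables memoization:

```python
import io
import pickle

a = "cuda"
shared = (a, a)                        # the same string object twice
distinct = (a, "".join(["cu", "da"]))  # equal value, different object

# Default pickling memoizes by object identity: the second reference to an
# already-seen object is emitted as a memo lookup, so the two payloads (and
# any hash computed over them) differ even though the values are equal.
print(pickle.dumps(shared) == pickle.dumps(distinct))  # False

def dumps_fast(obj) -> bytes:
    """Serialize with memoization disabled via Pickler.fast."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL)
    pickler.fast = True  # no memo; not safe for self-referential objects
    pickler.dump(obj)
    return buf.getvalue()

# With the memo disabled, equal values always serialize to identical bytes.
print(dumps_fast(shared) == dumps_fast(distinct))  # True
```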
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/138681. Note: links to docs will display an error until the docs builds have completed.

✅ No failures as of commit fefa6f8 with merge base 8aedc64. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Where was the "fast pickling mode" added? I'm not familiar with it.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…device string hashes" Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" will pickle differently if they are the same object vs. not. Test Plan: The new test fails with fast mode commented out, but succeeds when enabled: `python test/inductor/test_codecache.py -k test_stable_strings` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9` -c nosignal

❌ 🤖 pytorchbot command failed (note the unterminated `-m` string in the command above; the corrected command follows).
@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9" -c nosignal

@pytorchbot successfully started a revert job. Check the current status here.
Revert "[fx graph cache] FxGraphPickler: Remove hack to stabilize device string hashes (#138681)" This reverts commit 6cadf61. Reverted #138681 on behalf of https://github.com/jeanschmidt due to "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9" ([comment](#138681 (comment)))
@masnesral your PR has been successfully reverted.

@jeanschmidt where can I find the failures?
I think the failure is not related to this PR, as it still shows up in trunk after the revert: https://hud.pytorch.org/pytorch/pytorch/commit/36b7135c6ff7ed1be4203968888ba4dd7ddbb3c6. The test was disabled by @kwen2501 a few weeks ago in #137771, but the bot closed that issue earlier today after finding no failures there anymore. Either the bot is getting wrong info here, or maybe one of the recent distributed commits fixed it but was then reverted. Anyway, this can be relanded, as I have reopened #137771.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary: In an upcoming change, we need to modify FxGraphCachePickler to behave differently depending on whether the graph has frozen parameters. To do that, it will be convenient to change FxGraphCachePickler into a regular object instead of a collection of classmethods.

Test Plan: unit tests

Pull Request resolved: #138682
Approved by: https://github.com/eellison
ghstack dependencies: #138681
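A hypothetical sketch of the refactor this describes, with illustrative names (the `has_frozen_params` flag and the `dumps` helper are assumptions, not the actual PyTorch code): the pickler becomes an ordinary object whose per-graph configuration is supplied at construction time:

```python
import io
import pickle

class FxGraphCachePickler:
    """Instance-based pickler: per-graph configuration lives on self."""

    def __init__(self, has_frozen_params: bool = False) -> None:
        # Instance state replaces what a bag of classmethods could not
        # carry: each graph being hashed gets its own configured pickler.
        self.has_frozen_params = has_frozen_params

    def dumps(self, obj) -> bytes:
        with io.BytesIO() as stream:
            pickler = pickle.Pickler(stream, pickle.HIGHEST_PROTOCOL)
            pickler.fast = True  # stable bytes for equal values
            pickler.dump(obj)
            return stream.getvalue()

# Usage: the setting is chosen per graph, at construction time.
hasher = FxGraphCachePickler(has_frozen_params=True)
key_bytes = hasher.dumps(("some", "cache", "key", "inputs"))
```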
Summary: Move all the custom `_reduce_*` functions inside the FxGraphCachePickler class. This is mostly a cosmetic change since they're conceptually members of FxGraphCachePickler. But also, in an upcoming diff, I'll add a member variable to the class to control how we handle constant tensors, so it will be convenient to be able to query that setting via `self`. I made the analogous changes to AOTAutogradCachePickler for consistency.

Test Plan: unit tests

Pull Request resolved: #138683
Approved by: https://github.com/eellison
ghstack dependencies: #138681, #138682
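A hypothetical sketch of why reducers-as-methods help: registering bound methods in the pickler's `dispatch_table` lets each `_reduce_*` function consult instance state. `FakeConstant` and `include_constant_values` are stand-ins, not the real PyTorch types:

```python
import copyreg
import io
import pickle

class FakeConstant:
    """Stand-in for a constant tensor attached to the graph."""
    def __init__(self, data):
        self.data = data

class FxGraphCachePickler(pickle.Pickler):
    def __init__(self, stream, include_constant_values: bool = False) -> None:
        super().__init__(stream, pickle.HIGHEST_PROTOCOL)
        self.include_constant_values = include_constant_values
        # dispatch_table routes the chosen types to bound methods, so
        # each _reduce_* implementation can read settings from self.
        self.dispatch_table = copyreg.dispatch_table.copy()
        self.dispatch_table[FakeConstant] = self._reduce_constant

    def _reduce_constant(self, obj):
        # The instance flag decides whether the raw bytes or only a
        # cheap summary (here, the length) enters the serialized form.
        payload = obj.data if self.include_constant_values else len(obj.data)
        return (FakeConstant, (payload,))

# Usage: the setting is supplied per pickler instance.
buf = io.BytesIO()
FxGraphCachePickler(buf, include_constant_values=False).dump(
    FakeConstant(b"\x00" * 1024)
)
```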
Stack from ghstack (oldest at bottom):
Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" pickle differently depending on whether they are the same object or not.
Test Plan:
The new test fails with fast mode commented out, but succeeds when enabled:
`python test/inductor/test_codecache.py -k test_stable_strings`
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov