Conversation

masnesral
Contributor

@masnesral masnesral commented Oct 23, 2024

Stack from ghstack (oldest at bottom):

Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" will pickle differently if they are the same object vs. not.

Test Plan:
The new test fails with fast mode commented out, but succeeds when enabled:
`python test/inductor/test_codecache.py -k test_stable_strings`

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
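
For reference, a minimal sketch of the string-identity issue described in the summary, using CPython's legacy `Pickler.fast` flag to disable memoization. This is illustrative only, not the actual FxGraphPickler code:

```python
import io
import pickle

def dumps(obj, fast=False):
    buf = io.BytesIO()
    p = pickle.Pickler(buf, protocol=pickle.HIGHEST_PROTOCOL)
    p.fast = fast  # when True, the pickler skips its memo, so equal strings
                   # serialize the same way regardless of object identity
    p.dump(obj)
    return buf.getvalue()

s = "cuda"
same_object = (s, s)                             # two references to one string object
equal_objects = ("cuda", "".join(["cu", "da"]))  # equal strings, distinct objects

# Default pickling memoizes the first "cuda" and emits a memo reference for the
# second, so the two tuples produce different bytes (and different cache hashes).
print(dumps(same_object) == dumps(equal_objects))                        # False
# With fast mode, memoization is off and the serialized bytes depend only on values.
print(dumps(same_object, fast=True) == dumps(equal_objects, fast=True))  # True
```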


pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138681

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fefa6f8 with merge base 8aedc64:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@masnesral masnesral added the topic: not user facing label Oct 23, 2024
@masnesral masnesral changed the title from "[fx graph cache] FxGraphPickler: Remove hack to stabilize device strings" to "[fx graph cache] FxGraphPickler: Remove hack to stabilize device string hashes" Oct 23, 2024
@masnesral masnesral marked this pull request as ready for review October 23, 2024 14:21
@masnesral masnesral requested a review from bdhirsh as a code owner October 23, 2024 14:21
@masnesral masnesral requested review from bdhirsh, eellison, jamesjwu and oulgen and removed request for bdhirsh October 23, 2024 14:21
Contributor

@eellison eellison left a comment


Where was the "fast pickling mode" added? I'm not familiar with it.

@oulgen
Contributor

oulgen commented Oct 23, 2024

@eellison 5ed72ff

We use it to make sure strings are interned the same way.
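
A tiny illustration of that equality-vs-identity distinction (standard CPython behavior; the `join` result is typically not interned):

```python
a = "cuda"                   # string literal, interned by CPython at compile time
b = "".join(["cu", "da"])    # equal value, but typically a distinct object at runtime
print(a == b)   # True  -> equal strings
print(a is b)   # False -> different objects, which default pickling can tell apart
```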

@masnesral
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 24, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

…device string hashes"


[ghstack-poisoned]
@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@jeanschmidt
Contributor

@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9` -c nosignal


pytorch-bot bot commented Oct 25, 2024

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.

@jeanschmidt
Contributor

@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 25, 2024
…ice string hashes (#138681)"

This reverts commit 6cadf61.

Reverted #138681 on behalf of https://github.com/jeanschmidt due to Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9 ([comment](#138681 (comment)))
@pytorchmergebot
Collaborator

@masnesral your PR has been successfully reverted.

@masnesral
Contributor Author

@jeanschmidt where can I find the failures?

@huydhn
Contributor

huydhn commented Oct 26, 2024

I think the failure is not related to this PR, as it still shows up in trunk after the revert: https://hud.pytorch.org/pytorch/pytorch/commit/36b7135c6ff7ed1be4203968888ba4dd7ddbb3c6. The test was disabled by @kwen2501 a few weeks ago in #137771, but the bot closed that issue earlier today after finding no failure there anymore. Either the bot is getting wrong info here, or one of the recent distributed commits fixed it but was then reverted. Anyway, this can be relanded as I have reopened #137771.

@huydhn
Contributor

huydhn commented Oct 26, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Oct 31, 2024
Summary: In an upcoming change, we need to modify FxGraphCachePickler to behave differently depending on whether the graph has frozen parameters. To do that, it will be convenient to change FxGraphCachePickler into a regular object instead of a collection of classmethods.

Test Plan: unit tests

Pull Request resolved: #138682
Approved by: https://github.com/eellison
ghstack dependencies: #138681
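A rough, hedged sketch of the kind of refactor that commit describes: per-graph state lives on an instance instead of being threaded through classmethods. The class shape and the `has_frozen_params` flag below are illustrative, not the actual torch/_inductor code.

```python
import io
import pickle

class GraphCachePickler:
    """Illustrative sketch: an instance carries per-graph configuration, whereas a
    collection of classmethods would need the flag passed into every call."""

    def __init__(self, has_frozen_params: bool = False) -> None:
        self.has_frozen_params = has_frozen_params  # hypothetical per-graph setting

    def dumps(self, obj) -> bytes:
        buf = io.BytesIO()
        p = pickle.Pickler(buf, protocol=pickle.HIGHEST_PROTOCOL)
        p.fast = True  # stable string handling, as in the PR above
        # Fold the per-instance setting into the serialized payload so graphs with
        # and without frozen parameters produce different cache keys.
        p.dump((self.has_frozen_params, obj))
        return buf.getvalue()

# Construct once per graph; later calls no longer need the flag re-passed.
pickler = GraphCachePickler(has_frozen_params=True)
blob = pickler.dumps({"example": "graph metadata"})
```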
pytorchmergebot pushed a commit that referenced this pull request Oct 31, 2024
Summary: Move all the custom `_reduce_*` functions inside the FxGraphCachePickler class. This is mostly a cosmetic change since they're conceptually members of FxGraphCachePickler, but in an upcoming diff I'll also add a member variable to the class to control how we handle constant tensors, so it will be convenient to be able to query that setting via `self`. I made the analogous changes to AOTAutogradCachePickler for consistency.

Test Plan: unit tests

Pull Request resolved: #138683
Approved by: https://github.com/eellison
ghstack dependencies: #138681, #138682
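Similarly, a hedged sketch of moving a `_reduce_*` function into the pickler class, using the standard `dispatch_table` hook so the bound method can consult `self`; the `include_constant_values` setting and the tensor metadata chosen here are illustrative, not the real implementation.

```python
import io
import pickle

import torch

class CacheKeyPickler(pickle.Pickler):
    def __init__(self, stream, include_constant_values: bool = False) -> None:
        super().__init__(stream, protocol=pickle.HIGHEST_PROTOCOL)
        self.fast = True
        self._include_constant_values = include_constant_values  # hypothetical setting
        # dispatch_table overrides pickling per exact type; bound methods see `self`.
        self.dispatch_table = {torch.Tensor: self._reduce_tensor}

    def _reduce_tensor(self, t: torch.Tensor):
        # Reduce a tensor to hash-relevant metadata; optionally include the values
        # (e.g., for constant tensors), controlled by per-instance state.
        metadata = (str(t.dtype), tuple(t.shape), str(t.device))
        if self._include_constant_values:
            values = tuple(t.detach().cpu().flatten().tolist())
            return (tuple, ((metadata, values),))
        return (tuple, ((metadata,),))

buf = io.BytesIO()
CacheKeyPickler(buf, include_constant_values=True).dump(torch.ones(2, 2))
print(len(buf.getvalue()))  # the pickled bytes can feed a cache-key hash
```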
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
Summary: In an upcoming change, we need to modify FxGraphCachePickler to behave differently depending on whether the graph has frozen parameters. To do that, it will be convenient to change FxGraphCachePickler into a regular object instead of a collection of classmethods.

Test Plan: unit tests

Pull Request resolved: pytorch#138682
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#138681
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
Summary: Move all the custom `_reduce_*` functions inside the FxGraphCachePickler class. This is mostly a cosmetic change since they're conceptually members of FxGraphCachePickler, but in an upcoming diff I'll also add a member variable to the class to control how we handle constant tensors, so it will be convenient to be able to query that setting via `self`. I made the analogous changes to AOTAutogradCachePickler for consistency.

Test Plan: unit tests

Pull Request resolved: pytorch#138683
Approved by: https://github.com/eellison
ghstack dependencies: pytorch#138681, pytorch#138682
@github-actions github-actions bot deleted the gh/masnesral/130/head branch November 28, 2024 02:12