[fx graph cache] FxGraphPickler: Remove hack to stabilize device string hashes #138681
Conversation
Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" pickle differently depending on whether they are the same object or not.

Test Plan: The new test fails with fast mode commented out, but succeeds when it is enabled: `python test/inductor/test_codecache.py -k test_stable_strings`

[ghstack-poisoned]
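For context, here is a minimal, stdlib-only sketch of the behavior described in the summary. It assumes Inductor's "fast pickling mode" maps onto the standard library's `pickle.Pickler.fast` flag, which disables memoization:

```python
import io
import pickle

a = "cuda"
shared = (a, a)                        # the same string object twice
distinct = (a, "".join(["cu", "da"]))  # equal value, different object

# Default pickling memoizes by object identity: the second reference to an
# already-seen object is emitted as a memo lookup, so the two payloads (and
# any hash computed over them) differ even though the values are equal.
print(pickle.dumps(shared) == pickle.dumps(distinct))  # False

def dumps_fast(obj) -> bytes:
    """Serialize with memoization disabled via Pickler.fast."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL)
    pickler.fast = True  # no memo; not safe for self-referential objects
    pickler.dump(obj)
    return buf.getvalue()

# With the memo disabled, equal values always serialize to identical bytes.
print(dumps_fast(shared) == dumps_fast(distinct))  # True
```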
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/138681. Note: links to docs will display an error until the docs builds have completed.

✅ No failures as of commit fefa6f8 with merge base 8aedc64. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Where was the "fast pickling mode" added? I'm not familiar with it.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…device string hashes" Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" will pickle differently if they are the same object vs. not. Test Plan: The new test fails with fast mode commented out, but succeeds when enabled: `python test/inductor/test_codecache.py -k test_stable_strings` cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov [ghstack-poisoned]
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9` -c nosignal

❌ 🤖 pytorchbot command failed (note the unterminated `-m` string in the command above; the corrected command follows).
@pytorchbot revert -m "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9" -c nosignal

@pytorchbot successfully started a revert job. Check the current status here.
Revert "[fx graph cache] FxGraphPickler: Remove hack to stabilize device string hashes (#138681)" This reverts commit 6cadf61. Reverted #138681 on behalf of https://github.com/jeanschmidt due to "Introduced regressions on linux-focal-cuda11.8-py3.10-gcc9" ([comment](#138681 (comment)))
@masnesral your PR has been successfully reverted.

@jeanschmidt where can I find the failures?
I think the failure is not related to this PR, as it still shows up in trunk after the revert: https://hud.pytorch.org/pytorch/pytorch/commit/36b7135c6ff7ed1be4203968888ba4dd7ddbb3c6. The test was disabled by @kwen2501 a few weeks ago in #137771, but the bot closed that issue earlier today after finding no failures there anymore. Either the bot is getting wrong info here, or maybe one of the recent distributed commits fixed it but was then reverted. Anyway, this can be relanded, as I have reopened #137771.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary: In an upcoming change, we need to modify FxGraphCachePickler to behave differently depending on whether the graph has frozen parameters. To do that, it will be convenient to change FxGraphCachePickler into a regular object instead of a collection of classmethods.

Test Plan: unit tests

Pull Request resolved: #138682
Approved by: https://github.com/eellison
ghstack dependencies: #138681
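A hypothetical sketch of the refactor this describes, with illustrative names (the `has_frozen_params` flag and the `dumps` helper are assumptions, not the actual PyTorch code): the pickler becomes an ordinary object whose per-graph configuration is supplied at construction time:

```python
import io
import pickle

class FxGraphCachePickler:
    """Instance-based pickler: per-graph configuration lives on self."""

    def __init__(self, has_frozen_params: bool = False) -> None:
        # Instance state replaces what a bag of classmethods could not
        # carry: each graph being hashed gets its own configured pickler.
        self.has_frozen_params = has_frozen_params

    def dumps(self, obj) -> bytes:
        with io.BytesIO() as stream:
            pickler = pickle.Pickler(stream, pickle.HIGHEST_PROTOCOL)
            pickler.fast = True  # stable bytes for equal values
            pickler.dump(obj)
            return stream.getvalue()

# Usage: the setting is chosen per graph, at construction time.
hasher = FxGraphCachePickler(has_frozen_params=True)
key_bytes = hasher.dumps(("some", "cache", "key", "inputs"))
```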
Summary: Move all the custom `_reduce_*` functions inside the FxGraphCachePickler class. This is mostly a cosmetic change since they're conceptually members of FxGraphCachePickler. But also, in an upcoming diff, I'll add a member variable to the class to control how we handle constant tensors, so it will be convenient to be able to query that setting via `self`. I made the analogous changes to AOTAutogradCachePickler for consistency.

Test Plan: unit tests

Pull Request resolved: #138683
Approved by: https://github.com/eellison
ghstack dependencies: #138681, #138682
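A hypothetical sketch of why reducers-as-methods help: registering bound methods in the pickler's `dispatch_table` lets each `_reduce_*` function consult instance state. `FakeConstant` and `include_constant_values` are stand-ins, not the real PyTorch types:

```python
import copyreg
import io
import pickle

class FakeConstant:
    """Stand-in for a constant tensor attached to the graph."""
    def __init__(self, data):
        self.data = data

class FxGraphCachePickler(pickle.Pickler):
    def __init__(self, stream, include_constant_values: bool = False) -> None:
        super().__init__(stream, pickle.HIGHEST_PROTOCOL)
        self.include_constant_values = include_constant_values
        # dispatch_table routes the chosen types to bound methods, so
        # each _reduce_* implementation can read settings from self.
        self.dispatch_table = copyreg.dispatch_table.copy()
        self.dispatch_table[FakeConstant] = self._reduce_constant

    def _reduce_constant(self, obj):
        # The instance flag decides whether the raw bytes or only a
        # cheap summary (here, the length) enters the serialized form.
        payload = obj.data if self.include_constant_values else len(obj.data)
        return (FakeConstant, (payload,))

# Usage: the setting is supplied per pickler instance.
buf = io.BytesIO()
FxGraphCachePickler(buf, include_constant_values=False).dump(
    FakeConstant(b"\x00" * 1024)
)
```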
Stack from ghstack (oldest at bottom):
Summary: With the fast pickling mode, we don't need the custom hack for replacing device strings in tensors. This was previously needed because, e.g., two strings "cuda" pickle differently depending on whether they are the same object or not.
Test Plan:
The new test fails with fast mode commented out, but succeeds when enabled:
`python test/inductor/test_codecache.py -k test_stable_strings`
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov