Refactor autocast C++ APIs to be device-agnostic #124359
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124359
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 927c0b4 with merge base 7e095be. FLAKY: the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD This PR is a work in progress. Since it is BC-breaking, is this direction reasonable? If so, I will continue and finish it. Could you give me your input? Thanks.
I definitely love the direction!
# Motivation

This PR aims to refactor the autocast APIs to be device-agnostic and remove the device-specific autocast APIs.

On the C++ side:

- `is_enabled()` -> `is_enabled(device_type)`
- `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`
- `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
- `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`

The following device-specific APIs are removed:

- ~~`is_cpu_enabled`~~ ~~`set_cpu_enabled`~~ ~~`get_autocast_cpu_dtype`~~ ~~`set_autocast_cpu_dtype`~~
- ~~`is_xpu_enabled`~~ ~~`set_xpu_enabled`~~ ~~`get_autocast_xpu_dtype`~~ ~~`set_autocast_xpu_dtype`~~
- ~~`is_ipu_enabled`~~ ~~`set_ipu_enabled`~~ ~~`get_autocast_ipu_dtype`~~ ~~`set_autocast_ipu_dtype`~~
- ~~`is_hpu_enabled`~~ ~~`set_hpu_enabled`~~ ~~`get_autocast_hpu_dtype`~~ ~~`set_autocast_hpu_dtype`~~
- ~~`is_xla_enabled`~~ ~~`set_xla_enabled`~~ ~~`get_autocast_xla_dtype`~~ ~~`set_autocast_xla_dtype`~~
- ~~`is_privateuseone_enabled`~~ ~~`set_privateuseone_enabled`~~ ~~`get_autocast_privateuseone_dtype`~~ ~~`set_autocast_privateuseone_dtype`~~

On the Python side, four generic autocast APIs are provided:

- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type, new_enabled)`
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type, dtype)`

cc mcarilli ptrblck leslie-fang-intel jgong5
Thanks for your elaboration. I take your point.
torch.is_autocast_cpu_enabled(),
torch.get_autocast_gpu_dtype(),
torch.get_autocast_cpu_dtype(),
torch.is_autocast_enabled("cuda"),
This fixes the UT warnings-check issue in xla/test/test_dynamo.py::test_all_cpu_tensor.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation

Refactor the autocast usage in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix a bug: a naming-convention conflict between `torch.xxx.get_autocast_xxx_dtype` defined in `autocast_mode.py` and `torch.xxx.get_autocast_dtype` defined in `checkpoint.py`.

# Solution

Use device-agnostic APIs such as `torch.get_autocast_dtype` instead.

Pull Request resolved: #124479
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #124359
* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)

  Use `Py_NewRef`, and wrap the `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS`.

  Add a regression test for the above issues: call `to_complex` on integral dtypes, which raises an exception, and check the reference count across repeated `to_complex`/`to_real` calls to detect whether a leak is happening.

  Replace

  ```cpp
  auto dtype = (PyObject*)torch::getTHPDtype(current_dtype);
  Py_INCREF(dtype);
  return dtype;
  ```

  with the more compact/streamlined equivalent

  ```cpp
  return Py_NewRef(torch::getTHPDtype(current_dtype));
  ```

  Fixes #124868. Pull Request resolved: #125154. Approved by: https://github.com/Skylion007, https://github.com/albanD (cherry picked from commit 744f341)

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)" (reverts commit a1b04d8)

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154) (same change as above, cherry picked from commit 744f341)

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)" (reverts commit 5a28bad)

* Refactor autocast C++ APIs to be device-agnostic (#124359) (the change described in this PR; Pull Request resolved: #124359, approved by https://github.com/jgong5 and https://github.com/albanD)

* refactor autocast python APIs (#124479) (the follow-up described above; Pull Request resolved: #124479, approved by https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, and https://github.com/albanD; ghstack dependencies: #124359)

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154) (same change as above)

* Revert "refactor autocast python APIs (#124479)" (reverts commit 495b0c9)

* Revert "Refactor autocast C++ APIs to be device-agnostic (#124359)" (reverts commit 83106b7)

Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: Yu, Guangye <[email protected]>
Stack from ghstack (oldest at bottom):
Motivation
This PR aims to refactor autocast C++ APIs to be device-agnostic and deprecate the device-specific autocast C++ APIs.
On the C++ side, the signatures change as follows (see the sketch below):

- `is_enabled()` -> `is_enabled(device_type)`
- `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`
- `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
- `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`
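To make the new shape concrete, here is a minimal sketch. It assumes the functions live in the `at::autocast` namespace (as the device-specific variants do), take an `at::DeviceType`, and use `at::ScalarType` for dtypes; the header path and the helper function name are illustrative, not confirmed by this PR text:

```cpp
#include <ATen/autocast_mode.h>

// Hypothetical helper: configure and query autocast state for CUDA
// through the generic, device-agnostic entry points.
void configure_cuda_autocast() {
  // Enable autocast for CUDA and select fp16 as its autocast dtype.
  at::autocast::set_enabled(at::DeviceType::CUDA, /*new_enabled=*/true);
  at::autocast::set_autocast_dtype(at::DeviceType::CUDA, at::kHalf);

  // Read the state back through the same entry points.
  bool enabled = at::autocast::is_enabled(at::DeviceType::CUDA);
  at::ScalarType dtype = at::autocast::get_autocast_dtype(at::DeviceType::CUDA);
  (void)enabled;  // silence unused-variable warnings in this sketch
  (void)dtype;
}
```

One design consequence: adding a new backend no longer requires four more bespoke functions, since the same entry points are keyed on `device_type`.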
The following C++ APIs are deprecated and are slated for removal in PyTorch 2.5 (a migration sketch follows the list):

- `is_cpu_enabled`, `set_cpu_enabled`, `get_autocast_cpu_dtype`, `set_autocast_cpu_dtype`
- `is_xpu_enabled`, `set_xpu_enabled`, `get_autocast_xpu_dtype`, `set_autocast_xpu_dtype`
- `is_ipu_enabled`, `set_ipu_enabled`, `get_autocast_ipu_dtype`, `set_autocast_ipu_dtype`
- `is_hpu_enabled`, `set_hpu_enabled`, `get_autocast_hpu_dtype`, `set_autocast_hpu_dtype`
- `is_xla_enabled`, `set_xla_enabled`, `get_autocast_xla_dtype`, `set_autocast_xla_dtype`
- `is_privateuseone_enabled`, `set_privateuseone_enabled`, `get_autocast_privateuseone_dtype`, `set_autocast_privateuseone_dtype`
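A migration sketch for one deprecated family, under the same assumptions as the block above (the `migrate_cpu_call_site` helper is hypothetical; the deprecated names are taken from the list):

```cpp
#include <ATen/autocast_mode.h>

// Hypothetical helper: migrate one CPU-specific call site.
void migrate_cpu_call_site() {
  // Deprecated form, slated for removal in PyTorch 2.5:
  //   at::autocast::set_cpu_enabled(true);
  //   at::ScalarType dt = at::autocast::get_autocast_cpu_dtype();

  // Device-agnostic replacement:
  at::autocast::set_enabled(at::DeviceType::CPU, /*new_enabled=*/true);
  at::ScalarType dt = at::autocast::get_autocast_dtype(at::DeviceType::CPU);
  (void)dt;  // silence unused-variable warning in this sketch
}
```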
On the Python side, four generic autocast APIs are provided:

- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type, new_enabled)`
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type, dtype)`
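For example (and as the review thread above shows with `torch.is_autocast_enabled("cuda")`), code such as `torch/utils/checkpoint.py` can now query `torch.get_autocast_dtype("cpu")` and `torch.get_autocast_dtype("cuda")` uniformly instead of pairing the per-device `get_autocast_cpu_dtype`/`get_autocast_gpu_dtype` helpers.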
Additional Context
We will submit another PR to refactor autocast Python APIs based on this PR.
cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang