Conversation

guangyey (Collaborator) commented Apr 18, 2024

Stack from ghstack (oldest at bottom):

Motivation

This PR aims to refactor the autocast C++ APIs to be device-agnostic and to deprecate the device-specific autocast C++ APIs.
On the C++ side:

  • `is_enabled()` -> `is_enabled(device_type)`
  • `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`
  • `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
  • `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`

The following C++ APIs are deprecated and should be removed in PyTorch 2.5:

  • `is_cpu_enabled`
  • `set_cpu_enabled`
  • `get_autocast_cpu_dtype`
  • `set_autocast_cpu_dtype`
  • `is_xpu_enabled`
  • `set_xpu_enabled`
  • `get_autocast_xpu_dtype`
  • `set_autocast_xpu_dtype`
  • `is_ipu_enabled`
  • `set_ipu_enabled`
  • `get_autocast_ipu_dtype`
  • `set_autocast_ipu_dtype`
  • `is_hpu_enabled`
  • `set_hpu_enabled`
  • `get_autocast_hpu_dtype`
  • `set_autocast_hpu_dtype`
  • `is_xla_enabled`
  • `set_xla_enabled`
  • `get_autocast_xla_dtype`
  • `set_autocast_xla_dtype`
  • `is_privateuseone_enabled`
  • `set_privateuseone_enabled`
  • `get_autocast_privateuseone_dtype`
  • `set_autocast_privateuseone_dtype`

On the Python side, this PR provides four generic autocast APIs (a usage sketch follows the list):

  • `torch.is_autocast_enabled(device_type)`
  • `torch.set_autocast_enabled(device_type, new_enabled)`
  • `torch.get_autocast_dtype(device_type)`
  • `torch.set_autocast_dtype(device_type, dtype)`
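
For illustration, a minimal sketch of how the four generic APIs compose, assuming a build that includes this PR (the explicit save/restore mirrors the bookkeeping the `torch.autocast` context manager performs):

```python
import torch

# Save the current autocast state for the CPU device type.
prev_enabled = torch.is_autocast_enabled("cpu")
prev_dtype = torch.get_autocast_dtype("cpu")  # torch.bfloat16 by default on CPU

# Enable CPU autocast with an explicit low-precision dtype.
torch.set_autocast_enabled("cpu", True)
torch.set_autocast_dtype("cpu", torch.bfloat16)
try:
    # Autocast-eligible ops dispatched here should run in bfloat16 on CPU.
    out = torch.randn(8, 8) @ torch.randn(8, 8)
finally:
    # Restore the previous state.
    torch.set_autocast_enabled("cpu", prev_enabled)
    torch.set_autocast_dtype("cpu", prev_dtype)
```

In user code, `torch.autocast("cpu", dtype=torch.bfloat16)` remains the recommended entry point; the raw getters/setters are primarily for framework-level code.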

Additional Context

We will submit another PR to refactor autocast Python APIs based on this PR.

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang

pytorch-bot bot commented Apr 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124359

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 927c0b4 with merge base 7e095be:

FLAKY - The following job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey marked this pull request as draft April 18, 2024 05:15
guangyey changed the title from "refactor autocast APIs to be device-agnostic" to "[WIP] Refactor autocast APIs to be device-agnostic" Apr 18, 2024
guangyey added a commit that referenced this pull request Apr 18, 2024
ghstack-source-id: eeb80ac
Pull Request resolved: #124359
pytorch-bot bot added the release notes: jit label Apr 18, 2024
guangyey (Collaborator, Author) commented Apr 18, 2024

@albanD This PR is a work in progress. Is the direction reasonable, given that it is BC-breaking? If so, I will continue finishing it. Could you give me your input? Thanks.

albanD (Collaborator) commented Apr 18, 2024

I definitely love the direction!
We do want to be a bit careful with public API changes, though:

  • In C++, if a GitHub search doesn't return any hits, then I'm happy with making the BC-breaking change in this PR.
  • In Python, we should deprecate all APIs for at least one version (if mostly unused) or two versions (if we see significant use via GitHub search). The full process would be (see the warning sketch after this list):
    • This PR can add the new API.
    • This PR can add a deprecation warning to the old API.
    • For APIs only needing one version, just after the 2.4 branch cut, we can do a follow-up deleting the deprecated APIs.
    • For APIs needing two versions, wait until after the 2.5 branch cut for the follow-up.
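
For concreteness, a hypothetical sketch of that deprecation-warning step (the wrapper and message below are illustrative placeholders, not the PR's actual code):

```python
import warnings

import torch

def is_autocast_cpu_enabled() -> bool:
    """Deprecated device-specific query, kept around for one or two releases."""
    warnings.warn(
        "is_autocast_cpu_enabled() is deprecated; use "
        "torch.is_autocast_enabled('cpu') instead.",
        FutureWarning,
        stacklevel=2,
    )
    # Delegate to the new generic API so behavior stays identical.
    return torch.is_autocast_enabled("cpu")
```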

guangyey (Collaborator, Author) commented Apr 18, 2024

Thanks for your elaboration; I take your point. In the follow-up Python refactor we will deprecate the old device-specific Python APIs, such as `torch.is_autocast_cpu_enabled()`, `torch.get_autocast_gpu_dtype()`, and `torch.get_autocast_cpu_dtype()`, in favor of the generic forms, such as `torch.is_autocast_enabled("cuda")`.
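
For illustration, the before/after would look like this (the old forms shown here existed prior to this refactor; the exact deprecation set is decided in the follow-up Python PR):

```python
import torch

enabled = torch.is_autocast_cpu_enabled()      # old: CPU-specific
enabled = torch.is_autocast_enabled("cpu")     # new: generic

dtype = torch.get_autocast_gpu_dtype()         # old: CUDA-specific
dtype = torch.get_autocast_dtype("cuda")       # new: generic

dtype = torch.get_autocast_cpu_dtype()         # old: CPU-specific
dtype = torch.get_autocast_dtype("cpu")        # new: generic
```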
guangyey (Collaborator, Author) commented:
Fix the UT warnings-check issue in xla/test/test_dynamo.py::test_all_cpu_tensor.

guangyey added the ciflow/periodic and ciflow/rocm labels and removed the ciflow/rocm label Apr 23, 2024
guangyey (Collaborator, Author) commented:
@pytorchbot merge

pytorchmergebot commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


guangyey added and removed the ciflow/rocm label Apr 23, 2024
pytorchmergebot pushed a commit that referenced this pull request Apr 25, 2024
# Motivation
Refactor the autocast usage in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix a naming-convention conflict between the `torch.xxx.get_autocast_xxx_dtype` pattern used in `autocast_mode.py` and the `torch.xxx.get_autocast_dtype` pattern used in `checkpoint.py`.

# Solution
Use the device-agnostic APIs, such as `torch.get_autocast_dtype`, instead; a sketch of the pattern follows.

Pull Request resolved: #124479
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #124359
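
For illustration, a minimal sketch, assuming the generic APIs from this stack, of how a caller such as `torch/utils/checkpoint.py` can snapshot and restore autocast state without per-device helpers (the helper names below are illustrative, not the actual `checkpoint.py` code):

```python
import torch

def capture_autocast_state(device_type: str) -> dict:
    """Record autocast state for one device type using the generic APIs."""
    return {
        "enabled": torch.is_autocast_enabled(device_type),
        "dtype": torch.get_autocast_dtype(device_type),
    }

def restore_autocast_state(device_type: str, state: dict) -> None:
    """Re-apply a previously captured autocast state."""
    torch.set_autocast_enabled(device_type, state["enabled"])
    torch.set_autocast_dtype(device_type, state["dtype"])

# The same two helpers work for any device type string.
cpu_state = capture_autocast_state("cpu")
restore_autocast_state("cpu", cpu_state)
```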
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
huydhn pushed a commit that referenced this pull request May 14, 2024
huydhn pushed a commit that referenced this pull request May 14, 2024
huydhn added a commit that referenced this pull request May 14, 2024
atalman pushed a commit that referenced this pull request May 14, 2024
* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)

By using `Py_NewRef`

Also, wrap `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS`

Add a regression test for the above issues: call `to_complex` on integral dtypes, which raises an exception, and check that the reference count stays constant across repeated `to_complex`/`to_real` calls to detect whether a leak is happening.

Replace
```cpp
auto dtype = (PyObject*)torch::getTHPDtype(current_dtype);
Py_INCREF(dtype);
return dtype;
```
with a more compact/streamlined equivalent
```cpp
return Py_NewRef(torch::getTHPDtype(current_dtype));
```

Fixes #124868

Pull Request resolved: #125154
Approved by: https://github.com/Skylion007, https://github.com/albanD

(cherry picked from commit 744f341)
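
For context, a sketch of the regression-test idea described in the commit above (an illustration, not the actual test added by #125154):

```python
import sys

import torch

# to_complex() on an integral dtype has no complex counterpart and raises.
try:
    torch.int32.to_complex()
except RuntimeError:
    pass

# dtype objects are singletons, so the refcount of the returned dtype
# should be stable across repeated calls if no reference is leaked.
complex_dtype = torch.float32.to_complex()  # torch.complex64
baseline = sys.getrefcount(complex_dtype)
for _ in range(100):
    torch.float32.to_complex()
assert sys.getrefcount(complex_dtype) == baseline
```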

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)"

This reverts commit a1b04d8.

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)

(cherry picked from commit 744f341)

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)"

This reverts commit 5a28bad.

* Refactor autocast C++ APIs to be device-agnostic (#124359)

Pull Request resolved: #124359
Approved by: https://github.com/jgong5, https://github.com/albanD

* refactor autocast python APIs (#124479)

Pull Request resolved: #124479
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #124359

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)

Pull Request resolved: #125154
Approved by: https://github.com/Skylion007, https://github.com/albanD

* Revert "refactor autocast python APIs (#124479)"

This reverts commit 495b0c9.

* Revert "Refactor autocast C++ APIs to be device-agnostic (#124359)"

This reverts commit 83106b7.

---------

Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: Yu, Guangye <[email protected]>
github-actions bot deleted the gh/guangyey/23/head branch June 2, 2024 02:04

Labels

ciflow/inductor, ciflow/mps, ciflow/periodic, ciflow/rocm, ciflow/trunk, Merged, module: amp (automated mixed precision), module: dynamo, open source, release notes: jit, topic: improvements

Projects

Status: Done


5 participants