
Conversation

malfet
Contributor

@malfet malfet commented Apr 29, 2024

Fix a reference leak in `dtype.to_complex()`/`to_real()` by using `Py_NewRef`.

Also, wrap `THPDtype_to_real`/`THPDtype_to_complex` calls with `HANDLE_TH_ERRORS`.

Add a regression test for the above issues by calling `to_complex` on integral dtypes (which raises an exception) and by checking the reference count across repeated `to_complex`/`to_real` calls to detect whether a leak is happening.

Replace

```cpp
auto dtype = (PyObject*)torch::getTHPDtype(current_dtype);
Py_INCREF(dtype);
return dtype;
```

with a more compact/streamlined equivalent:

```cpp
return Py_NewRef(torch::getTHPDtype(current_dtype));
```

Fixes #124868
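The leak-detection idea behind the regression test can be sketched in plain Python without torch: repeatedly fetch a cached singleton and check that its refcount stays stable across calls. This is a minimal analog; the `get_cached` helper is hypothetical, standing in for `torch.float32.to_complex()` returning an interned `THPDtype` object.

```python
import sys

# Hypothetical stand-in for the interned THPDtype singleton that
# to_complex()/to_real() return.
_CACHED = object()

def get_cached():
    # Correct behavior: return the singleton without leaking an
    # extra strong reference on each call.
    return _CACHED

# If a reference were leaked per call, the refcount would grow on
# every iteration and the set would collect many distinct values.
counts = {sys.getrefcount(get_cached()) for _ in range(10)}
assert len(counts) < 3  # a stable refcount means no leak
```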


pytorch-bot bot commented Apr 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125154

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit 54c9f87 with merge base 1a0b247:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@malfet malfet added the `release notes: python_frontend` (python frontend release notes category) and `topic: bug fixes` (topic category) labels Apr 29, 2024
@malfet malfet requested review from albanD and soulitzer as code owners April 29, 2024 16:12
Collaborator

@albanD albanD left a comment


Approving as this is a good fix but not a full fix:

Note that

```cpp
static void set_type(
    PyTensorType& type_obj,
    Backend backend,
    ScalarType scalarType) {
  // This field is lazily initialized from backend and scalar_type
  type_obj.backend = static_cast<int>(backend);
  type_obj.scalar_type = static_cast<int>(scalarType);
  type_obj.layout = torch::getTHPLayout(layout_from_backend(backend));
  type_obj.dtype = torch::getTHPDtype(scalarType);
  type_obj.is_cuda =
      (backend == at::Backend::CUDA || backend == at::Backend::SparseCUDA);
  type_obj.is_xpu =
      (backend == at::Backend::XPU || backend == at::Backend::SparseXPU);
}
```
from the issue is not updated here.

Also we shouldn't close the issue until the layout which has the same issue is fixed as well.

```python
# Regression test for https://github.com/pytorch/pytorch/issues/124868
# If the reference count is leaked, this would be a set of 10 elements
ref_cnt = {sys.getrefcount(torch.float32.to_complex()) for _ in range(10)}
self.assertLess(len(ref_cnt), 3)
```
Collaborator


Why isn't this equal to 1?

Contributor Author


Because we can run multiple tests in parallel, which can theoretically affect the refcount for the type.
But if the test suite is assumed to run sequentially, then yes, it should be equal to one.
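Conversely, a sketch of what the test would catch: a function that leaks one strong reference per call makes every `sys.getrefcount` reading distinct, so the set grows to one entry per call. The `leaky` helper is hypothetical, simulating a C extension with an unbalanced `Py_INCREF`.

```python
import sys

_OBJ = object()
_leaked = []  # simulates a C extension forgetting to balance Py_INCREF

def leaky():
    _leaked.append(_OBJ)  # one extra strong reference retained per call
    return _OBJ

counts = {sys.getrefcount(leaky()) for _ in range(10)}
assert len(counts) == 10  # the refcount grows on every single call
```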

Collaborator


We run the test suite in parallel in multiple threads?? I don't expect that would work well given how heavily we use global state

Contributor Author


We've been weeding out global state from tests for quite a while, and we have a list of tests that should not run in parallel, see

pytorch/test/run_test.py

Lines 193 to 200 in 3d1dd79

```python
RUN_PARALLEL_BLOCKLIST = [
    "test_cpp_extensions_jit",
    "test_cpp_extensions_open_device_registration",
    "test_cpp_extensions_stream_and_event",
    "test_cpp_extensions_mtia_backend",
    "test_jit_disabled",
    "test_mobile_optimizer",
    "test_multiprocessing",
```
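A minimal sketch of how a runner might consult such a blocklist to decide which tests go to a serial bucket. The `partition` helper and the sample test names are assumptions for illustration, not PyTorch's actual scheduling logic.

```python
# Abbreviated stand-in for the real blocklist in test/run_test.py.
RUN_PARALLEL_BLOCKLIST = [
    "test_jit_disabled",
    "test_multiprocessing",
]

def partition(tests):
    """Split tests into (serial, parallel) buckets using the blocklist."""
    serial = [t for t in tests if t in RUN_PARALLEL_BLOCKLIST]
    parallel = [t for t in tests if t not in RUN_PARALLEL_BLOCKLIST]
    return serial, parallel

serial, parallel = partition(["test_ops", "test_multiprocessing"])
# serial == ["test_multiprocessing"], parallel == ["test_ops"]
```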

@malfet
Contributor Author

malfet commented Apr 29, 2024

Note that

```cpp
static void set_type(
    PyTensorType& type_obj,
    Backend backend,
    ScalarType scalarType) {
  // This field is lazily initialized from backend and scalar_type
  type_obj.backend = static_cast<int>(backend);
  type_obj.scalar_type = static_cast<int>(scalarType);
  type_obj.layout = torch::getTHPLayout(layout_from_backend(backend));
  type_obj.dtype = torch::getTHPDtype(scalarType);
  type_obj.is_cuda =
      (backend == at::Backend::CUDA || backend == at::Backend::SparseCUDA);
  type_obj.is_xpu =
      (backend == at::Backend::XPU || backend == at::Backend::SparseXPU);
}
```

from the issue is not updated here.

See my comment on the issue; I believe this code is fine, as the structure can hold a borrowed pointer, and whenever the Python runtime copies it somewhere it must increase the reference count.
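As a rough Python-level analogy (not CPython's actual C API): a borrowed pointer behaves like a weak reference in that the holder does not keep the object alive by itself, so it is only safe while some owner holds a strong reference.

```python
import weakref

class Thing:
    pass

owner = Thing()                # the "owner" holds a strong reference
borrowed = weakref.ref(owner)  # the "borrower" does not

assert borrowed() is owner     # valid while the owner keeps it alive
del owner                      # last strong reference dropped
assert borrowed() is None      # the borrowed handle now dangles
```

In this PR's case the `THPDtype` objects are interned for the lifetime of the process, so a borrowed pointer stored in the struct never dangles.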

@malfet
Contributor Author

malfet commented Apr 29, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 29, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
@atalman
Contributor

atalman commented May 13, 2024

@pytorchbot cherry-pick --onto release/2.3 -c critical

pytorchbot pushed a commit that referenced this pull request May 13, 2024

(cherry picked from commit 744f341)
@pytorchbot
Collaborator

Cherry picking #125154

The cherry-pick PR is at #126101; it is recommended to link a critical cherry-pick PR to an issue.

Details for Dev Infra team: raised by workflow job.

huydhn added a commit to huydhn/pytorch that referenced this pull request May 14, 2024
huydhn pushed a commit to huydhn/pytorch that referenced this pull request May 14, 2024

(cherry picked from commit 744f341)
huydhn added a commit that referenced this pull request May 14, 2024
huydhn pushed a commit that referenced this pull request May 14, 2024
atalman pushed a commit that referenced this pull request May 14, 2024
* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)


(cherry picked from commit 744f341)

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)"

This reverts commit a1b04d8.

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)


(cherry picked from commit 744f341)

* Revert "Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)"

This reverts commit 5a28bad.

* Refactor autocast C++ APIs to be device-agnostic (#124359)

# Motivation
This PR aims to refactor autocast **C++** APIs to be device-agnostic and deprecate the device-specific autocast  **C++** APIs.
On the C++ side:
- `is_enabled()` -> `is_enabled(device_type)`.
- `set_enabled(new_enabled)` -> `set_enabled(device_type, new_enabled)`.
- `get_autocast_dtype()` -> `get_autocast_dtype(device_type)`
- `set_autocast_dtype(dtype)` -> `set_autocast_dtype(device_type, dtype)`

The following C++ APIs are deprecated and should be removed in PyTorch 2.5:
- `is_cpu_enabled`
- `set_cpu_enabled`
- `get_autocast_cpu_dtype`
- `set_autocast_cpu_dtype`
- `is_xpu_enabled`
- `set_xpu_enabled`
- `get_autocast_xpu_dtype`
- `set_autocast_xpu_dtype`
- `is_ipu_enabled`
- `set_ipu_enabled`
- `get_autocast_ipu_dtype`
- `set_autocast_ipu_dtype`
- `is_hpu_enabled`
- `set_hpu_enabled`
- `get_autocast_hpu_dtype`
- `set_autocast_hpu_dtype`
- `is_xla_enabled`
- `set_xla_enabled`
- `get_autocast_xla_dtype`
- `set_autocast_xla_dtype`
- `is_privateuseone_enabled`
- `set_privateuseone_enabled`
- `get_autocast_privateuseone_dtype`
- `set_autocast_privateuseone_dtype`

On the Python side, provide four generic autocast APIs:
- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type, new_enabled)`
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type, dtype)`
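The consolidation described above can be sketched as replacing per-device functions with one device-keyed pair of getters/setters. This is a simplified illustration backed by plain dictionaries, not torch's real thread-local autocast state.

```python
# Per-device state keyed by device_type string; a stand-in for the
# real per-device autocast state inside torch.
_autocast_enabled = {"cpu": False, "cuda": False}
_autocast_dtype = {"cpu": "bfloat16", "cuda": "float16"}

def is_autocast_enabled(device_type: str) -> bool:
    return _autocast_enabled[device_type]

def set_autocast_enabled(device_type: str, new_enabled: bool) -> None:
    _autocast_enabled[device_type] = new_enabled

def get_autocast_dtype(device_type: str) -> str:
    return _autocast_dtype[device_type]

def set_autocast_dtype(device_type: str, dtype: str) -> None:
    _autocast_dtype[device_type] = dtype

# The deprecated per-device spellings become thin shims:
def is_cpu_enabled() -> bool:  # deprecated, forwards to the generic API
    return is_autocast_enabled("cpu")

set_autocast_enabled("cuda", True)
assert is_autocast_enabled("cuda")
assert get_autocast_dtype("cuda") == "float16"
```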

# Additional Context
We will submit another PR to refactor autocast **Python** APIs based on this PR.

Pull Request resolved: #124359
Approved by: https://github.com/jgong5, https://github.com/albanD

* refactor autocast python APIs (#124479)

Refactor autocast usage in `torch/amp/autocast_mode.py` and `torch/utils/checkpoint.py` to fix a bug: a naming-convention conflict between `torch.xxx.get_autocast_xxx_dtype` defined in `autocast_mode.py` and `torch.xxx.get_autocast_dtype` defined in `checkpoint.py`.

Use device-agnostic APIs like `torch.get_autocast_dtype`, ..., instead.

Pull Request resolved: #124479
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD
ghstack dependencies: #124359

* Fix ref leak in `dtype.to_complex()`/`to_real()` (#125154)


* Revert "refactor autocast python APIs (#124479)"

This reverts commit 495b0c9.

* Revert "Refactor autocast C++ APIs to be device-agnostic (#124359)"

This reverts commit 83106b7.

---------

Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: Yu, Guangye <[email protected]>
@github-actions github-actions bot deleted the malfet/fix-python-ref-leak branch June 13, 2024 01:53

Labels

- ciflow/trunk (trigger trunk jobs on your pull request)
- Merged
- release notes: python_frontend (python frontend release notes category)
- topic: bug fixes (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tensor.dtype.to_complex() crashes kernel after ~100 calls in ipython kernel

6 participants