[Issue] Torch tests hanging and hitting 6-hour timeouts on Windows test runners #5565

## Overview

PyTorch unit tests are hanging on multiple Windows test runners and across multiple torch versions until the jobs hit the 6-hour timeout. This does not appear to be a recent regression.

## Symptoms / evidence / details

Workflow run: https://github.com/ROCm/TheRock/actions/runs/26707885453, using ROCm version `7.14.0a20260531`.

### gfx1151: `CS-RORDMZ-DT244` runner

https://github.com/ROCm/TheRock/actions/runs/26738630672/job/78812375372

### gfx110X-all: `azure-windows-11-gfx1101` runners

Package index: https://rocm.nightlies.amd.com/v2-staging/gfx110X-all/

* Appears to affect only torch versions 2.11, 2.12, and nightly.
* Not affected (no hang observed):
  * torch version 2.9, tests segfaulted: https://github.com/ROCm/TheRock/actions/runs/26707885453/job/78783769908#step:13:3270

      ```
      external-builds\pytorch\pytorch\test\test_cuda.py::TestBlockStateAbsorption::test_tensor_dies_after_checkpoint SKIPPED [0.0001s] [  8%]
      external-builds\pytorch\pytorch\test\test_cuda.py::TestMemPool::test_graph_capture_reclaim_2_streams PASSED [0.0039s] [  8%]
      Windows fatal exception: access violation
      
      Thread 0x00001e94 (most recent call first):
        <no Python frame>
      
      Current thread 0x0000182c (most recent call first):
        File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\pytorch\test\test_cuda.py", line 5675 in test_graph_capture_reclaim_4_streams
      ```
        
  * torch version 2.10, tests completed (1 failure): https://github.com/ROCm/TheRock/actions/runs/26707885453/job/78783770046#step:13:40191
      ```
      FAILED [0.0016s] external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda - RuntimeError: input tensor has spatial dimension larger than the kernel capacity
        = 1 failed, 15541 passed, 24534 skipped, 301 deselected, 44 xfailed, 2 subtests passed in 783.13s (0:13:03) =
      ```
        
Observed on torch versions such as `2.10.0+rocm7.14.0a20260531`.
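The `Windows fatal exception: access violation` block in the torch 2.9 log is the output format of Python's `faulthandler` module, which pytest enables by default. The same module can also dump the tracebacks of all threads after a timeout, which might show where the hung tests are blocked instead of leaving only the cancellation message. A minimal standalone sketch; `run_with_watchdog` and `simulated_hang` are hypothetical names for illustration, and the 1-second timeout is arbitrary:

```python
import faulthandler
import tempfile
import time


def run_with_watchdog(func, timeout_s, dump_file):
    """Run func(); if it is still running after timeout_s seconds,
    faulthandler writes the traceback of every thread to dump_file.
    exit=False keeps the process alive after the dump."""
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=dump_file, exit=False)
    try:
        return func()
    finally:
        # Disarm the timer so a fast test produces no dump.
        faulthandler.cancel_dump_traceback_later()


def simulated_hang():
    # Hypothetical stand-in for a test that blocks longer than the timeout.
    time.sleep(2)


with tempfile.TemporaryFile(mode="w+") as f:
    run_with_watchdog(simulated_hang, timeout_s=1, dump_file=f)
    f.seek(0)
    dump = f.read()

print(dump)  # includes a "(most recent call first)" traceback ending in simulated_hang
```

In a pytest run, the built-in faulthandler plugin exposes the same mechanism through the `faulthandler_timeout` ini option (for example `pytest -o faulthandler_timeout=1800 ...`); whether that fits the way TheRock invokes the PyTorch tests is an assumption not checked here.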

Jobs and log snippets:
* https://github.com/ROCm/TheRock/actions/runs/26707885453/job/78783769930
    ```
    Mon, 01 Jun 2026 05:36:55 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_with_probs_cuda PASSED [0.0178s] [  2%]
    Mon, 01 Jun 2026 05:36:55 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_mean_cuda SKIPPED [0.4036s] [  2%]
    Mon, 01 Jun 2026 05:36:56 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_none_cuda SKIPPED [0.4747s] [  2%]
    Mon, 01 Jun 2026 05:36:56 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_sum_cuda SKIPPED [0.3975s] [  2%]
    Mon, 01 Jun 2026 11:29:09 GMT Error: The operation was canceled.
    ```
* https://github.com/ROCm/TheRock/actions/runs/26707885453/job/78783770048
    ```
    Mon, 01 Jun 2026 13:01:38 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_label_smoothing_with_probs_cuda PASSED [0.0180s] [  2%]
    Mon, 01 Jun 2026 13:01:38 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_mean_cuda SKIPPED [0.3561s] [  2%]
    Mon, 01 Jun 2026 13:01:39 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_none_cuda SKIPPED [0.3301s] [  2%]
    Mon, 01 Jun 2026 13:01:39 GMT external-builds\pytorch\pytorch\test\test_nn.py::TestNNDeviceTypeCUDA::test_cross_entropy_large_tensor_reduction_sum_cuda SKIPPED [0.3322s] [  2%]
    Mon, 01 Jun 2026 18:54:33 GMT Error: The operation was canceled.
    ```
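In both canceled jobs, the job produces no further output for almost six hours after `test_cross_entropy_large_tensor_reduction_sum_cuda` is reported, until `Error: The operation was canceled.` appears. A small sketch for locating that silent stretch in a full downloaded log, assuming every line starts with the `Mon, 01 Jun 2026 05:36:55 GMT` timestamp prefix shown in the snippets (`largest_gap` is a hypothetical helper, not part of TheRock):

```python
from datetime import datetime, timedelta

# Leading timestamp format of the GitHub Actions log lines quoted above,
# e.g. "Mon, 01 Jun 2026 05:36:55 GMT <message>" (assumed for every line).
TS_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"
TS_LENGTH = len("Mon, 01 Jun 2026 05:36:55 GMT")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence between
    two consecutive timestamped lines. Lines without a timestamp are skipped."""
    best_gap, best_before, best_after = timedelta(0), None, None
    prev_ts, prev_line = None, None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue
        if prev_ts is not None and ts - prev_ts > best_gap:
            best_gap, best_before, best_after = ts - prev_ts, prev_line, line
        prev_ts, prev_line = ts, line
    return best_gap, best_before, best_after


# Abridged lines from job 78783769930 (test paths shortened).
log = [
    "Mon, 01 Jun 2026 05:36:55 GMT test_cross_entropy_label_smoothing_with_probs_cuda PASSED",
    "Mon, 01 Jun 2026 05:36:56 GMT test_cross_entropy_large_tensor_reduction_none_cuda SKIPPED",
    "Mon, 01 Jun 2026 05:36:56 GMT test_cross_entropy_large_tensor_reduction_sum_cuda SKIPPED",
    "Mon, 01 Jun 2026 11:29:09 GMT Error: The operation was canceled.",
]

gap, before, after = largest_gap(log)
print(gap)  # 5:52:13
```

The same lines from job 78783770048 give a gap of 5:52:54, so both jobs went silent at the same point in `test_nn.py`.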
