Context
- Workflow: Release portable Linux PyTorch Wheels
- Workflow file: .github/workflows/release_portable_linux_pytorch_wheels.yml
- Failing run: ↗ View run
- Platform: Linux
- Impacted Arch: gfx950-dcgpu
- PyTorch Version: nightly (2.13.0a0+rocm7.14.0a20260602)
- Python Version: 3.13, 3.11, 3.10, 3.14
Failed Tests (8)
test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda
test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_dropout_per_call_randomness_dropout_p_0_5_training_True_cuda
test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cuda_cuda
test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda
test_cuda.py::TestCuda::test_hip_device_count
test_cuda.py::TestCudaAllocator::test_memory_compile_regions
test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive
test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator
Root Cause
The skip-list loader in run_pytorch_tests.py detects torch version 2.13 and tries to load external-builds/pytorch/skip_tests/pytorch_2.13.py. That file does not exist, so only generic.py is loaded. All the tests above are already marked as known failures in the stable-version skip files (pytorch_2.9.py – pytorch_2.12.py), but those exclusions do not apply to nightly.
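The selection behavior described above can be illustrated with a minimal sketch. This is not the actual code in run_pytorch_tests.py: the function name `skip_files_for` and the version-parsing regex are assumptions for illustration. Only the file names (generic.py, pytorch_<major>.<minor>.py) come from this report.

```python
import re
from pathlib import Path


def skip_files_for(torch_version: str, skip_dir: Path) -> list[Path]:
    """Hypothetical helper: list the skip files that would be loaded."""
    # Reduce e.g. "2.13.0a0+rocm7.14.0a20260602" to "2.13".
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    major_minor = f"{match.group(1)}.{match.group(2)}"

    # generic.py is always loaded; the versioned file only if it exists.
    files = [skip_dir / "generic.py"]
    versioned = skip_dir / f"pytorch_{major_minor}.py"
    if versioned.exists():
        files.append(versioned)
    return files
```

Under these assumptions, a nightly reporting 2.13 gets only generic.py as long as pytorch_2.13.py is absent, even though pytorch_2.12.py sits in the same directory.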
The most structurally notable error (two test failures):
/__w/TheRock/TheRock/.venv/lib/python3.13/site-packages/torch/include/ATen/hip/Exceptions.h:5:10: fatal error: 'hipblas/hipblas.h' file not found
5 | #include <hipblas/hipblas.h>
| ^~~~~~~~~~~~~~~~~~~
1 error generated.
ninja: build stopped: subcommand failed.
FAILED [7.2476s] test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator
Suggested Fix
Add external-builds/pytorch/skip_tests/pytorch_2.13.py mirroring the known-failing entries from pytorch_2.12.py for the tests listed above. The test_upsamplingNearest2d_launch_rocm_cuda failure on gfx950 is also separately tracked in #5270.
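As a starting point for drafting the new file, the failing test IDs from this report can be grouped by test module. The sketch below only prepares the names; the on-disk layout of the skip files is not shown in this report, so the helper `group_by_module` and the output shape are assumptions, and the entries should be copied into whatever structure pytorch_2.12.py already uses.

```python
from collections import defaultdict

# Failing test IDs copied from the "Failed Tests (8)" list in this report.
FAILED = [
    "test_nn.py::TestNNDeviceTypeCUDA::test_CTCLoss_cudnn_cuda",
    "test_nn.py::TestNNDeviceTypeCUDA::test_LSTM_dropout_per_call_randomness_dropout_p_0_5_training_True_cuda",
    "test_nn.py::TestNNDeviceTypeCUDA::test_ctc_loss_cudnn_tensor_cuda_cuda",
    "test_nn.py::TestNNDeviceTypeCUDA::test_upsamplingNearest2d_launch_rocm_cuda",
    "test_cuda.py::TestCuda::test_hip_device_count",
    "test_cuda.py::TestCudaAllocator::test_memory_compile_regions",
    "test_cuda.py::TestMemPool::test_mempool_empty_cache_inactive",
    "test_cuda.py::TestMemPool::test_mempool_limited_memory_with_allocator",
]


def group_by_module(node_ids):
    """Hypothetical helper: map test module name to its failing test names."""
    grouped = defaultdict(list)
    for node_id in node_ids:
        # A pytest node ID here has the form "file.py::Class::test_name".
        module, _cls, name = node_id.split("::")
        grouped[module.removesuffix(".py")].append(name)
    return dict(grouped)


if __name__ == "__main__":
    for module, names in group_by_module(FAILED).items():
        print(module)
        for name in names:
            print(f"  {name}")
```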
Sample Failing Job