MPS: Conv1d fails with NotImplementedError for output_channels > 65536 #152278
Labels
module: convolution
Problems related to convolutions (THNN, THCUNN, CuDNN)
module: mps
Related to Apple Metal Performance Shaders framework
triaged
This issue has been looked at by a team member, and has been triaged and prioritized into an appropriate module
🐛 Describe the bug
Running torch.nn.functional.conv1d (or torch.nn.Conv1d) on the MPS backend results in the following error when the number of output channels exceeds 65536:
NotImplementedError: Output channels > 65536 not supported at the MPS device.
This limitation prevents common model architectures, such as standard Wav2Vec2 implementations, which use Conv1d layers with high channel counts in their feature extractors, from running natively on the MPS device.
The current workaround is either to set the global PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable or to make targeted code changes that move the specific conv1d operation and its inputs/outputs to the CPU; both options hurt performance compared to native MPS execution.
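For reference, a minimal sketch of the targeted CPU-fallback approach (the wrapper class name CPUConv1dFallback is hypothetical, not from any existing model code):

```python
import torch
import torch.nn as nn

class CPUConv1dFallback(nn.Module):
    """Runs a Conv1d on the CPU while the rest of the model stays on MPS.
    Illustrative sketch only; structure and names are assumptions."""

    def __init__(self, conv: nn.Conv1d):
        super().__init__()
        self.conv = conv.to("cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move the input to CPU, run the unsupported conv1d there,
        # then move the result back to the original (MPS) device.
        return self.conv(x.to("cpu")).to(x.device)
```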
Please consider adding support for conv1d operations with output channels > 65536 on the MPS backend to improve hardware acceleration coverage and performance for models relying on such layers.
Reproduce:
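A minimal snippet along these lines triggers the error on my machine (the tensor shapes are illustrative assumptions, chosen only so that the output channel count exceeds 65536):

```python
import torch
import torch.nn.functional as F

device = torch.device("mps")

x = torch.randn(1, 512, 100, device=device)          # (batch, in_channels, length)
weight = torch.randn(65537, 512, 3, device=device)   # out_channels > 65536

# Raises: NotImplementedError: Output channels > 65536 not supported at the MPS device.
y = F.conv1d(x, weight)
```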
Environment:
PyTorch Version: 2.5.1
macOS Version: Sequoia 15.4.1
Hardware: Apple Silicon (M-series chip)
Versions
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.4.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.6)
CMake version: version 3.31.1
Libc version: N/A
Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:35:25) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-15.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4 Max
Versions of relevant libraries:
[pip3] mypy_extensions==1.1.0
[pip3] numpy==1.26.4
[pip3] onnx==1.17.0
[pip3] onnx-weekly==1.19.0.dev20250425
[pip3] onnx2torch==1.5.15
[pip3] onnx2torch-py313==1.6.0
[pip3] onnxruntime==1.21.1
[pip3] pytorch-wpe==0.0.1
[pip3] rotary-embedding-torch==0.6.5
[pip3] torch==2.5.1
[pip3] torch-complex==0.4.4
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[conda] libopenvino-pytorch-frontend 2025.0.0 h286801f_3 conda-forge
[conda] numpy 1.26.4 pypi_0 pypi
[conda] onnx2torch 1.5.15 pypi_0 pypi
[conda] onnx2torch-py313 1.6.0 pypi_0 pypi
[conda] pytorch-wpe 0.0.1 pypi_0 pypi
[conda] rotary-embedding-torch 0.6.5 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi
[conda] torch-complex 0.4.4 pypi_0 pypi
[conda] torchaudio 2.5.1 pypi_0 pypi
[conda] torchvision 0.20.1 pypi_0 pypi
cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen