Install pytorch from pypi using local CUDA build #150742

Open
ikrommyd opened this issue Apr 6, 2025 · 5 comments


ikrommyd commented Apr 6, 2025

🚀 The feature, motivation and pitch

It's great that NVIDIA provides wheels for the CUDA-related packages and we don't need conda/mamba to install PyTorch anymore, but those packages take up space if you install PyTorch in multiple environments.
It would be nice if you could install a PyTorch version from PyPI that grabs and uses your local CUDA installation.

For example, cupy provides pip install cupy-cuda12x and jax provides pip install "jax[cuda12_local]". As far as I'm aware, plain pip install tensorflow also appears to use the GPU even when I don't specify pip install "tensorflow[and-cuda]", which would install the nvidia/cuda wheels as well.
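
For concreteness, those precedents as documented by the respective projects:

pip install cupy-cuda12x            # CuPy wheel built to use a locally installed CUDA toolkit
pip install "jax[cuda12_local]"     # JAX CUDA build that relies on a local toolkit
pip install "tensorflow[and-cuda]"  # TensorFlow extra that pulls in the nvidia wheels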

Please close this if it's simply not possible in PyTorch's case, or if it's a duplicate (I searched but didn't find one).

Alternatives

Just make the space available and install the nvidia wheels in every environment separately.

Additional context

No response

cc @seemethere @malfet @osalpekar @atalman @pytorch/pytorch-dev-infra

malfet added the module: binaries, enhancement, oncall: releng, and module: ci labels on Apr 7, 2025

malfet (Contributor) commented Apr 7, 2025

@ikrommyd thank you for your suggestion, it looks reasonable. By the way, have you tried using --no-deps and checking whether PyTorch is usable afterwards? It should pick up local dependencies in that case.
@ZainRizvi is oncall: releng still active, or should it be merged with module: ci?
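
For reference, a minimal sketch of that --no-deps flow (untested as written; the index URL matches the cu126 wheels shown later in this thread):

# Install the cu126 wheel without its declared dependencies, then rely on a
# system-wide CUDA 12.6 toolkit found via LD_LIBRARY_PATH.
pip install --no-deps torch --index-url https://download.pytorch.org/whl/cu126
# Note: --no-deps skips *all* dependencies, so pure-Python ones such as
# sympy, typing-extensions, and jinja2 may need to be installed separately.
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
python -c "import torch; print(torch.cuda.is_available())"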

ZainRizvi (Contributor) commented Apr 7, 2025

@malfet oncall: releng and module: ci both get routed to the Dev Infra board. Using either one is fine.

ikrommyd (Author) commented Apr 8, 2025

Huh, that's interesting. Here I am in an environment with no nvidia packages installed, just torch from PyPI, a system-wide CUDA 12.6 installation, and my own build of NCCL under /home/iason/software/torch/nccl/build/lib, and it appears to be working. I didn't think to try that before 😃 I built NCCL myself since there is no build of version 2.26.2 in the PPA for CUDA 12.6 on Ubuntu 24.04.

(ak-uproot) ➜  ~ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.6/lib64:/home/iason/software/torch/nccl/build/lib
(ak-uproot) ➜  ~ ipython
Python 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.32.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.is_bf16_supported()
Out[3]: True

In [4]: x = torch.rand(5, 3)

In [5]: print(x)
tensor([[0.3051, 0.5993, 0.2161],
        [0.1847, 0.2640, 0.4210],
        [0.0208, 0.4589, 0.3539],
        [0.5538, 0.0826, 0.1458],
        [0.3174, 0.1281, 0.6403]])

In [6]: import torch
   ...: import torch.nn as nn
   ...:
   ...: # Check if CUDA is available
   ...: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
   ...: print(f"Using device: {device}")
   ...:
   ...: # Set parameters
   ...: batch_size = 8
   ...: seq_length = 10
   ...: embed_dim = 512
   ...: num_heads = 8
   ...:
   ...: # Create random input tensors
   ...: # Shape: [sequence length, batch size, embedding dimension]
   ...: query = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...: key = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...: value = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...:
   ...: # Create MultiheadAttention module
   ...: mha = nn.MultiheadAttention(embed_dim, num_heads).to(device)
   ...:
   ...: # Forward pass
   ...: output, attn_weights = mha(query, key, value)
   ...:
   ...: # Print shapes
   ...: print(f"Input shapes: query={query.shape}, key={key.shape}, value={value.shape}")
   ...: print(f"Output shape: {output.shape}")
   ...: print(f"Attention weights shape: {attn_weights.shape}")
Using device: cuda
Input shapes: query=torch.Size([10, 8, 512]), key=torch.Size([10, 8, 512]), value=torch.Size([10, 8, 512])
Output shape: torch.Size([10, 8, 512])
Attention weights shape: torch.Size([8, 10, 10])

In [7]: query.device
Out[7]: device(type='cuda', index=0)

In [8]: exit()
(ak-uproot) ➜  ~ pip list | grep torch
torch                     2.6.0+cu126
torchaudio                2.6.0+cu126
torchvision               0.21.0+cu126
(ak-uproot) ➜  ~ pip list | grep nvidia
(ak-uproot) ➜  ~

ikrommyd (Author) commented Apr 8, 2025

I'd like to say, then, that this should probably be tested more officially (in CI, perhaps) and advertised more prominently in the installation instructions. Perhaps something like pip install "torch[cuda12_local]" would be a good option as well: it would install everything except the nvidia packages, while not being a CPU-only build like the one from https://download.pytorch.org/whl/cpu.

I don't understand why nccl is a required dependency, though. In my opinion it should be optional, needed only if you have more than one GPU, but you're the experts on this.
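
As a sanity check that the locally built NCCL is the one torch actually loaded (a one-liner sketch; the expected output matches the 2.26.2 build above):

# Reports the NCCL version torch resolved at load time, e.g. (2, 26, 2)
python -c "import torch; print(torch.cuda.nccl.version())"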

soulitzer added the triaged label on Apr 11, 2025
ZainRizvi moved this to Cold Storage in PyTorch OSS Dev Infra on Apr 15, 2025
ZainRizvi added this to the 2.8.0 milestone on Apr 15, 2025
ikrommyd (Author) commented May 4, 2025

To add one more comment: it would be really nice if a support matrix (or something similar) were provided listing what the user needs to install on their system, i.e. which libraries from NVIDIA or others, and which versions are supported, in order to install PyTorch without the nvidia dependencies.
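
In the meantime, a rough stand-in for such a matrix is to ask an installed wheel what it was built against (a minimal sketch using public torch APIs; the example outputs are what the cu126 wheel above would be expected to report):

# Print the versions the wheel was built against; the local CUDA toolkit,
# cuDNN, and NCCL installs should be compatible with these.
python - <<'EOF'
import torch
print("torch:", torch.__version__)               # e.g. 2.6.0+cu126
print("built for CUDA:", torch.version.cuda)     # e.g. 12.6
print("cuDNN:", torch.backends.cudnn.version())  # integer version, e.g. 90501
print("NCCL:", torch.cuda.nccl.version())        # e.g. (2, 26, 2)
EOF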
