Install pytorch from pypi using local CUDA build #150742

Open
ikrommyd opened this issue Apr 6, 2025 · 5 comments


ikrommyd commented Apr 6, 2025

🚀 The feature, motivation and pitch

It's great that NVIDIA provides wheels for the CUDA-related packages and we don't need conda/mamba to install PyTorch anymore, but those packages take up space if you install PyTorch in multiple environments.
It would be nice if you could install a PyTorch version from PyPI that grabs and uses your local CUDA installation.

For example, cupy provides pip install cupy-cuda12x and jax provides pip install "jax[cuda12_local]". As far as I'm aware, plain pip install tensorflow also appears to use the GPU even when I don't specify pip install "tensorflow[and-cuda]", which would install the nvidia/cuda wheels as well.
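
For concreteness, those precedents as documented by the respective projects:

pip install cupy-cuda12x            # CuPy wheel built to use a locally installed CUDA toolkit
pip install "jax[cuda12_local]"     # JAX CUDA build that relies on a local toolkit
pip install "tensorflow[and-cuda]"  # TensorFlow extra that pulls in the nvidia wheels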

Please close this if it's simply not possible in PyTorch's case, or if it's a duplicate (I searched but didn't find one).

Alternatives

Just make the space available and install the nvidia wheels in every environment separately.

Additional context

No response

cc @seemethere @malfet @osalpekar @atalman @pytorch/pytorch-dev-infra

malfet added the module: binaries, enhancement, oncall: releng, and module: ci labels on Apr 7, 2025

malfet (Contributor) commented Apr 7, 2025

@ikrommyd thank you for your suggestion, it looks reasonable. By the way, have you tried using --no-deps and checking whether PyTorch is usable afterwards? It should pick up local dependencies in that case.
@ZainRizvi is oncall: releng still active, or should it be merged with module: ci?
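
For reference, a minimal sketch of that --no-deps flow (untested as written; the index URL matches the cu126 wheels shown later in this thread):

# Install the cu126 wheel without its declared dependencies, then rely on a
# system-wide CUDA 12.6 toolkit found via LD_LIBRARY_PATH.
pip install --no-deps torch --index-url https://download.pytorch.org/whl/cu126
# Note: --no-deps skips *all* dependencies, so pure-Python ones such as
# sympy, typing-extensions, and jinja2 may need to be installed separately.
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
python -c "import torch; print(torch.cuda.is_available())"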

ZainRizvi (Contributor) commented Apr 7, 2025

@malfet oncall: releng and module: ci both get routed to the Dev Infra board. Using either one is fine.

ikrommyd (Author) commented Apr 8, 2025

Huh, that's interesting. Here I am in an environment with no nvidia packages installed, just torch from PyPI, a system-wide CUDA 12.6 installation, and my own build of NCCL under /home/iason/software/torch/nccl/build/lib, and it appears to be working. I didn't think to try that before 😃 I built NCCL myself since there is no build of version 2.26.2 in the PPA for CUDA 12.6 on Ubuntu 24.04.

(ak-uproot) ➜  ~ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.6/lib64:/home/iason/software/torch/nccl/build/lib
(ak-uproot) ➜  ~ ipython
Python 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.32.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.is_bf16_supported()
Out[3]: True

In [4]: x = torch.rand(5, 3)

In [5]: print(x)
tensor([[0.3051, 0.5993, 0.2161],
        [0.1847, 0.2640, 0.4210],
        [0.0208, 0.4589, 0.3539],
        [0.5538, 0.0826, 0.1458],
        [0.3174, 0.1281, 0.6403]])

In [6]: import torch
   ...: import torch.nn as nn
   ...:
   ...: # Check if CUDA is available
   ...: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
   ...: print(f"Using device: {device}")
   ...:
   ...: # Set parameters
   ...: batch_size = 8
   ...: seq_length = 10
   ...: embed_dim = 512
   ...: num_heads = 8
   ...:
   ...: # Create random input tensors
   ...: # Shape: [sequence length, batch size, embedding dimension]
   ...: query = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...: key = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...: value = torch.randn(seq_length, batch_size, embed_dim, device=device)
   ...:
   ...: # Create MultiheadAttention module
   ...: mha = nn.MultiheadAttention(embed_dim, num_heads).to(device)
   ...:
   ...: # Forward pass
   ...: output, attn_weights = mha(query, key, value)
   ...:
   ...: # Print shapes
   ...: print(f"Input shapes: query={query.shape}, key={key.shape}, value={value.shape}")
   ...: print(f"Output shape: {output.shape}")
   ...: print(f"Attention weights shape: {attn_weights.shape}")
Using device: cuda
Input shapes: query=torch.Size([10, 8, 512]), key=torch.Size([10, 8, 512]), value=torch.Size([10, 8, 512])
Output shape: torch.Size([10, 8, 512])
Attention weights shape: torch.Size([8, 10, 10])

In [7]: query.device
Out[7]: device(type='cuda', index=0)

In [8]: exit()
(ak-uproot) ➜  ~ pip list | grep torch
torch                     2.6.0+cu126
torchaudio                2.6.0+cu126
torchvision               0.21.0+cu126
(ak-uproot) ➜  ~ pip list | grep nvidia
(ak-uproot) ➜  ~

ikrommyd (Author) commented Apr 8, 2025

I'd like to say, then, that this should probably be tested more officially (in CI, perhaps) and advertised more prominently in the installation instructions. Perhaps something like pip install "torch[cuda12_local]" would be a good option as well: it would install everything except the nvidia packages, while not being a CPU-only build like the one from https://download.pytorch.org/whl/cpu.

I don't understand why nccl is a required dependency, though. In my opinion it should be optional, needed only if you have more than one GPU, but you're the experts on this.
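
As a sanity check that the locally built NCCL is the one torch actually loaded (a one-liner sketch; the expected output matches the 2.26.2 build above):

# Reports the NCCL version torch resolved at load time, e.g. (2, 26, 2)
python -c "import torch; print(torch.cuda.nccl.version())"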

soulitzer added the triaged label on Apr 11, 2025
ZainRizvi moved this to Cold Storage in PyTorch OSS Dev Infra on Apr 15, 2025
ZainRizvi added this to the 2.8.0 milestone on Apr 15, 2025
ikrommyd (Author) commented May 4, 2025

To add one more comment: it would be really nice if a support matrix (or something similar) were provided listing what the user needs to install on their system, i.e. which libraries from NVIDIA or others, and which versions are supported, in order to install PyTorch without the nvidia dependencies.
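
In the meantime, a rough stand-in for such a matrix is to ask an installed wheel what it was built against (a minimal sketch using public torch APIs; the example outputs are what the cu126 wheel above would be expected to report):

# Print the versions the wheel was built against; the local CUDA toolkit,
# cuDNN, and NCCL installs should be compatible with these.
python - <<'EOF'
import torch
print("torch:", torch.__version__)               # e.g. 2.6.0+cu126
print("built for CUDA:", torch.version.cuda)     # e.g. 12.6
print("cuDNN:", torch.backends.cudnn.version())  # integer version, e.g. 90501
print("NCCL:", torch.cuda.nccl.version())        # e.g. (2, 26, 2)
EOF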
