Conversation

@RJKeevil RJKeevil commented Dec 2, 2024

Use an additional path to search for NVIDIA GPUs.

@janpfeifer
Contributor

This function is called when a CUDA PJRT plugin has been found, but it is not certain that an actual GPU card is installed.

The use case is: a demo docker with all the PJRTs installed shouldn't attempt to run a cuda PJRT if it is running on a computer with no GPUs.

Looking at /usr/bin/nvidia will only detect that the nvidia programs are installed, not whether there is an actual GPU card installed.

Looking at /dev/nvidia* doesn't seem to be foolproof either. Let's chat later; maybe we could look at:

  • ls -ld /sys/module/nvidia*
  • ls -ld /sys/bus/pci/drivers/nvidia*

Is either of those non-empty in your container setup?
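
For reference, the two `ls` checks above could be expressed in Go roughly as follows. This is only a sketch: `anyMatch` is a hypothetical helper name, not a function from this repository, and as the thread goes on to show, these sysfs paths can be empty inside some containers even when CUDA works.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// anyMatch reports whether at least one of the glob patterns matches an
// existing path. (Hypothetical helper, not part of the project.)
func anyMatch(patterns ...string) bool {
	for _, pattern := range patterns {
		matches, err := filepath.Glob(pattern)
		if err != nil {
			// filepath.Glob only returns an error for a malformed pattern.
			continue
		}
		if len(matches) > 0 {
			return true
		}
	}
	return false
}

func main() {
	// Patterns taken from the two ls commands suggested above.
	found := anyMatch("/sys/module/nvidia*", "/sys/bus/pci/drivers/nvidia*")
	fmt.Println("nvidia kernel driver entries found:", found)
}
```

A positive match here only indicates that a kernel driver is loaded; an empty result does not prove that no GPU is reachable.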

In the meantime, I'm adding documentation and logging to the function, including logging of the workaround: providing the absolute path to the CUDA PJRT. See #19

@RJKeevil
Contributor Author

RJKeevil commented Dec 3, 2024

Both of these paths are empty. I think there's some wizardry with Docker Desktop and WSL2 where the container somehow just delegates to the CUDA drivers on the host OS. nvidia-smi is added to the PATH; perhaps issuing that command is a reasonably generic way to see if CUDA is present on a system?

@janpfeifer
Contributor

I hesitate to make the test depend on nvidia-smi being installed. For instance, the demo docker doesn't contain it, even though it works with NVIDIA CUDA. Also, I'm not sure about the distribution rights of these NVIDIA tools. The legalese is not clear to me, but maybe it's an option. Let me search around for alternatives in Windows WSL.

@RJKeevil
Contributor Author

RJKeevil commented Dec 3, 2024

Agreed, I don't think it should depend on it; the current check could still look for NVIDIA files, with calling nvidia-smi as a fallback for this case? I've looked further in the container, and the only other evidence of CUDA I can find is the presence of /usr/lib/wsl/drivers/nv_dispi.inf_amd64_adf5a840df867035
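
The "file check first, nvidia-smi as a fallback" idea could be structured as an ordered list of probes, along these lines. This is a sketch under assumptions: `probe` and `firstHit` are hypothetical names, and the two checks in `main` are stand-ins rather than the project's real detection code.

```go
package main

import "fmt"

// probe is one way of checking for a GPU; name is used only for reporting.
// (Hypothetical type, not part of the project.)
type probe struct {
	name  string
	check func() bool
}

// firstHit runs the probes in order and returns the name of the first one
// that succeeds, or "" if none does. Probes after a hit are not run.
func firstHit(probes []probe) string {
	for _, p := range probes {
		if p.check() {
			return p.name
		}
	}
	return ""
}

func main() {
	probes := []probe{
		// Stand-in for the existing file-based check failing, as in the
		// Docker Desktop + WSL2 container described above.
		{"nvidia files", func() bool { return false }},
		// Stand-in for the nvidia-smi fallback succeeding.
		{"nvidia-smi", func() bool { return true }},
	}
	fmt.Println("detected via:", firstHit(probes))
}
```

Keeping the cheap file checks first means the external command only runs when nothing else has found a GPU.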

@janpfeifer
Contributor

Yes, checking whether nvidia-smi is available and then executing it is a very viable option. Do you want to implement that?

@janpfeifer janpfeifer merged commit c8c81d9 into gomlx:main Dec 3, 2024
1 check passed