Description
Problem description
Running the SCUDA server on a GPU server and then executing client commands raises a RuntimeError.
Environmental information
CUDA_VERSION=12.6.2
DISTRO_VERSION=24.04
OS_DISTRO=ubuntu
CUDNN_TAG=cudnn
Reproduce steps
- Build an image using the example Dockerfile and start the container
- Execute the command: pip install numpy pandas torch
- Start the server with: ./local.sh server
- Set the environment variable: export SCUDA_SERVER=127.0.0.1
- Start the client with: LD_PRELOAD=./libscuda_12.6.so python3 test.py
My test.py file looks like this:
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name())
Current behavior
The output looks like this:
......
dlsym: cuModuleGetGlobal_v2
dlsym: PyInit__C
dlsym: PyInit__multiarray_umath
dlsym: PyInit__contextvars
dlsym: PyInit__umath_linalg
dlsym: PyInit_mmap
dlsym: PyInit__ssl
dlsym: PyInit__asyncio
dlsym: PyInit__queue
dlsym: PyInit__hashlib
dlsym: PyInit__multiprocessing
dlsym: cuDevicePrimaryCtxGetState
dlsym: cuGetErrorString
True
Traceback (most recent call last):
File "<string>", line 1, in <module>
...
File ".../site-packages/torch/cuda/__init__.py", line 372, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA driver error: initialization error
It seems that torch.cuda.is_available() works normally, but CUDA cannot actually be initialized. I ran the test script without SCUDA and it gave the correct result; the RuntimeError only appears when the script is run with LD_PRELOAD=./libscuda_12.6.so. I suspect there is a problem in how the RPC layer implements CUDA's C driver interface.
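To help narrow this down, here is a minimal ctypes sketch (an assumption on my part: that libcuda.so.1 is the library name the LD_PRELOAD shim intercepts) that calls cuInit(0) directly, bypassing torch entirely. Comparing its result with and without LD_PRELOAD should show whether the failure is in SCUDA's forwarding of cuInit itself:

```python
import ctypes

# Diagnostic sketch: call cuInit(0) directly through the driver library,
# which LD_PRELOAD=./libscuda_12.6.so would replace with the RPC shim.
# cuInit is a standard CUDA driver API entry point; a return value of
# 0 means CUDA_SUCCESS, anything else is a CUresult error code.
def try_cuinit():
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no driver library available on this machine
    cuda.cuInit.restype = ctypes.c_int
    return cuda.cuInit(0)

result = try_cuinit()
print("cuInit result:", result)
```

If this returns a non-zero CUresult only when run under LD_PRELOAD, the problem is in SCUDA's handling of cuInit rather than anything torch-specific.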