A transparent hooking library for libcuda and libnvidia-ml.
CFN-Cloud (in development...)
- Build the builder image:

```shell
bash ./hack/build-builder.sh
```

- Build the library:

```shell
bash ./hack/build-via-docker.sh
```
```shell
# use env
export LD_PRELOAD=/path/to/libvcuda-hook.so
export VCUDA_LOG_LEVEL=debug
export VCUDA_MEMORY_LIMIT=$((1024 * 1024 * 1024 * 10))  # limit to 10 GiB

# manual
your_application

# or use docker
docker run -it --gpus all --rm \
  -v /path/to/libvcuda-hook.so:/usr/lib64/libvcuda-hook.so \
  -e LD_PRELOAD=/usr/lib64/libvcuda-hook.so \
  vllm/vllm-openai:latest bash
```
- ✅ Minimal Performance Overhead
- ✅ Fractional GPU Usage
- ✅ Fine-grained GPU Memory Control
- ✅ Multi-Process GPU Memory Unified Control
- ✅ Container GPU Sharing
- ☐ Kubernetes Support
- ...
- ☐ Remote GPU Call Over Network
- ☐ Oversub GPU Memory Control
- ☐ GPU Task Hot Snapshot
- ...
I developed this project out of several core motivations:
- Personal Technical Interest and Professional Needs: An interest in GPU virtualization technology and CUDA programming, combined with related requirements encountered in practical work
- Open Architecture: Provide an open-source solution that allows the community to participate in improvements and feature extensions
- High Scalability: Design a flexible architecture that supports various GPU virtualization scenarios, including GPU resource sharing in containerized environments
- Dynamic Controllability: Implement runtime dynamic configuration and management capabilities, allowing GPU resource allocation adjustments based on demand
- Transparent Proxy Layer: Serve as a transparent proxy for CUDA dynamic libraries, enabling GPU virtualization functionality without modifying existing applications
This project aims to provide a simple and easy-to-use GPU virtualization solution for containerized environments, enabling safe and efficient sharing of GPU resources among multiple containers.