Simulate NVIDIA or AMD (ROCm) GPUs in a Kubernetes in Docker (kind) cluster, without requiring actual GPU hardware.
This is perfect for:
- Testing GPU scheduling
- Validating device plugin behavior
- Learning how GPU workloads interact with Kubernetes
- Building GPU-related Kubernetes infrastructure that does not need to run real workloads
This project simulates the presence of GPU resources in a Kind cluster. It does not provide access to actual GPU hardware, and real GPU workloads (like CUDA or ROCm kernels) will not run.
Make sure the following tools are installed on your system before running the GPU simulator script:
Tool | Purpose |
---|---|
docker OR podman | Required by kind; runs the local registry and all cluster nodes |
kind | Creates the local Kubernetes cluster inside Docker |
kubectl | CLI to interact with the Kubernetes cluster |
git | Clones the GPU device plugin repositories (NVIDIA / ROCm) |
sed | Used to patch Dockerfiles for public registry compatibility |
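As a quick pre-flight check, the tools in the table above can be looked up on `PATH` before running the script. The following is a minimal sketch; `check_prereqs` is a hypothetical helper and is not part of `kind-gpu-sim.sh`.

```shell
#!/usr/bin/env bash
# Sketch: report which prerequisite tools are missing from PATH.
# check_prereqs is a hypothetical helper, not part of kind-gpu-sim.sh.
check_prereqs() {
  local missing=0 tool
  # A container runtime is required: either docker or podman is enough.
  if ! command -v docker >/dev/null 2>&1 && ! command -v podman >/dev/null 2>&1; then
    echo "missing: docker or podman"
    missing=1
  fi
  for tool in kind kubectl git sed; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Returns 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

One possible use is to gate cluster creation on it, for example `check_prereqs && ./kind-gpu-sim.sh create nvidia`.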
- Kind cluster with 1 control-plane + 2 workers
- Simulated amd.com/gpu or nvidia.com/gpu resources
- Automatically taints and labels GPU nodes
- Uses a local container registry
- Builds and deploys the AMD ROCm device plugin (locally)
- Builds and deploys the NVIDIA device plugin (locally)
- Includes GPU test pod manifests
chmod +x kind-gpu-sim.sh
Choose your simulation type:
# Simulate AMD GPUs
./kind-gpu-sim.sh create rocm
# Simulate NVIDIA GPUs
./kind-gpu-sim.sh create nvidia
Create a pod that requests GPU resources:
kubectl create -f pods/nvidia-gpu-test-pod.yaml
Check the pod logs:
kubectl logs nvidia-gpu-test
Hello from fake NVIDIA GPU node
kubectl create -f pods/rocm-gpu-test-pod.yaml
Check the pod logs:
kubectl logs gpu-rocm-test
Hello from fake ROCm GPU node
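The test manifests under `pods/` are not reproduced here. As a rough illustration of what a pod requesting a simulated GPU could look like, the sketch below writes a minimal manifest to a file. The helper `write_gpu_pod`, the pod name, the image, and the echoed message are all assumptions made for this example and may differ from the project's own manifests.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal pod manifest that requests one simulated GPU.
# write_gpu_pod is a hypothetical helper; the pod name, image, and message
# are illustrative and do not reproduce the manifests under pods/.
write_gpu_pod() {
  local resource="$1" outfile="$2"
  cat > "$outfile" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sim-example
spec:
  restartPolicy: Never
  tolerations:
    # An empty key with operator Exists tolerates every taint, so the
    # actual taint key used by the simulator does not need to be known.
    - operator: Exists
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "echo Hello from a simulated GPU node"]
      resources:
        limits:
          ${resource}: 1
EOF
}
```

For example, `write_gpu_pod nvidia.com/gpu /tmp/gpu-sim-example.yaml` followed by `kubectl create -f /tmp/gpu-sim-example.yaml` would submit the pod. Because the resource is requested under `limits`, the scheduler can only place the pod on a node that advertises that resource.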
./kind-gpu-sim.sh delete
.
├── kind-gpu-config.yaml # Kind cluster config: 1 control-plane, 2 workers
├── kind-gpu-sim.sh # Main script to create/delete simulated GPU clusters (ROCm or NVIDIA)
├── pods
│ ├── nvidia-gpu-test-pod.yaml # Pod spec to test NVIDIA GPU simulation (uses nvidia.com/gpu)
│ ├── rocm-gpu-test-pod.yaml # Pod spec to test AMD ROCm GPU simulation (uses amd.com/gpu)
│ └── triton-pod.yaml # Pod that installs and runs Triton-lang, useful for simulating kernel compilation
└── Readme.md # Project overview and usage instructions
Component | Description |
---|---|
kubectl patch | Fakes amd.com/gpu or nvidia.com/gpu on nodes |
taint + toleration | Ensures only GPU workloads land on simulated nodes |
DaemonSet | Deploys either AMD or NVIDIA device plugin DaemonSets |
localhost:5000 | Local registry, connected to Kind |
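To make the first two rows more concrete, here is a hedged sketch of how a node could be given fake GPU capacity and then tainted. It is not the script's actual code: `gpu_capacity_patch` and `fake_gpu_node` are hypothetical helpers, and the label `gpu-sim=true` and taint `gpu=true:NoSchedule` are assumptions chosen for illustration. It also assumes a kubectl version that supports `--subresource` on `kubectl patch`.

```shell
#!/usr/bin/env bash
# Sketch of the node-side steps; the real script may do this differently.
# gpu_capacity_patch is a hypothetical helper that prints a JSON merge patch
# setting COUNT units of RESOURCE in a node's status.capacity.
gpu_capacity_patch() {
  local resource="$1" count="$2"
  printf '{"status":{"capacity":{"%s":"%s"}}}\n' "$resource" "$count"
}

# fake_gpu_node is hypothetical too. The label "gpu-sim=true" and the taint
# "gpu=true:NoSchedule" are assumed names, not taken from the project.
fake_gpu_node() {
  local node="$1" resource="$2" count="$3"
  # Advertise the extended resource on the node's status subresource.
  kubectl patch node "$node" --subresource=status --type=merge \
    -p "$(gpu_capacity_patch "$resource" "$count")"
  # Mark the node so GPU pods can select it and other pods stay away.
  kubectl label nodes "$node" gpu-sim=true --overwrite
  kubectl taint nodes "$node" gpu=true:NoSchedule --overwrite
}
```

With a running cluster, `fake_gpu_node <worker-node-name> nvidia.com/gpu 2` would apply all three steps to one worker. Keeping the patch builder separate makes it easy to inspect the JSON before anything is sent to the API server.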
- kind v0.23.0
This project helps:
- Devs test GPU workloads without expensive hardware
- CI environments validate GPU scheduling logic
- Anyone learn Kubernetes GPU primitives
# ./kind-gpu-sim.sh load --image-name=<Image-Name> --cluster-name=<KIND_CLUSTER_NAME>
# For example:
./kind-gpu-sim.sh load --image-name=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.9.1