-
Notifications
You must be signed in to change notification settings - Fork 23
Use JIT CDI spec generation by default #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This change switches to using JIT CDI spec generation for the injection of devices into kind nodes. Signed-off-by: Evan Lezar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR switches the NVIDIA Container Runtime from legacy device injection to JIT (Just-In-Time) CDI (Container Device Interface) spec generation by default. The change modifies mount paths to use the CDI format and updates the parsing logic to handle both legacy and CDI device specifications.
Key changes:
- Updates container mount paths to use CDI format (
/var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/) - Modifies device parsing logic to support both legacy and CDI device specifications
- Applies changes consistently across default configuration template and examples
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/nvkind/node.go | Updates device parsing logic to handle both legacy and CDI mount paths |
| pkg/nvkind/default-config-template.yaml | Changes default container path to use CDI format |
| examples/one-worker-per-gpu.yaml | Updates example to use CDI format |
| examples/equally-distributed-gpus.yaml | Updates example to use CDI format |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| deviceID := filepath.Base(mount.ContainerPath) | ||
| switch { | ||
| case strings.HasPrefix(dir, "/var/run/nvidia-container-devices/cdi"): | ||
| cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/") |
Copilot
AI
Aug 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable cdiKind extracts the CDI kind from the directory path, but this appears to be incorrect. Based on the container path format /var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/, the cdiKind should be runtime.nvidia.com/gpu, not just the remainder after trimming the prefix. This will result in incorrect device identifiers being generated.
| cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/") | |
| // Robustly extract the CDI kind as the two path components after the CDI base path | |
| cdiBase := "/var/run/nvidia-container-devices/cdi/" | |
| relPath := strings.TrimPrefix(dir, cdiBase) | |
| relPath = strings.TrimPrefix(relPath, "/") // handle possible missing trailing slash | |
| parts := strings.Split(relPath, string(filepath.Separator)) | |
| cdiKind := "" | |
| if len(parts) >= 2 { | |
| cdiKind = parts[0] + "/" + parts[1] | |
| } else if len(parts) == 1 { | |
| cdiKind = parts[0] | |
| } |
This change switches to using JIT CDI spec generation in the NVIDIA Contianer Runtime instead of the legacy injection mechanism.