@elezar elezar commented May 26, 2025

This change switches to using JIT CDI spec generation in the NVIDIA Container Runtime instead of the legacy injection mechanism.

This change switches to using JIT CDI spec generation for the
injection of devices into kind nodes.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the use-runtime-cdi branch from eccab68 to 8491ea7 Compare May 28, 2025 08:22
@elezar elezar self-assigned this May 28, 2025
Copilot AI left a comment

Pull Request Overview

This PR switches the NVIDIA Container Runtime from legacy device injection to JIT (Just-In-Time) CDI (Container Device Interface) spec generation by default. The change modifies mount paths to use the CDI format and updates the parsing logic to handle both legacy and CDI device specifications.

Key changes:

  • Updates container mount paths to use CDI format (/var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/)
  • Modifies device parsing logic to support both legacy and CDI device specifications
  • Applies changes consistently across default configuration template and examples

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pkg/nvkind/node.go | Updates device parsing logic to handle both legacy and CDI mount paths |
| pkg/nvkind/default-config-template.yaml | Changes default container path to use CDI format |
| examples/one-worker-per-gpu.yaml | Updates example to use CDI format |
| examples/equally-distributed-gpus.yaml | Updates example to use CDI format |

deviceID := filepath.Base(mount.ContainerPath)
switch {
case strings.HasPrefix(dir, "/var/run/nvidia-container-devices/cdi"):
cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/")
Copilot AI Aug 15, 2025

The variable cdiKind extracts the CDI kind from the directory path, but this appears to be incorrect. Based on the container path format /var/run/nvidia-container-devices/cdi/runtime.nvidia.com/gpu/, the cdiKind should be runtime.nvidia.com/gpu, not just the remainder after trimming the prefix. This will result in incorrect device identifiers being generated.

Suggested change
cdiKind := strings.TrimPrefix(dir, "/var/run/nvidia-container-devices/cdi/")
// Robustly extract the CDI kind as the two path components after the CDI base path
cdiBase := "/var/run/nvidia-container-devices/cdi/"
relPath := strings.TrimPrefix(dir, cdiBase)
relPath = strings.TrimPrefix(relPath, "/") // handle possible missing trailing slash
parts := strings.Split(relPath, string(filepath.Separator))
cdiKind := ""
if len(parts) >= 2 {
	cdiKind = parts[0] + "/" + parts[1]
} else if len(parts) == 1 {
	cdiKind = parts[0]
}
