Detect vendor in cdi specs to generate deviceIDs for --gpus #12839

Merged
mxpv merged 3 commits into containerd:main from shiv-tyagi:vendor-discovery-gpus on Feb 20, 2026

@shiv-tyagi
Contributor

@shiv-tyagi shiv-tyagi commented Jan 30, 2026

This adds GPU vendor auto-detection from CDI specs instead of hardcoding nvidia.com. This allows the --gpus flag to work with both NVIDIA and AMD GPUs by detecting the vendor from available CDI spec files.

Here is a similar PR for nerdctl: containerd/nerdctl#4728
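
For readers unfamiliar with CDI naming, here is a minimal sketch of why the vendor matters. CDI device names are fully qualified as `vendor/class=name`, so a hardcoded `nvidia.com` vendor can never match a device from an AMD spec. The helper `qualifiedGPUNames`, the `gpu` class, the `amd.com` vendor string, and the use of the GPU index as the device name are assumptions made for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// qualifiedGPUNames builds CDI device names of the form "vendor/class=name".
// The "gpu" class and the GPU index as the device name are assumptions
// made for this sketch; the helper itself is hypothetical.
func qualifiedGPUNames(vendor string, gpuIDs ...int) []string {
	names := make([]string, 0, len(gpuIDs))
	for _, id := range gpuIDs {
		names = append(names, vendor+"/gpu="+strconv.Itoa(id))
	}
	return names
}

func main() {
	// With vendor detection, the same GPU indices map to different device names.
	fmt.Println(qualifiedGPUNames("nvidia.com", 0, 1)) // [nvidia.com/gpu=0 nvidia.com/gpu=1]
	fmt.Println(qualifiedGPUNames("amd.com", 0))       // [amd.com/gpu=0]
}
```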

@shiv-tyagi
Contributor Author

The CI failures don't seem to be related to this. Can someone retrigger them for me?

@shiv-tyagi shiv-tyagi force-pushed the vendor-discovery-gpus branch from ba865ed to 9605e22 Compare February 2, 2026 17:54
@thaJeztah
Member

cc @elezar @vvoland (in case similar changes are needed in moby to keep consistency)

@elezar
Contributor

elezar commented Feb 2, 2026

Thanks @thaJeztah. I'll have a look tomorrow

@shiv-tyagi
Contributor Author

Thank you so much @elezar. I have raised a similar PR for podman as well (containers/podman#28008).

@shiv-tyagi shiv-tyagi force-pushed the vendor-discovery-gpus branch from 9605e22 to 6f6f9e5 Compare February 3, 2026 16:37
@shiv-tyagi shiv-tyagi requested a review from elezar February 3, 2026 16:42
@shiv-tyagi shiv-tyagi force-pushed the vendor-discovery-gpus branch from 6f6f9e5 to 0b88a39 Compare February 4, 2026 12:42
@shiv-tyagi
Contributor Author

shiv-tyagi commented Feb 4, 2026

I have also added tests for WithGPUs and WithCDIDevices to ensure their behavior stays consistent, plus a dedicated test covering scenarios where both the --gpus and --device options are used.

@shiv-tyagi
Contributor Author

@thaJeztah @elezar Can you please review this and let me know if something is pending from my side to move this further?

Thanks!

Contributor

@elezar elezar left a comment

I think the approach looks good, but we could incorporate some of the feedback from the podman PR (containers/podman#28008) and also ensure that the changes are more localized, since they only affect the run --gpus flag.

@shiv-tyagi
Contributor Author

shiv-tyagi commented Feb 12, 2026

Thanks for the review @elezar. I have addressed your comments along with adding unit tests. Please take a look.

@shiv-tyagi
Contributor Author

shiv-tyagi commented Feb 16, 2026

cc @elezar @vvoland (in case similar changes are needed in moby to keep consistency)

I have submitted moby/moby#52048. PTAL whenever you get time. Thanks in advance!

cc @elezar

@shiv-tyagi
Contributor Author

I think the approach looks good, but we could incorporate some of the feedback from the podman PR (containers/podman#28008) and also ensure that the changes are more localized, since they only affect the run --gpus flag.

Hey @elezar, I have addressed your comments. Can you please check if it looks good now?

If there is anything else, I will be more than happy to improve.

Thanks!

@shiv-tyagi
Contributor Author

shiv-tyagi commented Feb 17, 2026

Contributor

@elezar elezar left a comment

LGTM.

@shiv-tyagi
Contributor Author

shiv-tyagi commented Feb 18, 2026

/cc @mxpv

Can you please review this?

@k8s-ci-robot k8s-ci-robot requested a review from mxpv February 18, 2026 06:19
Member

@mxpv mxpv left a comment

Looks good.
I left a few non-blocking nits to make things a bit cleaner.

}

modifyingOption := func(ctx context.Context, client oci.Client, c *containers.Container, s *oci.Spec) error {
    devices, err := gpuDeviceNames(ctx, cdi.GetDefaultCache(), gpuIDs...)
Member

nit: I don't think we need to carry the vendorLister interface everywhere.
You could call cdi.GetDefaultCache().ListVendors() here and pass a []string.
This way you don't have to check lister == nil, since it's safe to iterate over a nil slice in Go, and there is no need for mocking.
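
To illustrate the nil-slice point, here is a small sketch. The helper `containsVendor` is hypothetical and not part of this PR; it only shows that ranging over a nil `[]string` executes zero iterations, so a function that takes a plain slice needs neither a nil check nor a mock.

```go
package main

import "fmt"

// containsVendor reports whether want appears in vendors.
// Ranging over a nil slice runs zero iterations, so no nil check is needed.
// This helper is hypothetical and only illustrates the review comment.
func containsVendor(vendors []string, want string) bool {
	for _, v := range vendors {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	var none []string // nil slice, e.g. when no CDI specs are present

	fmt.Println(containsVendor(none, "nvidia.com"))             // false
	fmt.Println(containsVendor([]string{"amd.com"}, "amd.com")) // true
}
```

A test can then pass literal slices (including nil) directly instead of implementing an interface.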

Contributor Author

Sure. I have fixed this. It looks better now.

// NVIDIA is checked first followed by AMD
for _, known := range knownVendors {
    for _, available := range availableVendors {
        if available == known {
Member

nit: can use slices package to slightly reduce number of branches.

  for _, known := range knownVendors {
      if slices.Contains(availableVendors, known) {
          log.G(ctx).Debugf("Detected GPU vendor from CDI specs: %s", known)
          return known, nil
      }
  }
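
For context, a self-contained version of that loop might look like the sketch below. It assumes `knownVendors` is a priority-ordered list with NVIDIA first and AMD second (per the code comment under review); the `amd.com` vendor string, the function name `detectGPUVendor`, and returning an error when nothing matches are assumptions for illustration. The logging call is omitted to keep the example to the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// knownVendors is ordered by priority: NVIDIA is checked first, then AMD.
// The exact vendor strings are assumptions made for this sketch.
var knownVendors = []string{"nvidia.com", "amd.com"}

// errNoGPUVendor is a hypothetical error for the "nothing matched" case.
var errNoGPUVendor = errors.New("no known GPU vendor found in CDI specs")

// detectGPUVendor returns the first known vendor present in availableVendors.
// A nil or empty availableVendors simply yields errNoGPUVendor.
func detectGPUVendor(availableVendors []string) (string, error) {
	for _, known := range knownVendors {
		if slices.Contains(availableVendors, known) {
			return known, nil
		}
	}
	return "", errNoGPUVendor
}

func main() {
	// Priority follows knownVendors, not the order of the available list.
	vendor, err := detectGPUVendor([]string{"amd.com", "nvidia.com"})
	fmt.Println(vendor, err) // nvidia.com <nil>

	_, err = detectGPUVendor(nil)
	fmt.Println(err) // no known GPU vendor found in CDI specs
}
```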

Contributor Author

Done. Thanks.

@mikebrow mikebrow force-pushed the vendor-discovery-gpus branch from 387c987 to df27d17 Compare February 19, 2026 00:15
Member

@mikebrow mikebrow left a comment

LGTM
agree with suggestion to consider config in a follow-up

@shiv-tyagi
Contributor Author

I have applied your suggestions @mxpv. Please review and approve the PR if it looks good now.

@elezar @mikebrow @mxpv Thanks again everyone for reviewing this PR. <3

@shiv-tyagi shiv-tyagi force-pushed the vendor-discovery-gpus branch from 58be393 to f9c19dd Compare February 19, 2026 08:54
This adds GPU vendor auto-detection from CDI specs instead of hardcoding nvidia.com.
This allows the --gpus flag to work with both NVIDIA and AMD GPUs by detecting the vendor from available CDI spec files.

Signed-off-by: Shiv Tyagi <[email protected]>
@mikebrow mikebrow force-pushed the vendor-discovery-gpus branch from f9c19dd to 090def0 Compare February 19, 2026 15:28
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Feb 20, 2026
@mxpv mxpv added this pull request to the merge queue Feb 20, 2026
Merged via the queue into containerd:main with commit f4681e0 Feb 20, 2026
94 of 102 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Feb 20, 2026

6 participants