Detect vendor in cdi specs to generate deviceIDs for --gpus #12839
mxpv merged 3 commits into containerd:main
The CI failures don't seem to be related to this. Can someone retrigger them for me?
ba865ed to 9605e22
Thanks @thaJeztah. I'll have a look tomorrow.
Thank you so much @elezar. I have raised a similar PR for podman as well (containers/podman#28008).
9605e22 to 6f6f9e5
6f6f9e5 to 0b88a39
I have also added tests for
@thaJeztah @elezar Can you please review this and let me know if something is pending from my side to move this further? Thanks!
elezar left a comment
I think the approach looks good, but we could incorporate some of the feedback from the podman PR (containers/podman#28008) and also ensure that the changes are more localized, since they only affect the run --gpus flag.
0b88a39 to bde37cf
bde37cf to 1439663
1439663 to 5a0b36b
Thanks for the review @elezar. I have addressed your comments along with adding unit tests. Please take a look.
5a0b36b to 387c987
I have submitted moby/moby#52048. PTAL whenever you get time. Thanks in advance! cc @elezar
Hey @elezar, I have addressed your comments. Can you please check if it looks good now? If there is anything else, I will be more than happy to improve. Thanks!
/cc @mxpv Can you please review this?
mxpv left a comment
Looks good.
I left a few non-blocking nits to make things a bit cleaner.
cmd/ctr/commands/run/run_unix.go
Outdated
	}

	modifyingOption := func(ctx context.Context, client oci.Client, c *containers.Container, s *oci.Spec) error {
		devices, err := gpuDeviceNames(ctx, cdi.GetDefaultCache(), gpuIDs...)
nit: I don't think we need to carry the vendorLister interface everywhere.
You could call cdi.GetDefaultCache().ListVendors() here and pass a []string.
This way you don't have to check lister == nil, since it's safe to iterate over a nil slice in Go, and there is no need for mocking.
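To illustrate the point about nil slices, here is a minimal sketch rather than the PR's actual code. The helper name firstKnownVendor and the vendor strings are assumptions made for illustration; only the idea of passing a plain []string instead of an interface comes from the comment above.

```go
package main

import "fmt"

// firstKnownVendor is a hypothetical stand-in for the PR's detection helper.
// It takes the vendor list as a plain []string, so callers need no interface
// and tests need no mock.
func firstKnownVendor(available []string) (string, bool) {
	// Assumed precedence for illustration: NVIDIA first, then AMD.
	known := []string{"nvidia.com", "amd.com"}
	for _, k := range known {
		// Ranging over a nil slice runs zero iterations,
		// so no explicit nil check is required here.
		for _, a := range available {
			if a == k {
				return k, true
			}
		}
	}
	return "", false
}

func main() {
	var none []string // nil slice, e.g. when no CDI specs were found
	v, ok := firstKnownVendor(none)
	fmt.Printf("%q %v\n", v, ok)

	v, ok = firstKnownVendor([]string{"amd.com"})
	fmt.Printf("%q %v\n", v, ok)
}
```

With this shape, a unit test can pass a literal slice (or nil) directly, which is what removes the need for a mocked lister.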
Sure. I have fixed this. It looks better now.
cmd/ctr/commands/run/run_unix.go
Outdated
	// NVIDIA is checked first followed by AMD
	for _, known := range knownVendors {
		for _, available := range availableVendors {
			if available == known {
nit: can use slices package to slightly reduce number of branches.
for _, known := range knownVendors {
	if slices.Contains(availableVendors, known) {
		log.G(ctx).Debugf("Detected GPU vendor from CDI specs: %s", known)
		return known, nil
	}
}
387c987 to df27d17
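The slices.Contains suggestion in the review above can be tried in isolation. The following is a self-contained sketch under stated assumptions: the function name detectGPUVendor, the contents of knownVendors, and returning an error when nothing matches are all hypothetical, and the logging call is omitted so that only the standard library is needed (Go 1.21 or later for the slices package).

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// knownVendors lists vendors in precedence order.
// Assumed values: NVIDIA is checked first, followed by AMD.
var knownVendors = []string{"nvidia.com", "amd.com"}

// errNoVendor is a hypothetical error; the PR may handle this case differently.
var errNoVendor = errors.New("no known GPU vendor found in CDI specs")

// detectGPUVendor returns the first known vendor present in availableVendors.
func detectGPUVendor(availableVendors []string) (string, error) {
	for _, known := range knownVendors {
		// slices.Contains replaces the inner loop and its equality branch.
		if slices.Contains(availableVendors, known) {
			return known, nil
		}
	}
	return "", errNoVendor
}

func main() {
	// The order of the available list does not matter; precedence comes
	// from knownVendors.
	vendor, err := detectGPUVendor([]string{"example.org", "amd.com", "nvidia.com"})
	fmt.Println(vendor, err)

	vendor, err = detectGPUVendor(nil)
	fmt.Println(vendor, err)
}
```

The outer loop runs over knownVendors rather than availableVendors so that the precedence order is fixed by the code and not by the order in which specs happen to be listed.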
mikebrow left a comment
LGTM
agree with suggestion to consider config in a follow-up
58be393 to f9c19dd
This adds GPU vendor auto-detection from CDI specs instead of hardcoding nvidia.com. This allows the --gpus flag to work with both NVIDIA and AMD GPUs by detecting the vendor from available CDI spec files.
Signed-off-by: Shiv Tyagi <[email protected]>
f9c19dd to 090def0
This adds GPU vendor auto-detection from CDI specs instead of hardcoding nvidia.com. This allows the --gpus flag to work with both NVIDIA and AMD GPUs by detecting the vendor from available CDI spec files.
Here is a similar PR for nerdctl: containerd/nerdctl#4728
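As a rough illustration of what generating device IDs from a detected vendor could look like, the sketch below builds CDI-style qualified names of the form vendor/class=name. The helper qualifiedGPUNames, the "gpu" class, and the integer ID type are assumptions made for this example and are not taken from the PR's code.

```go
package main

import "fmt"

// qualifiedGPUNames is a hypothetical helper: it turns a detected vendor and
// a list of GPU indices into CDI-style device names such as "amd.com/gpu=0".
// The "gpu" class is assumed here for illustration.
func qualifiedGPUNames(vendor string, ids ...int) []string {
	names := make([]string, 0, len(ids))
	for _, id := range ids {
		names = append(names, fmt.Sprintf("%s/gpu=%d", vendor, id))
	}
	return names
}

func main() {
	// With a hardcoded vendor, every request would resolve to nvidia.com;
	// with detection, the same indices map to whichever vendor was found.
	fmt.Println(qualifiedGPUNames("nvidia.com", 0, 1))
	fmt.Println(qualifiedGPUNames("amd.com", 0))
}
```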