[Bug]: [Kubernetes] Incorrect DSTACK_GPUS_NUM/DSTACK_GPUS_PER_NODE when GPU range specified #3468

@un-def

Description

Steps to reproduce

Given a cluster with N GPUs per node:

$ dstack offer --cpu 0.. --memory 0.. --disk 0.. --gpu 1.. --backend kubernetes

 #  BACKEND         RESOURCES                                 INSTANCE TYPE                       PRICE
 1  kubernetes (-)  cpu=127 mem=1574GB disk=82GB H100:80GB:8  computeinstance-e00ks8pzq59e6a4aqg  $0
 2  kubernetes (-)  cpu=127 mem=1574GB disk=82GB H100:80GB:8  computeinstance-e00qhgbdeza93kq1aj  $0

Start a run with a GPU range M..<any> where M < N:

type: dev-environment
name: dev-environment
ide: vscode

resources:
  gpu: 4..8

Check nvidia-smi and DSTACK_GPUS_NUM/DSTACK_GPUS_PER_NODE:

$ ssh dev-environment
# nvidia-smi -L
GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-f05caec9-a40e-9d3e-7d40-146a037fada0)
GPU 1: NVIDIA H100 80GB HBM3 (UUID: GPU-5c60ccfc-e289-7d91-6fba-56bf56c128b7)
GPU 2: NVIDIA H100 80GB HBM3 (UUID: GPU-1795d012-3487-9e01-9e9a-611913d6d945)
GPU 3: NVIDIA H100 80GB HBM3 (UUID: GPU-8ccd0ca6-b808-93b7-f0bb-bce6e05b5615)
# echo $DSTACK_GPUS_NUM
8
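
The mismatch above can be checked mechanically. The following is a minimal sketch, not part of dstack: the helper names `count_visible_gpus` and `gpus_num_mismatch` are hypothetical, and it assumes that each device in `nvidia-smi -L` output appears on its own line starting with `GPU <index>:`, as in the transcript above.

```python
import re


def count_visible_gpus(nvidia_smi_listing: str) -> int:
    """Count device lines ("GPU <index>: ...") in `nvidia-smi -L` output."""
    return sum(
        1
        for line in nvidia_smi_listing.splitlines()
        if re.match(r"GPU \d+:", line.strip())
    )


def gpus_num_mismatch(nvidia_smi_listing: str, dstack_gpus_num: str):
    """Return (visible, reported) if the counts differ, otherwise None.

    `dstack_gpus_num` is the raw string value of DSTACK_GPUS_NUM.
    """
    visible = count_visible_gpus(nvidia_smi_listing)
    reported = int(dstack_gpus_num)
    return None if visible == reported else (visible, reported)


# Listing copied from the reproduction above (4 GPUs visible in the container).
LISTING = """\
GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-f05caec9-a40e-9d3e-7d40-146a037fada0)
GPU 1: NVIDIA H100 80GB HBM3 (UUID: GPU-5c60ccfc-e289-7d91-6fba-56bf56c128b7)
GPU 2: NVIDIA H100 80GB HBM3 (UUID: GPU-1795d012-3487-9e01-9e9a-611913d6d945)
GPU 3: NVIDIA H100 80GB HBM3 (UUID: GPU-8ccd0ca6-b808-93b7-f0bb-bce6e05b5615)
"""

# DSTACK_GPUS_NUM was "8" in the reproduction, so this reports a mismatch.
print(gpus_num_mismatch(LISTING, "8"))
```

Inside a live run, the same check could take its inputs from `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` and `os.environ["DSTACK_GPUS_NUM"]` instead of the hard-coded values.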

Actual behaviour

No response

Expected behaviour

No response

dstack version

0.20.3

Server logs

Additional information

No response
