
0.19.31


Released by @un-def on 02 Oct 12:47 · commit 6201c2f

Kubernetes

This release brings many significant improvements to the kubernetes backend, which has now graduated from alpha to beta. It is much more stable and can be used reliably on GPU clusters for all kinds of workloads, including distributed tasks.

Here's what changed:

  • Resource allocation now fully respects the user’s resources specification. Previously, parts of it were ignored; in particular, GPU node labels were not selected according to the specified gpu spec.
  • Distributed tasks now fully work on Kubernetes clusters with fast interconnect enabled. Previously, running them on such clusters caused many issues.
  • Added support for the privileged property.
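To illustrate what matching GPU node labels against a gpu spec involves, here is a minimal Python sketch. It is not dstack's implementation. It assumes GPU nodes carry the nvidia.com/gpu.product and nvidia.com/gpu.count labels published by NVIDIA GPU Feature Discovery, and it only understands the plain NAME:COUNT form of the spec (such as B200:8). GpuRequirement, parse_gpu_spec, node_matches, and select_nodes are hypothetical names.

```python
from dataclasses import dataclass

# Label keys published by NVIDIA GPU Feature Discovery
# (assumption: they are present on every GPU node in the cluster).
PRODUCT_LABEL = "nvidia.com/gpu.product"
COUNT_LABEL = "nvidia.com/gpu.count"


@dataclass
class GpuRequirement:
    """Hypothetical, simplified stand-in for a `gpu: NAME:COUNT` spec."""
    name: str
    count: int


def parse_gpu_spec(spec: str) -> GpuRequirement:
    # Handles only "NAME" or "NAME:COUNT" (e.g. "B200:8"); real specs can be richer.
    name, _, count = spec.partition(":")
    return GpuRequirement(name=name.strip(), count=int(count) if count else 1)


def node_matches(labels: dict[str, str], req: GpuRequirement) -> bool:
    # Product labels look like "NVIDIA-H100-80GB-HBM3"; compare dash-separated tokens.
    tokens = {t.upper() for t in labels.get(PRODUCT_LABEL, "").split("-")}
    if req.name.upper() not in tokens:
        return False
    # The node must expose at least the requested number of GPUs.
    return int(labels.get(COUNT_LABEL, "0")) >= req.count


def select_nodes(nodes: dict[str, dict[str, str]], spec: str) -> list[str]:
    req = parse_gpu_spec(spec)
    return sorted(name for name, labels in nodes.items() if node_matches(labels, req))


nodes = {
    "node-a": {PRODUCT_LABEL: "NVIDIA-H100-80GB-HBM3", COUNT_LABEL: "8"},
    "node-b": {PRODUCT_LABEL: "NVIDIA-A100-SXM4-80GB", COUNT_LABEL: "8"},
    "node-c": {PRODUCT_LABEL: "NVIDIA-H100-80GB-HBM3", COUNT_LABEL: "4"},
}
print(select_nodes(nodes, "H100:8"))  # ['node-a']
```

Ignoring the GPU name and picking any node with enough GPUs would schedule an H100:8 request onto node-b; filtering on both the product and count labels is what keeps the placement faithful to the spec.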

We’ve also published a dedicated guide on how to get started with dstack on Kubernetes, highlighting important nuances.

Warning

Be aware of breaking changes if you used the kubernetes backend before. The following properties in the Kubernetes backend configuration have been renamed:

  • networking → proxy_jump
  • ssh_host → hostname
  • ssh_port → port

Additionally, the "proxy jump" pod and service names now include a dstack- prefix.
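Existing configurations can be updated mechanically. The Python sketch below is a hypothetical helper, not part of dstack. It works on a single kubernetes backend entry as a plain dict (for example, after loading ~/.dstack/server/config.yml with a YAML parser) and assumes that ssh_host and ssh_port were nested under networking, as in earlier versions of the docs. It does not deal with the dstack- prefix on the proxy jump pod and service names.

```python
import copy

# Old -> new property names, as listed in the release notes.
TOP_LEVEL_RENAMES = {"networking": "proxy_jump"}
NESTED_RENAMES = {"ssh_host": "hostname", "ssh_port": "port"}


def migrate_kubernetes_backend(backend: dict) -> dict:
    """Return a copy of a kubernetes backend entry with the old keys renamed.

    Hypothetical helper; assumes ssh_host/ssh_port live under `networking`.
    """
    migrated = copy.deepcopy(backend)  # never mutate the caller's config
    for old, new in TOP_LEVEL_RENAMES.items():
        if old in migrated:
            if new in migrated:
                raise ValueError(f"both {old!r} and {new!r} are set")
            migrated[new] = migrated.pop(old)
    section = migrated.get("proxy_jump")
    if isinstance(section, dict):
        for old, new in NESTED_RENAMES.items():
            if old in section:
                section[new] = section.pop(old)
    return migrated


old = {
    "type": "kubernetes",
    "kubeconfig": {"filename": "~/.kube/config"},
    "networking": {"ssh_host": "localhost", "ssh_port": 32000},
}
print(migrate_kubernetes_backend(old))
```

The helper is idempotent: running it on an already migrated entry returns an equal dict, so it is safe to apply to a mix of old and new configurations.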

GCP

A4 spot instances with B200 GPUs

The gcp backend now supports A4 spot instances equipped with B200 GPUs. This includes provisioning both standalone A4 instances and A4 clusters with high-performance RoCE networking.

To use A4 clusters with high-performance networking, you must configure multiple VPCs in your backend settings (~/.dstack/server/config.yml):

projects:
- name: main
  backends:
  - type: gcp
    project_id: my-project
    creds:
      type: default
    vpc_name: my-vpc-0   # regular, 1 subnet
    extra_vpcs:
    - my-vpc-1   # regular, 1 subnet
    roce_vpcs:
    - my-vpc-mrdma   # RoCE profile, 8 subnets

Then, provision a cluster using a fleet configuration:

type: fleet

nodes: 2
placement: cluster

availability_zones: [us-west2-c]
backends: [gcp]

spot_policy: spot

resources:
  gpu: B200:8

Each instance in the cluster will have 10 network interfaces: 1 regular interface in the main VPC, 1 regular interface in the extra VPC, and 8 RDMA interfaces in the RoCE VPC.
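The interface count follows from the backend settings. As a rough illustration (hypothetical code, not dstack's), assuming one regular interface per regular VPC and one RDMA interface per subnet of each RoCE VPC, the example configuration gives 1 + 1 + 8 = 10 interfaces per instance:

```python
# Hypothetical illustration of the interface layout; not dstack code.
# 8 matches the "RoCE profile, 8 subnets" comment in the example config.
RDMA_SUBNETS_PER_ROCE_VPC = 8


def interfaces_per_instance(
    backend: dict, roce_subnets: int = RDMA_SUBNETS_PER_ROCE_VPC
) -> dict[str, int]:
    # One regular interface for the main VPC plus one per extra VPC.
    regular = 1 + len(backend.get("extra_vpcs", []))
    # Assumption: one RDMA interface per subnet of each RoCE VPC.
    rdma = roce_subnets * len(backend.get("roce_vpcs", []))
    return {"regular": regular, "rdma": rdma, "total": regular + rdma}


gcp_backend = {
    "type": "gcp",
    "project_id": "my-project",
    "vpc_name": "my-vpc-0",
    "extra_vpcs": ["my-vpc-1"],
    "roce_vpcs": ["my-vpc-mrdma"],
}
print(interfaces_per_instance(gcp_backend))  # {'regular': 2, 'rdma': 8, 'total': 10}
```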

Note

Currently, the gcp backend only supports A4 spot instances. Support for other options, such as flex and calendar scheduling via Dynamic Workload Scheduler, is coming soon.

CLI

dstack project is now faster

The USER column in dstack project list is now shown only when the --verbose flag is used.
This significantly improves performance for users with many configured projects, reducing execution time from ~20 seconds to as little as 2 seconds in some cases.

What's changed

  • [Kubernetes] Request resources according to RequirementsSpec by @un-def in #3127
  • [GCP] Support A4 spot instances with the B200 GPU by @jvstme in #3100
  • [CLI] Move USER to dstack project list --verbose by @jvstme in #3134
  • [Kubernetes] Configure /dev/shm if requested by @un-def in #3135
  • [Backward incompatible] Rename properties in Kubernetes backend config by @un-def in #3137
  • Support GCP A4 clusters by @jvstme in #3142
  • Kubernetes: add multi-node support by @un-def in #3141
  • Fix duplicate server log messages by @jvstme in #3143
  • [Docs] Improve Kubernetes documentation by @peterschmidt85 in #3138

Full changelog: 0.19.30...0.19.31