Codestin Search App

@jvstme

Fleets

Nodes

Maximum number of nodes

The fleet nodes.max property is now respected that allows limiting maximum number of instances allowed in a fleet. For example, to allow at most 10 instances in the fleet, you can do:

type: fleet
name: cloud-fleet
nodes: 0..10

A fleet will be considered for a run only if the run can fit into the fleet without violating nodes.max. If you don't need to enforce an upper limit, you can omit it:

type: fleet
name: cloud-fleet
nodes: 0..

Backends

Nebius

Credentials file

It's also possible to configure the nebius backend using a credentials file generated by the nebius CLI:

nebius iam auth-public-key generate \
    --service-account-id <service account ID> \
    --output ~/.nebius/sa-credentials.json

projects:
- name: main
  backends:
  - type: nebius
    creds:
      type: service_account
      filename: ~/.nebius/sa-credentials.json

Hot Aisle

Hot Aisle backend now supports multi-GPU VMs such as 2xMI300X and 4xMI300X.

dstack apply -f .local/.dstack.yml --gpu amd:2
The working_dir is not set — using legacy default "/workflow". Future versions will default to the
image's working directory.

 #  BACKEND               RESOURCES                                 INSTANCE TYPE        PRICE
 1  hotaisle              cpu=26 mem=448GB disk=12288GB             2x MI300X 26x Xeon…  $3.98
    (us-michigan-1)       MI300X:192GB:2

What's changed

Fix CLI compatibility with server 0.19.11 by @jvstme in #3145
[Feature]: Nebius switch to using nebius iam auth-public-key generate by @peterschmidt85 in #3147
[Docs] Move Plugins to Reference | Python API by @peterschmidt85 in #3148
404 error on GIT url by @robinnarsinghranabhat in #3149
Fix idle duration: off and forbid negative durations by @r4victor in #3151
[Docs]: GCP A4 cluster example by @jvstme in #3152
Consider multinode replica inactive only if all jobs done by @r4victor in #3157
Kubernetes: add NVIDIA GPU toleration by @un-def in #3160
[Nebius] Support tags by @peterschmidt85 in #3158
[Hot Aisle] Support multi-GPU VMs by @peterschmidt85 in #3154
feat(docker): upgrade litestream to v0.5.0 by @DragonStuff in #3165
[Blog] Orchestrating GPU workloads on Kubernetes by @peterschmidt85 in #3161
Respect fleet nodes.max by @r4victor in #3164
Fix kubeconfig via data reference by @r4victor in #3170
[Docs] Fix kubernetes typos by @svanzoest in #3169

New contributors

@robinnarsinghranabhat made their first contribution in #3149
@DragonStuff made their first contribution in #3165
@svanzoest made their first contribution in #3169

Full changelog: 0.19.31...0.19.32

@un-def

Kubernetes

The kubernetes backend introduces many significant improvements and has now graduated from alpha to beta. It is much more stable and can be reliably used on GPU clusters for all kinds of workloads, including distributed tasks.

Here's what changed:

Resource allocation now fully respects the user’s resources specification. Previously, it ignored certain aspects, especially the proper selection of GPU labels according to the specified gpu spec.
Distributed tasks now fully work on Kubernetes clusters with fast interconnect enabled. Previously, this caused many issues.
Added support privileged.

We’ve also published a dedicated guide on how to get started with dstack on Kubernetes, highlighting important nuances.

Warning

Be aware of breaking changes if you used the kubernetes backend before. The following properties in the Kubernetes backend configuration have been renamed:

networking → proxy_jump
ssh_host → hostname
ssh_port → port

Additionally, the "proxy jump" pod and service names now include a dstack- prefix.

GCP

A4 spot instances with B200 GPUs

The gcp backend now supports A4 spot instances equipped with B200 GPUs. This includes provisioning both standalone A4 instances and A4 clusters with high-performance RoCE networking.

To use A4 clusters with high-performance networking, you must configure multiple VPCs in your backend settings (~/.dstack/server/config.yml):

projects:
- name: main
  backends:
  - type: gcp
    project_id: my-project
    creds:
      type: default
    vpc_name: my-vpc-0   # regular, 1 subnet
    extra_vpcs:
    - my-vpc-1   # regular, 1 subnet
    roce_vpcs:
    - my-vpc-mrdma   # RoCE profile, 8 subnets

Then, provision a cluster using a fleet configuration:

type: fleet

nodes: 2
placement: cluster

availability_zones: [us-west2-c]
backends: [gcp]

spot_policy: spot

resources:
  gpu: B200:8

Each instance in the cluster will have 10 network interfaces: 1 regular interface in the main VPC, 1 regular interface in the extra VPC, and 8 RDMA interfaces in the RoCE VPC.

Note

Currently, the gcp backend only supports A4 spot instances. Support for other options, such as flex and calendar scheduling via Dynamic Workload Scheduler, is coming soon.

CLI

`dstack project` is now faster

The USER column in dstack project list is now shown only when the --verbose flag is used.
This significantly improves performance for users with many configured projects, reducing execution time from ~20 seconds to as little as 2 seconds in some cases.

What's changed

[Kubernetes] Request resources according to RequirementsSpec by @un-def in #3127
[GCP] Support A4 spot instances with the B200 GPU by @jvstme in #3100
[CLI] Move USER to dstack project list --verbose by @jvstme in #3134
[Kubernetes] Configure /dev/shm if requested by @un-def in #3135
[Backward incompatible] Rename properties in Kubernetes backend config by @un-def in #3137
Support GCP A4 clusters by @jvstme in #3142
Kubernetes: add multi-node support by @un-def in #3141
Fix duplicate server log messages by @jvstme in #3143
[Docs] Improve Kubernetes documentation by @peterschmidt85 in #3138

Full changelog: 0.19.30...0.19.31

@jvstme

Major changes

[Feature] Update CUDA driver in dstack's default aws, gcp, azure, and oci OS images from 535 to 570 by @jvstme in #3099

Major bug-fixes

[Bug] dstack CLI logging is broken #3118 by @peterschmidt85 in #3119
[AWS]: dstack doesn't use the EFA-enabled Docker image for H100:1 on AWS (p5.4xlarge) by @r4victor in #3111
[Bug] dstack misconfigures Git credentials for private repos by @un-def in #3116

Other changes

Fix fleet provisioning error message when fleet retried by @r4victor in #3109
Use fleet-combined idle_duration on run apply by @r4victor in #3110
Skip runner integration tests on macOS in CI by @r4victor in #3112
[Bug]: dstack offer CLI grouped by GPU output as JSON fails #3120 by @peterschmidt85 in #3122

Full changelog: 0.19.29...0.19.30

@peterschmidt85

Fleets

Over the last few releases, we’ve been reworking how fleets work to radically simplify management and make it fully declarative.

Previously, you had to specify a fleet via fleets explicitly — otherwise, dstack always created a new one. Now, dstack automatically picks an existing fleet if it fits the requirements, creating a new one only when needed.

For more on the fleet roadmap, see this meta issue.

User Interface

Grouping offers by backend

The Offers page in the UI now lets you group available offers by backend, making it easier to compare options across cloud providers.

Breaking changes

The tensordock backend hasn’t worked for a long time (due to the API it relied on being deprecated) and has now been removed.

What's changed

[Blog] The state of cloud GPUs in 2025: costs, performance, playbooks by @peterschmidt85 in #3089
[Docs] Add .dstack/profiles.yml to Reference and Protips by @peterschmidt85 in #3093
[TensorDock] Remove the tensordock from supported backends #3092 by @peterschmidt85 in #3094
Unassign scheduled run from fleet on resubmission by @r4victor in #3096
[UI] Allow to group offers by backend by @olgenn in #3098
Implement requirements-independent offers cache by @r4victor in #3091
[Internal] Project config support by @peterschmidt85 in #3097
Fix long sqlite write transaction when provisioning instances by @r4victor in #3104
Consider backend offers when choosing optimal fleet by @r4victor in #3101
[UI] Project wizard by @olgenn in #3103
Use Cuda 12.0 image for DataCrunch A6000 by @r4victor in #3105
[UI] Project wizard #323 by @olgenn in #3107

Full changelog: 0.19.28...0.19.29

@peterschmidt85

CLI

Argument Handling

The CLI now properly handles unrecognized arguments and rejects them with clear error messages. The ${{ run.args }} interpolation for tasks and services is still supported but now requires the -- pseudo-argument separator:

dstack apply --reuse -- --some=arg --some-option

This change prevents accidental typos in command arguments from being silently ignored.

What's Changed

[Blog] Orchestrating GPUs on DigitalOcean and AMD Developer Cloud by @peterschmidt85 in #3075
Forbid deleting projects with active resources by @r4victor in #3079
Add a script to automatically generate expanded release notes using an LLM by @r4victor in #3080
[UI] Reworked dstack Sky sign-up page by @peterschmidt85 in #3081
[UI] Minor styling changes by @peterschmidt85 in #3082
Generate unique fleet name for autocreated fleets by @r4victor in #3085
Exclude current_resource.fleet by @r4victor in #3087
[CLI] Handle unrecognized arguments by @un-def in #3076
Add a new opt-in job network mode by @un-def in #3043
Generate CoreModel dynamically when using custom configs by @r4victor in #3083

Full Changelog: 0.19.27...0.19.28

@Bihan

Run configurations

Repo directory

It's now possible to specify the directory in the container where the repo is mounted:

type: dev-environment

ide: vscode

repos:
  - local_path: .
    path: my_repo

  # or using short syntax:
  # - .:my_repo

The path property can be an absolute path or a relative path (with respect to working_dir). It's available inside run as the $DSTACK_REPO_DIR environment variable. If path is not set, the /workflow path is used.

Working directory

Previously, the working_dir property had complicated semantics: it defaulted to the repo path (/workflow), but for tasks and services without commands, the image working directory was used. You could also specify custom working_dir relative to the repo directory. This is now reversed: you specify working_dir as absolute path, and the repo path can be specified relative to it.

Note

During transitioning period, the legacy behavior of using /workflow is preserved if working_dir is not set. In future releases, this will be simplified, and working_dir will always default to the image working directory.

Fleet configuration

Nodes, retry, and target

dstack now indefinitely maintains nodes.min specified for cloud fleets. If instances get terminated for any reason and there are fewer instances than nodes.min, dstack will provision new fleet instances in the background.

There is also a new nodes.target property that specifies the number of instances to provision on fleet apply. Since now nodes.min is always maintained, you may specify nodes.target different from nodes.min to provision more instances than needs to be maintained.

Example:

type: fleet
name: default-fleet
nodes:
  min: 1 # Maintain one instance
  target: 2 # Provision two instances initially
  max: 3

dstack will provision two instances. After deleting one instance, there will be one instances left. Deleting the last instance will trigger dstack to re-create the instance.

Offers

The UI now has a dedicated page showing GPU offers available across all configured backends.

Digital Ocean and AMD Developer Cloud

The release adds native integration with DigitalOcean and
AMD Developer Cloud.

A backend configuration example:

projects:
- name: main
  backends:
  - type: amddevcloud
    project_name: TestProject
    creds:
        type: api_key
        api_key: ...

For DigitalOcean, set type to digitalocean.

The digitalocean and amddevcloud backends support NVIDIA and AMD GPU VMs, respectively, and allow you to run
dev environments (interactive development), tasks
(training, fine-tuning, or other batch jobs), and services (inference).

Security

Important

This update fixes a vulnerability in the cloudrift, cudo, and datacrunch backends. Instances created with earlier dstack versions lack proper firewall rules, potentially exposing internal APIs and allowing unauthorized access.

Users of these backends are advised to update to the latest version and re-create any running instances.

What's changed

Minor Hot Aisle Cleanup by @Bihan in #2978
UI for offers #3004 by @olgenn in #3042
Add repos[].path property by @un-def in #3041
style(frontend): Add missing final newline by @un-def in #3044
Implement fleet state-spec consolidation to maintain nodes.min by @r4victor in #3047
Add digital ocean and amd dev backend by @Bihan in #3030
test: include amddevcloud and digitalocean in backend types by @Bihan in #3053
Fix missing digitaloceanbase configurator methods by @Bihan in #3055
Expose job working dir via environment variable by @un-def in #3049
[runner] Ensure working_dir exists by @un-def in #3052
Fix server compatibility with pre-0.19.27 runners by @un-def in #3054
Bind shim and exposed container ports to localhost by @jvstme in #3057
Fix client compatibility with pre-0.19.27 servers by @un-def in #3063
[Docs] Reflect the repo and working directory changes (#3041) by @peterschmidt85 in #3064
Show a CLI warning when using autocreated fleets by @r4victor in #3060
Improve UX with private repos by @un-def in #3065
Set up instance-level firewall on all backends by @jvstme in #3058
Exclude target when equal to min for responses by @r4victor in #3070
[Docs] Shorten the default working_dir warning by @peterschmidt85 in #3072
Do not issue empty update for deleted_fleets_placement_groups by @r4victor in #3071
Exclude target when equal to min for responses (attempt 2) by @r4victor in #3074

Full changelog: 0.19.26...0.19.27

@r4victor

Repos

Previously, dstack always required running the dstack init command before use. This also meant that dstack would always mount the current folder as a repo.

With this update, repo configuration is now explicit and declarative. If you want to use a repo in your run, you must specify it with the new repos property. The dstack init command is now only used to provide custom Git credentials when working with private repos.

For example, imagine you have a cloned Git repo with an examples subdirectory containing a .dstack.yml file:

type: dev-environment
name: vscode    

repos:
  # Mounts the parent directory of `examples` (must be a Git repo)
  #   to `/workflow` (the default working directory)
  - ..

ide: vscode

When you run this configuration, dstack fetches the repo on the instance, applies your local changes, and mounts it—so the container always matches your local repo.

Sometimes you may want to mount a Git repo without cloning it locally. In that case, simply provide a URL in repos:

type: dev-environment
name: vscode    

repos:
  # Clone the specified repo to `/workflow` (the default working directory)
  - https://github.com/dstackai/dstack

ide: vscode

If the repo is private, dstack will automatically try to use your default Git credentials (from ~/.ssh/config or ~/.config/gh/hosts.yml).

To configure custom Git credentials, use dstack init.

Note

If you previously initialized a repo via dstack init, it will still be mounted. Be sure to migrate to repos, as implicitly configured repos are deprecated and will stop working in future releases.

If you no longer want to use the implicitly configured repo, run dstack init --remove.

Note

Currently, you can configure only one repo per run configuration.

Fleets

Previously, when dstack added new instances to existing fleets, it ignored the fleet configuration and used only the run configuration for which the instance was created. This could result in fleets containing instances that didn’t match their configuration.

This has now been fixed: fleet configurations and run configurations are intersected so that provisioned instances respect both. For example, given a fleet configuration:

type: fleet
name: cloud-fleet
placement: any
nodes: 0..2
backends:
  - runpod

and a run configuration:

type: dev-environment
ide: vscode
spot_policy: spot
fleets:
  - cloud-fleet

dstack will provision a RunPod spot instance in cloud-fleet.

This change lets you define main provisioning parameters in fleet configurations, while adjusting them in run configurations as needed.

Note

Currently, the run plan does not take fleet configuration into account when showing offers, since the target fleet may not be known beforehand. We plan to improve this by showing offers for all candidate fleets.

Examples

Wan2.2

We've added a new example demonstrating how to use Wan2.2, the new open-source SOTA text-to-video model, to generate videos.

Internals

Pyright integration

We now use pyright for type checking dstack Python code in CI. If you contribute to dstack, we recommend you configure your IDE to use pyright/pylance with standard type checking mode.

What's changed

Fix typing issues and add pyright to CI by @r4victor in #3011
[Internal] Update Ask AI integration ID by @olgenn in #3009
Make Configurator generic by @r4victor in #3013
Type check cli.commands by @r4victor in #3014
[Docs] Improve the docs regarding dstack init and repos to reflect the recent changes. by @peterschmidt85 in #3015
Respect fleet spec when provisioning on run apply by @r4victor in #3022
Consider elastic busy fleets for provisioning by @r4victor in #3024
Fix duplicate instance_num by @r4victor in #3025
Add declarative repo configuration by @un-def in #3023
Allow gpu.name as string in json schema by @r4victor in #3027
[Bug]: nebius.aio.service_error.RequestError: Request error DEADLINE_EXCEEDED: Deadline Exceeded #2962 by @peterschmidt85 in #3028
Fix DataCrunchCompute exception when terminating already removed instance by @r4victor in #3032
[DataCrunch] Ensure dstack is using fixed pricing #3033 by @peterschmidt85 in #3034
Document repos by @peterschmidt85 in #3026
Add Wan2.2 example by @r4victor in #3029
Automatically remove dangling tasks from shim by @jvstme in #3036
dstack offer fixes by @peterschmidt85 in #3038
Remove dstack init from help by @r4victor in #3039

Full changelog: 0.19.25...0.19.26

@r4victor

CLI

`dstack offer --group-by`

The dstack offer command can now display aggregated information about available offers. For example, to see what GPUs are available in different clouds, use --group-by gpu.

> dstack offer --group-by gpu

 #   GPU              SPOT             $/GPU           BACKENDS                             
 1   T4:16GB:1..8     spot, on-demand  0.1037..1.3797  gcp, aws                             
 2   L4:24GB:1..8     spot, on-demand  0.1829..2.1183  gcp, aws                             
 3   P100:16GB:1..4   spot, on-demand  0.2115..2.4043  gcp, oci                             
 4   V100:16GB:1..8   spot, on-demand  0.3152..4.234   gcp, aws, oci, lambda                
 5   A10G:22GB:1..8   spot, on-demand  0.3623..2.5845  aws                                  
 6   L40S:44GB:1..8   spot, on-demand  0.6392..4.7095  aws                                  
 7   A100:40GB:1..16  spot, on-demand  0.6441..4.0496  gcp, aws, oci, lambda                
 8   A10:24GB:1..4    on-demand        0.75..2         oci, lambda                          
 9   H100:80GB:1..8   spot, on-demand  1.079..15.7236  gcp, aws, lambda                     
 10  A100:80GB:1..8   spot, on-demand  1.2942..5.7077  gcp, aws, lambda

Refer to the docs for information about the available aggregations.

Deprecations

Local repos are now deprecated. If you need to deliver a local directory or file to a run, use files instead. If the run doesn't require a repo, use dstack apply --no-repo. Remote repos remain the recommended way to deliver Git repos to runs.

What's changed

Document Deployment-compatible migrations by @r4victor in #2987
[Bug]: Server Docker image fails because of Unable to locate package … by @peterschmidt85 in #2983
Only register service replicas after probes pass by @jvstme in #2986
[Changelog] Introducing service probes by @peterschmidt85 in #2988
Deprecate local repos by @un-def in #2984
Support elastic fleets by @r4victor in #2967
fix typo config.yml.md by @jspablo in #2991
Check if kapa.ai can also be integrated into dstack Sky #296 by @olgenn in #2990
Typo in URLs by @mashcroft3 in #2995
[shim] Fix DCGMWrapperInterface nil check (bis) by @un-def in #3001
The logs section is too short in the UI by @olgenn in #2989
[Feature]: Allow dstack offer to aggregate GPU information by @peterschmidt85 in #2992
[Internal]: CI refactoring by @jvstme in #3006
Update examples by @un-def in #3007
Minor CLI fixes by @peterschmidt85 in #3008

New Contributors

@mashcroft3 made their first contribution in #2995

Full Changelog: 0.19.24...0.19.25

@r4victor

Migration guide

Warning

This update requires stopping all dstack server replicas before deploying, due to database schema changes.
Make sure no replicas from the previous version and the new version run at the same time.

What's changed

[Internal] Replace enums with strings in the DB, JobSubmission.termination_reason, and Run.termination_reason by @r4victor in #2949
[Internal] Fix macOS build for shim by @un-def in #2958
[Bug] Increase the secrets max character length by @james-boydell in #2971
[Internal] Introduce InstanceAvailability.NO_BALANCE (for external integrations) by @peterschmidt85 in #2975
[Bug]: Cannot manage secrets in UI as project admin by @olgenn in #2972
[Bug] Fix DCGMWrapperInterface nil check in shim by @un-def in #2980

Full changelog: 0.19.23...0.19.24

@jvstme

Major bug-fixes

This release resolves an issue introduced in 0.19.22 that caused instance provisioning to fail consistently for certain instance types.

Backends

Nebius

The nebius backend now supports spot instances and the NVIDIA B200 GPU.

> dstack offer -b nebius --spot                     

 #  BACKEND               RESOURCES                                          PRICE   
 1  nebius (eu-north1)    cpu=16 mem=200GB disk=100GB H100:80GB:1 (spot)     $1.25   
 2  nebius (eu-north1)    cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot)    $1.45   
 3  nebius (eu-west1)     cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot)    $1.45   
 4  nebius (us-central1)  cpu=16 mem=200GB disk=100GB H200:141GB:1 (spot)    $1.45   
 5  nebius (eu-north1)    cpu=128 mem=1600GB disk=100GB H100:80GB:8 (spot)   $10     
 6  nebius (eu-north1)    cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot)  $11.6   
 7  nebius (eu-west1)     cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot)  $11.6   
 8  nebius (us-central1)  cpu=128 mem=1600GB disk=100GB H200:141GB:8 (spot)  $11.6   

> dstack offer -b nebius --gpu 8:b200

 #  BACKEND               RESOURCES                                   PRICE   
 1  nebius (us-central1)  cpu=160 mem=1792GB disk=100GB B200:180GB:8  $44

What's changed

Fix dstack-shim release build by @jvstme in #2964
[Nebius] Support spot instances and B200 by @peterschmidt85 in #2965

Full Changelog: 0.19.22...0.19.23

Releases: dstackai/dstack

0.19.32

Fleets

Nodes

Maximum number of nodes

Backends

Nebius

Tags

Credentials file

Hot Aisle

What's changed

New contributors

Contributors

Uh oh!

0.19.31

Kubernetes

GCP

A4 spot instances with B200 GPUs

CLI

dstack project is now faster

What's changed

Contributors

Uh oh!

0.19.30

Major changes

Major bug-fixes

Other changes

Contributors

Uh oh!

0.19.29

Fleets

User Interface

Grouping offers by backend

Breaking changes

What's changed

Contributors

Uh oh!

0.19.28

CLI

Argument Handling

What's Changed

Contributors

Uh oh!

0.19.27

Run configurations

Repo directory

Working directory

Fleet configuration

Nodes, retry, and target

Offers

Digital Ocean and AMD Developer Cloud

Security

What's changed

Contributors

Uh oh!

0.19.26

Repos

Fleets

Examples

Wan2.2

Internals

Pyright integration

What's changed

Contributors

Uh oh!

0.19.25

CLI

dstack offer --group-by

Deprecations

What's changed

New Contributors

Contributors

Uh oh!

0.19.24

Migration guide

What's changed

Contributors

Uh oh!

0.19.23

Major bug-fixes

`dstack project` is now faster

`dstack offer --group-by`