Changelog

All notable changes to Atlas are documented here. The format is based on Keep a Changelog, and the project adheres to Semantic Versioning.

For per-release deep dives — kernel-level wins, the engineering history behind specific subsystems — see the Atlas Spark Journey.

Unreleased

0.1.0 — 2026-05-06

Initial public release. Atlas is a pure-Rust LLM inference engine targeting NVIDIA GB10 (DGX Spark, SM121) with twelve hand-tuned (Hardware × Model × Quantization) targets.

Added

Pure-Rust runtime — no Python, no PyTorch — for hybrid Attention + SSM/GDN/Mamba-2 architectures with NVFP4 / FP8 / BF16 quantization.
35 hyperoptimized CUDA kernels per target, compiled to PTX and embedded in the binary at build time. The multi-model image selects the matching kernel set at startup based on config.json.
OpenAI- and Anthropic-compatible HTTP API (/v1/chat/completions, /v1/responses, /v1/messages, /v1/models, /v1/conversations, /tokenize, /detokenize, /health, /metrics).
Tool calling with grammar-constrained decoding (Hermes, Qwen3-Coder, Mistral, MiniMax-XML formats).
MTP speculative decoding (K=2 pipelined verify), self-speculative layer-skipping, and N-gram speculative decoding.
Prefix caching: radix-tree (RadixAttention) plus SSM snapshot cache (Marconi-style), for a 10× reduction in time to first token (TTFT) on warm-cache hits.
KV cache dtypes: BF16, FP8, NVFP4, turbo3, turbo4. Optional per-layer high-precision overlay (--kv-high-precision-layers).
Multi-GPU expert parallelism (EP=2 over RoCEv2) for models that exceed a single GB10's weight budget (122B-class, MiniMax M2.7).
Vision encoder (Qwen3-VL, Qwen3.6 ViT).
High-speed NVMe KV swap (sliding-window, io_uring) for long-context decoding past the GPU memory cap.
Bearer-token authentication (--require-auth + --auth-tokens-file), constant-time validated. Default bind is 127.0.0.1; --bind 0.0.0.0 warns when used.
Twelve supported (GB10, model, quant) targets across Qwen3.5 / Qwen3.6 / Qwen3-Next / Qwen3-VL / Gemma-4 / Mistral-Small-4 / MiniMax-M2.7 / Nemotron-H families.
mdBook documentation at book/src/, rustdoc at target/doc/, Docker image avarok/atlas-gb10:latest.
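
The bearer-token entry above mentions constant-time validation. This changelog does not show how Atlas implements it, so the following is only a minimal sketch of the general technique, using the Rust standard library alone. The function names `constant_time_eq` and `token_is_valid` and the sample tokens are hypothetical, chosen for illustration.

```rust
/// Compare two byte strings without exiting at the first mismatching byte.
/// Hypothetical helper for illustration; not taken from the Atlas source.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // Token length is not treated as secret in this sketch,
    // so a length mismatch is rejected immediately.
    if a.len() != b.len() {
        return false;
    }
    // Accumulate the XOR of every byte pair; the loop always runs to the end.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    // black_box discourages the optimizer from reintroducing an early exit.
    std::hint::black_box(diff) == 0
}

/// Check a presented token against every allowed token,
/// without stopping at the first match.
fn token_is_valid(presented: &str, allowed: &[String]) -> bool {
    let mut ok = false;
    for token in allowed {
        ok |= constant_time_eq(presented.as_bytes(), token.as_bytes());
    }
    ok
}

fn main() {
    // Sample tokens are made up; a real deployment would load them from
    // the file passed to --auth-tokens-file.
    let allowed = vec!["s3cret-token-a".to_string(), "s3cret-token-b".to_string()];
    assert!(token_is_valid("s3cret-token-b", &allowed));
    assert!(!token_is_valid("s3cret-token-c", &allowed));
    assert!(!token_is_valid("short", &allowed));
}
```

The point of the pattern is that comparison time depends on the token length rather than on how many leading bytes match, which removes the timing signal an attacker could use to guess a token byte by byte. Production code would more likely rely on a vetted crate such as `subtle` than on a hand-rolled loop.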

Engineering notes

For the kernel-level perf history — long-context regression sweeps, the parking_lot migration, the libcuda + libnccl CI stubs, the multi-stage scheduler refactor — see docs/ATLAS_SPARK_JOURNEY.md and the book/ chapters under deep-dives/.
