Distributed object storage for GPU workloads. Built in Rust on io_uring, OpenLake is a state-of-the-art storage engine delivering 8x cost savings and over a million IOPS within 1 ms.
Discord · Website · Comparison · Architecture · Quickstart
OpenLake is an object store for AI infrastructure. Training and inference clusters spend a large fraction of their wall-clock time moving bytes from storage into GPU memory, yet most object stores put the host CPU, the page cache, and several userspace copies directly in that path. OpenLake is a high-throughput, low-latency storage engine that takes the opposite stance.
- io_uring, thread per core. Built on the compio completion-based runtime. One runtime per core, pinned, no work stealing. The HTTP frontend and the storage engine run on the same thread, so a request never crosses a core boundary on the hot path.
- No kernel involvement. With GPUDirect Storage and RDMA, data moves from the peer NIC into GPU VRAM zero-copy, bypassing host memory and the page cache. See Architecture.
- Erasure coded. SIMD Reed-Solomon across striped EC. Lower storage cost than replication, and high throughput without the CPU cost of conventional EC.
- PacedRDMA. A novel congestion control algorithm for high-throughput RDMA. Credit-based memory management absorbs request bursts, minimizing tail latencies. Supports S3 over RDMA.
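The storage-cost point in the erasure coding bullet can be made concrete with a little arithmetic. The sketch below compares raw bytes stored per byte of user data for replication and for Reed-Solomon layouts. The `Layout` type and the shard counts (3 copies, 4+2, 8+2) are illustrative assumptions chosen for the example, not OpenLake's actual types or configuration.

```rust
/// Storage layouts being compared. The names and shard counts used here are
/// hypothetical and exist only for this example.
#[derive(Debug, Clone, Copy)]
enum Layout {
    /// Keep `copies` full copies of every object.
    Replication { copies: u32 },
    /// Reed-Solomon: split an object into `data` shards and add `parity` shards.
    ErasureCoded { data: u32, parity: u32 },
}

impl Layout {
    /// Raw bytes stored per byte of user data.
    fn overhead(self) -> f64 {
        match self {
            Layout::Replication { copies } => copies as f64,
            Layout::ErasureCoded { data, parity } => {
                (data + parity) as f64 / data as f64
            }
        }
    }

    /// Number of simultaneous shard or copy losses the layout survives.
    fn tolerated_failures(self) -> u32 {
        match self {
            Layout::Replication { copies } => copies.saturating_sub(1),
            Layout::ErasureCoded { parity, .. } => parity,
        }
    }
}

fn main() {
    let layouts = [
        Layout::Replication { copies: 3 },
        Layout::ErasureCoded { data: 4, parity: 2 },
        Layout::ErasureCoded { data: 8, parity: 2 },
    ];
    for layout in layouts {
        println!(
            "{:?}: {:.2}x raw storage, survives {} failures",
            layout,
            layout.overhead(),
            layout.tolerated_failures()
        );
    }
}
```

Under these assumed parameters, three-way replication stores 3 bytes per user byte, while a 4+2 layout stores 1.5 bytes and survives the same two failures. The trade-off is the encode and decode work, which is the CPU cost the SIMD implementation is meant to reduce.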
OpenLake sustains 225 MiB/s GET at sub-10 ms p50, which is 3x MinIO and 9x RustFS at c=512.
Stable Rust 1.91 or newer (pinned via rust-toolchain.toml). Linux gives you the io_uring driver; macOS builds and runs against kqueue for development.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
Clone the repo and build the workspace in release mode.
git clone <repo-url> openlake && cd openlake
cargo build --release --workspace
The openlake CLI drives a LocalFsBackend directly for diagnostics and microbenchmarks. It is not an S3 client, but it is the quickest way to confirm that the build works and to see local throughput.
./target/release/openlake bench --n 100000 --size 4096 --concurrency 64
Write one TOML file per node. The full schema lives at the top of crates/openlake_server/src/config.rs.
Start openlaked on each host with its own config, then talk to the cluster with any S3 client.
./target/release/openlaked --config node0.toml
aws --endpoint-url http://10.0.0.10:9000 s3 mb s3://demo
aws --endpoint-url http://10.0.0.10:9000 s3 cp ./checkpoint.safetensors s3://demo/
aws --endpoint-url http://10.0.0.10:9000 s3 ls s3://demo/
We welcome and value any contributions and collaborations. Please check out Contributing to OpenLake for how to get involved.
