
walrus
Walrus: A high-performance storage engine in Rust


Features

  • High Performance: Optimized for concurrent writes and reads
  • Topic-based Organization: Separate read/write streams per topic
  • Configurable Consistency: Choose between strict and relaxed consistency models
  • Batched I/O: Atomic batch append and capped batch read APIs with io_uring acceleration on Linux
  • Dual Storage Backends: FD backend with pread/pwrite (default) or mmap backend
  • Persistent Read Offsets: Read positions survive process restarts
  • Coordination-free Deletion: Atomic file cleanup without blocking operations
  • Comprehensive Benchmarking: Built-in performance testing suite

Benchmarks

Run the supplied load tests straight from the repo:

make bench-writes    # sustained write throughput
make bench-reads     # write phase + read phase
make bench-scaling   # threads vs throughput sweep

Each target honours the environment variables documented in the Makefile. Tweak variables such as FSYNC, THREADS, or WALRUS_DURATION to explore other scenarios.

Quick Start

Add Walrus to your Cargo.toml:

[dependencies]
walrus-rust = "0.2.0"

Basic Usage

use walrus_rust::Walrus;

// Create a new WAL instance with default settings
let wal = Walrus::new()?;

// Write data to a topic
let data = b"Hello, Walrus!";
wal.append_for_topic("my-topic", data)?;

// Read data from the topic
if let Some(entry) = wal.read_next("my-topic", true)? {
    println!("Read: {:?}", String::from_utf8_lossy(&entry.data));
}

To peek without consuming an entry, call read_next("my-topic", false); the cursor only advances when you pass true.

Advanced Configuration

use walrus_rust::{Walrus, ReadConsistency, FsyncSchedule};

// Configure with custom consistency and fsync behavior
let wal = Walrus::with_consistency_and_schedule(
    ReadConsistency::AtLeastOnce { persist_every: 1000 },
    FsyncSchedule::Milliseconds(500)
)?;

// Write and read operations work the same way
wal.append_for_topic("events", b"event data")?;

Configuration Basics

  • Read consistency: StrictlyAtOnce persists every checkpoint; AtLeastOnce { persist_every } favours throughput and tolerates replays.
  • Fsync schedule: choose SyncEach, Milliseconds(n), or NoFsync when constructing Walrus to balance durability vs latency.
  • Storage backend: FD backend (default) uses pread/pwrite syscalls and enables io_uring for batch operations on Linux; disable_fd_backend() switches to the mmap backend.
  • Namespacing & data dir: set WALRUS_INSTANCE_KEY or use the _for_key constructors to isolate workloads; WALRUS_DATA_DIR relocates the entire tree.
  • Noise control: WALRUS_QUIET=1 mutes debug logging from internal helpers.
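Putting the environment knobs together, a launch script might look like this (the key and directory are illustrative, not defaults):

```shell
# Isolate this workload's WAL under its own namespace and data directory.
export WALRUS_INSTANCE_KEY=orders-service   # hypothetical service name
export WALRUS_DATA_DIR=/tmp/walrus-demo
export WALRUS_QUIET=1                        # mute internal debug logging

# Any Walrus instance created with the default constructors from this
# environment now writes under <data dir>/wal_files/<instance key>/.
echo "$WALRUS_DATA_DIR/wal_files/$WALRUS_INSTANCE_KEY"
```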

Benchmark targets (make bench-writes, etc.) honour flags like FSYNC, THREADS, WALRUS_DURATION, and WALRUS_BATCH_SIZE; check the Makefile for the full list.

API Reference

Constructors

  • Walrus::new() -> io::Result<Self> – StrictlyAtOnce reads, 200ms fsync cadence.
  • Walrus::with_consistency(mode: ReadConsistency) -> io::Result<Self> – Pick the read checkpoint model.
  • Walrus::with_consistency_and_schedule(mode: ReadConsistency, schedule: FsyncSchedule) -> io::Result<Self> – Set both read consistency and fsync policy explicitly.
  • Walrus::new_for_key(key: &str) -> io::Result<Self> – Namespace files under wal_files/<sanitized-key>/.
  • Walrus::with_consistency_for_key(...) / with_consistency_and_schedule_for_key(...) – Combine per-key isolation with custom consistency/fsync choices.

Set WALRUS_INSTANCE_KEY=<key> to make the default constructors pick the same namespace without changing call-sites.

Topic Writes

  • append_for_topic(&self, topic: &str, data: &[u8]) -> io::Result<()> – Appends a single payload. Topics are created lazily. Returns ErrorKind::WouldBlock if a batch is currently running for the topic.
  • batch_append_for_topic(&self, topic: &str, batch: &[&[u8]]) -> io::Result<()> – Writes up to 2 000 entries (~10 GB including metadata) atomically. On Linux with the FD backend enabled, the batch is submitted via io_uring; other platforms fall back to sequential writes. Failures roll back offsets and release provisional blocks.

Topic Reads

  • read_next(&self, topic: &str, checkpoint: bool) -> io::Result<Option<Entry>> – Returns the next entry, advancing the persisted cursor when checkpoint is true. Passing false lets you peek without consuming the entry.
  • batch_read_for_topic(&self, topic: &str, max_bytes: usize, checkpoint: bool) -> io::Result<Vec<Entry>> – Streams entries in commit order until either max_bytes of payload or the 2 000-entry ceiling is reached (always yields at least one entry when data is available). Respects the same checkpoint semantics as read_next.

Types

pub struct Entry {
    pub data: Vec<u8>,
}

Benchmark Results

I benchmarked the latest version of walrus against a single Kafka broker (without replication or networking overhead) and RocksDB's WAL.

These numbers use the legacy append_for_topic() endpoint, which issues a pwrite() syscall per write; no io_uring batching is used in these benchmarks.

System Avg Throughput (writes/s) Avg Bandwidth (MB/s) Max Throughput (writes/s) Max Bandwidth (MB/s)
Walrus 1,205,762 876.22 1,593,984 1,158.62
Kafka 1,112,120 808.33 1,424,073 1,035.74
RocksDB 432,821 314.53 1,000,000 726.53

For synced writes (fsync on every write):


System Avg Throughput (writes/s) Avg Bandwidth (MB/s) Max Throughput (writes/s) Max Bandwidth (MB/s)
RocksDB 5,222 3.79 10,486 7.63
Walrus 4,980 3.60 11,389 8.19
Kafka 4,921 3.57 11,224 8.34

Further Reading

Older deep dives live under docs/ (architecture, batch design notes, etc.) if you need more than the basics above.

Contributing

We welcome patches; check CONTRIBUTING.md for the workflow.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Changelog

Version 0.2.0

  • New: Atomic batch write operations (batch_append_for_topic)
  • New: Batch read operations (batch_read_for_topic)
  • New: io_uring support for batch operations on Linux
  • New: Dual storage backends (FD backend with pread/pwrite, mmap backend)
  • New: Namespace isolation via _for_key constructors
  • New: FsyncSchedule::SyncEach and FsyncSchedule::NoFsync modes
  • New: Environment variables (WALRUS_INSTANCE_KEY, WALRUS_DATA_DIR, WALRUS_QUIET)
  • Improved: Comprehensive documentation with architecture and design docs
  • Improved: Enhanced benchmarking suite with batch operation benchmarks
  • Fixed: Tail read offset tracking in concurrent scenarios
  • Fixed: Reader persistence during concurrent batch writes

Version 0.1.0

  • Initial release
  • Core WAL functionality
  • Topic-based organization
  • Configurable consistency modes
  • Comprehensive benchmark suite
  • Memory-mapped I/O implementation
  • Persistent read offset tracking