A high-performance lock-free Multi-Producer Single-Consumer (MPSC) channel implementation in Zig, achieving 50+ billion messages per second on commodity hardware.
Throughput benchmark on AMD Ryzen 7 5700 (8 cores, 16 threads, 32MB L3 cache):
| Configuration | Throughput (msg/s) | Scaling |
|---|---|---|
| 1P1C | 8.6 B/s | 1.0x |
| 2P2C | 13.5 B/s | 1.6x |
| 4P4C | 30.8 B/s | 3.6x |
| 6P6C | 44.2 B/s | 5.1x |
| 8P8C | 54.3 B/s | 6.3x |
Measured with 500M messages, batch size 32K, ring size 64K slots.
RingMPSC uses a ring-decomposed architecture where each producer has a dedicated SPSC (Single-Producer Single-Consumer) ring buffer. This eliminates producer-producer contention entirely.
Producer 0 ──► [Ring 0] ──┐
Producer 1 ──► [Ring 1] ──┼──► Consumer (polls all rings)
Producer 2 ──► [Ring 2] ──┤
Producer N ──► [Ring N] ──┘
-
128-byte Cache Line Alignment: Head and tail pointers are separated by 128 bytes to prevent prefetcher-induced false sharing (Intel/AMD prefetchers may pull adjacent lines).
-
Cached Sequence Numbers: Producers cache the consumer's head position to minimize cross-core cache traffic. Cache refresh only occurs when the ring appears full.
-
Batch Operations: The
consumeBatchAPI processes all available items with a single atomic head update, amortizing synchronization overhead. -
Adaptive Backoff: Crossbeam-style exponential backoff (spin → yield → park) reduces contention without wasting CPU cycles.
-
Zero-Copy API: The
reserve/commitpattern allows producers to write directly into the ring buffer without intermediate copies.
┌──────────────────────────────────────────────────────────────────┐
│ Producer Hot (128B aligned) │
│ tail: AtomicU64 ← Producer writes, Consumer reads │
│ cached_head: u64 ← Producer-local cache │
├──────────────────────────────────────────────────────────────────┤
│ Consumer Hot (128B aligned) │
│ head: AtomicU64 ← Consumer writes, Producer reads │
│ cached_tail: u64 ← Consumer-local cache │
├──────────────────────────────────────────────────────────────────┤
│ Cold State (128B aligned) │
│ active, closed, metrics │
├──────────────────────────────────────────────────────────────────┤
│ Data Buffer (64B aligned) │
│ [CAPACITY]T │
└──────────────────────────────────────────────────────────────────┘
const ringmpsc = @import("src/channel.zig");
// Create channel with default config (64K slots, 16 max producers)
var channel = ringmpsc.Channel(u64, ringmpsc.default_config){};
// Register producer
const producer = try channel.register();
// Send (zero-copy)
if (producer.reserve(1)) |r| {
r.slice[0] = 42;
producer.commit(1);
}
// Receive (batch)
var sum: u64 = 0;
const Handler = struct {
sum: *u64,
pub fn process(self: @This(), item: *const u64) void {
self.sum.* += item.*;
}
};
_ = channel.consumeAll(Handler{ .sum = &sum });
// Receive (limited batch for real-world processing)
_ = channel.consumeAllUpTo(1000, Handler{ .sum = &sum });# Run all tests and benchmark
./verify.sh
# Build only
zig build -Doptimize=ReleaseFast
# Run benchmark with CPU isolation (recommended)
sudo taskset -c 0-15 nice -n -20 zig-out/bin/bench
# Note: Keep SMT (hyperthreading) ENABLED. The 8P8C config needs 16 threads.
# Producer-consumer pairs benefit from sharing L1/L2 cache on SMT siblings.
# Run unit tests
zig build testringmpsc/
├── README.md # This file
├── build.zig # Build configuration
├── verify.sh # One-click verification script
└── src/
├── channel.zig # Core implementation
├── bench_final.zig # Throughput benchmark
├── test_fifo.zig # FIFO ordering verification
├── test_chaos.zig # Race condition detection
└── test_determinism.zig # Deterministic execution test
RingMPSC guarantees the following properties (verified by included tests):
- Per-Producer FIFO: Messages from a single producer are received in order.
- No Data Loss: All sent messages are eventually received.
- Thread Safety: No data races under concurrent access.
- Determinism: Identical inputs produce identical outputs.
pub const Config = struct {
ring_bits: u6 = 16, // 2^16 = 64K slots per ring
max_producers: usize = 16,
enable_metrics: bool = false,
};
// Preset configurations
pub const low_latency_config = Config{ .ring_bits = 12 }; // 4K (L1-resident)
pub const high_throughput_config = Config{ .ring_bits = 18 }; // 256K slots- LMAX Disruptor - The original batch-consumption pattern
- rigtorp/SPSCQueue - High-performance C++ SPSC queue
- crossbeam-channel - Rust concurrent channels
- moodycamel::ConcurrentQueue - C++ lock-free queue
MIT