A limit order book implementation achieving 50M+ operations per second with deterministic O(1) complexity for all core operations. Built in Zig for zero-cost abstractions and explicit memory control.
| Operation | Throughput | Latency | Complexity |
|---|---|---|---|
| Insert | 54.91M ops/sec | 18ns | O(1) |
| Cancel | 53.11M ops/sec | 18ns | O(1) |
| Best Price | 211.82M ops/sec | 4ns | O(1) |
| Match | 119.05M ops/sec | 8ns | O(1) per fill |
Measured on Apple M3 Max using isolated microbenchmarks (zig build bench-ladder-micro).
graph TB
subgraph "Domain Layer"
LB[LadderBook<br/>Array-based O(1)]
OB[OrderBook<br/>RB-tree baseline]
end
subgraph "Application Layer"
ME[MatchingEngine]
CMD[Command Processor]
EVT[Event Emitter]
end
subgraph "Adapter Layer"
HTTP[HTTP Server]
WS[WebSocket Server]
BIN[Binary Protocol]
JRN[Journal Persistence]
SNAP[Snapshot Store]
end
CMD --> ME
ME --> LB
ME --> EVT
EVT --> JRN
HTTP --> CMD
WS --> CMD
BIN --> CMD
The ladder algorithm replaces the traditional RB-tree with fixed-size arrays and hierarchical bitsets for constant-time operations.
graph LR
subgraph "Memory Layout (per side)"
PL[Price Levels<br/>1M slots × 32B<br/>32MB total]
OP[Order Pool<br/>100K orders<br/>Pre-allocated]
HM[HashMap<br/>OrderID → Index<br/>No rehashing]
BS[Bitset<br/>2-level hierarchy<br/>O(1) find]
end
PL --> OP
HM --> OP
BS --> PL
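To make the layout concrete, here is a minimal Zig sketch of the per-side storage using the sizes from the diagram. The container name `BookSideStorage`, the stub `PriceLevel`/`Order` definitions, and the choice of `std.DynamicBitSet` are illustrative assumptions rather than the repository's exact types.

```zig
const std = @import("std");

// Illustrative stubs; see the class diagram below for the full field lists.
const PriceLevel = struct { head_idx: u32 = 0, tail_idx: u32 = 0, aggregate_qty: u64 = 0 };
const Order = struct { id: u64 = 0, quantity: u64 = 0, next_idx: u32 = 0, prev_idx: u32 = 0 };

const BookSideStorage = struct {
    levels: []PriceLevel,              // 1M slots × 32 B ≈ 32 MB
    pool: []Order,                     // 100K pre-allocated orders
    id_map: std.AutoHashMap(u64, u32), // order id → pool index
    occupancy: std.DynamicBitSet,      // one bit per price level

    fn init(allocator: std.mem.Allocator, n_ticks: usize, max_orders: u32) !BookSideStorage {
        var id_map = std.AutoHashMap(u64, u32).init(allocator);
        // Reserve capacity up front so the hot path never triggers a rehash.
        try id_map.ensureTotalCapacity(max_orders);
        return .{
            .levels = try allocator.alloc(PriceLevel, n_ticks),
            .pool = try allocator.alloc(Order, max_orders),
            .id_map = id_map,
            .occupancy = try std.DynamicBitSet.initEmpty(allocator, n_ticks),
        };
    }
};
```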
Price to Index Mapping:
index = (price - base_tick) / tick_size
Best Price Discovery:
- For bids: best_bid = base_tick + (highest_set_bit × tick_size)
- For asks: best_ask = base_tick + (lowest_set_bit × tick_size)
Where set bits are found using CPU intrinsics:
- @clz (count leading zeros) for the highest set bit
- @ctz (count trailing zeros) for the lowest set bit
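For illustration, a single 64-bit occupancy word is enough to show both lookups; the real book layers a summary word on top (the 2-level hierarchy above) and applies the same intrinsics twice. Function names here (`priceToTick`, `bestAskTick`, `bestBidTick`) are assumptions for the sketch, not the repository's API.

```zig
const std = @import("std");

// Price → index mapping from the formula above.
fn priceToTick(price: u64, base_tick: u64, tick_size: u64) u64 {
    return (price - base_tick) / tick_size;
}

// Best-price lookup over one occupancy word; the two-level bitset applies the
// same intrinsics first to a summary word, then to the selected leaf word.
fn bestAskTick(occupancy: u64) ?u6 {
    if (occupancy == 0) return null; // no liquidity on this side
    return @as(u6, @intCast(@ctz(occupancy))); // lowest set bit → lowest (best) ask
}

fn bestBidTick(occupancy: u64) ?u6 {
    if (occupancy == 0) return null;
    return @as(u6, @intCast(63 - @clz(occupancy))); // highest set bit → highest (best) bid
}

test "best price via intrinsics" {
    const occ: u64 = (1 << 10) | (1 << 42); // ticks 10 and 42 occupied
    try std.testing.expectEqual(@as(?u6, 10), bestAskTick(occ));
    try std.testing.expectEqual(@as(?u6, 42), bestBidTick(occ));
}
```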
Complexity Analysis:
| Operation | Traditional (RB-tree) | Ladder Implementation |
|---|---|---|
| Insert | O(log M) where M = unique prices | O(1) |
| Cancel | O(log M) + O(1) | O(1) |
| Best Price | O(1) with cached min/max | O(1) via bitset |
| Match | O(log M) + O(K) where K = orders | O(1) + O(K) |
classDiagram
class PriceLevel {
+u32 head_idx
+u32 tail_idx
+u64 aggregate_qty
+[12]u8 padding
----
32 bytes (cache-aligned)
}
class Order {
+u64 id
+u64 quantity
+u64 filled
+u32 next_idx
+u32 prev_idx
+OrderType type
+TimeInForce tif
}
class BookSide {
+[1M]PriceLevel levels
+[100K]Order pool
+HashMap id_map
+Bitset occupancy
+u32 free_head
}
BookSide --> PriceLevel : contains array
BookSide --> Order : manages pool
PriceLevel --> Order : indexes into
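Sketched in Zig, the 32-byte level record could look like the following; an `extern struct` pins down C-style layout, and the explicit padding from the diagram plus ABI rounding brings the size to exactly 32 bytes. Field names follow the diagram; the rest is an assumption, not the repository's definition.

```zig
const std = @import("std");

// Cache-friendly price level: 32 bytes, so two levels fit in a 64 B cache line.
const PriceLevel = extern struct {
    head_idx: u32,      // pool index of the first resting order (FIFO head)
    tail_idx: u32,      // pool index of the last resting order (FIFO tail)
    aggregate_qty: u64, // sum of remaining quantity at this price
    _padding: [12]u8,   // explicit padding, as in the diagram
};

comptime {
    // 4 + 4 + 8 + 12 = 28, rounded up to the struct's 8-byte alignment → 32.
    std.debug.assert(@sizeOf(PriceLevel) == 32);
    std.debug.assert(@alignOf(PriceLevel) == 8);
}
```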
flowchart TD
A[New Order] --> B[Calculate price index]
B --> C[Allocate from pool]
C --> D[Add to HashMap]
D --> E{Level empty?}
E -->|Yes| F[Set bitset bit]
E -->|No| G[Link to tail]
F --> H[Update level head/tail]
G --> H
H --> I[Update aggregate]
I --> J[Return success]
Pseudocode:
function insert(order):
    idx = pool.allocate()               // O(1) - pop from free list
    pool[idx] = order                   // O(1) - array write
    tick = price_to_tick(order.price)   // O(1) - arithmetic
    level = &levels[tick]               // O(1) - array access
    if level.is_empty():
        occupancy.set(tick)             // O(1) - bit operation
        level.head = idx
    else:
        pool[level.tail].next = idx     // O(1) - link update
        pool[idx].prev = level.tail
    level.tail = idx
    level.aggregate_qty += order.qty    // O(1) - arithmetic
    id_map.put(order.id, idx)           // O(1) - amortized
flowchart TD
A[Market Order] --> B[Find best price via bitset]
B --> C[Get level at price]
C --> D{Has liquidity?}
D -->|Yes| E[Match FIFO]
E --> F{Order filled?}
F -->|No| G[Find next price]
F -->|Yes| H[Complete]
G --> C
D -->|No| I[No liquidity]
Matching Loop:
function match_market(qty_requested):
    while qty_requested > 0:
        best_tick = occupancy.find_first()   // O(1) via @ctz
        if best_tick == NONE:
            break                            // No liquidity
        level = &levels[best_tick]
        idx = level.head
        while idx != INVALID and qty_requested > 0:
            order = &pool[idx]
            match_qty = min(order.qty, qty_requested)
            emit_trade(order.id, match_qty)
            order.qty -= match_qty
            qty_requested -= match_qty
            if order.qty == 0:
                next = order.next
                remove_order(idx)            // O(1) unlink
                idx = next
- Zig 0.14.0 or later
- No external dependencies
# Build optimized binary
zig build -Doptimize=ReleaseFast
# Run all tests (unit + parity + integration)
zig build test
# Run benchmarks
zig build bench-ladder-micro # Isolated operation benchmarks
zig build bench-compare # Ladder vs RB-tree comparison
zig build bench-ladder # Full workflow benchmark
- Unit Tests - Verify individual operations
- Parity Tests - Ensure Ladder and RB-tree produce identical results
- Invariant Tests - Validate state consistency:
- Aggregate quantity = Σ(individual orders)
- Bitset occupancy ⟷ level state
- FIFO ordering maintained
- Performance Tests - Measure throughput and latency
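As a sketch of what an invariant check can look like as an ordinary Zig test (the `levelAt` accessor is an assumed name for illustration; the real suite may expose level state differently):

```zig
const std = @import("std");
const MatchingEngine = @import("matching_engine");

test "aggregate quantity equals sum of resting orders" {
    var engine = try MatchingEngine.init(std.testing.allocator, .{
        .n_ticks = 1024,
        .max_orders = 64,
        .tick_size = 1,
    });
    defer engine.deinit();

    // Two resting buys at the same price level.
    _ = try engine.insertLimit(.{ .id = 1, .side = .buy, .price = 100, .quantity = 30, .type = .limit, .time_in_force = .good_till_cancel });
    _ = try engine.insertLimit(.{ .id = 2, .side = .buy, .price = 100, .quantity = 70, .type = .limit, .time_in_force = .good_till_cancel });

    // Invariant: level aggregate == Σ(individual order quantities).
    // `levelAt` is an assumed accessor name used here for illustration.
    const level = engine.levelAt(.buy, 100);
    try std.testing.expectEqual(@as(u64, 100), level.aggregate_qty);
}
```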
const std = @import("std");
const MatchingEngine = @import("matching_engine");
// Initialize engine
var engine = try MatchingEngine.init(allocator, .{
.n_ticks = 1_000_000, // Price range: 1M ticks
.max_orders = 100_000, // Order capacity
.tick_size = 1, // Minimum price increment (cents)
});
defer engine.deinit();
// Insert limit order
const order_id = try engine.insertLimit(.{
.id = unique_id,
.side = .buy,
.price = 45000, // $450.00 with tick_size=1
.quantity = 100,
.type = .limit,
.time_in_force = .good_till_cancel,
});
// Cancel order
engine.cancel(order_id, .buy);
// Match market order
var events = std.ArrayList(Event).init(allocator);
defer events.deinit();
try engine.matchMarket(.{
.side = .sell,
.quantity = 50,
.events_out = &events,
});
// Process resulting events
for (events.items) |event| {
switch (event) {
.trade => |t| processTrade(t),
.level_update => |u| updateMarketData(u),
.order_accepted => |a| confirmOrder(a),
}
}
pub const DomainEvent = union(enum) {
order_accepted: struct {
id: u64,
side: Side,
price: u64,
quantity: u64,
timestamp: u64,
},
order_rejected: struct {
id: u64,
reason: RejectReason,
},
order_canceled: struct {
id: u64,
remaining_qty: u64,
},
trade: struct {
maker_id: u64,
taker_id: u64,
price: u64,
quantity: u64,
maker_filled: bool,
taker_filled: bool,
timestamp: u64,
},
level_update: struct {
side: Side,
price: u64,
new_quantity: u64,
},
};
Traditional order books use balanced trees (RB-tree, AVL) for price levels:
- Pros: Dynamic range, memory efficient for sparse books
- Cons: O(log M) operations, poor cache locality, rebalancing overhead
The ladder approach uses fixed arrays:
- Pros: O(1) operations, excellent cache locality, no allocations
- Cons: Fixed memory overhead, limited price range
For active markets, the ladder's performance advantage (6-7x based on benchmarks) outweighs the memory cost (64MB).
Memory usage: 64MB per symbol
- Price levels: 32MB (1M × 32 bytes)
- Order pool: ~20MB (100K orders)
- HashMap + bitset: ~12MB
Performance gain: 6-7x throughput
- RB-tree: ~8M ops/sec → Ladder: 54M ops/sec
- Worth it for active symbols
- Consider hybrid approach for long-tail symbols
Dynamic allocation introduces:
- Unpredictable latency spikes (malloc can block)
- Memory fragmentation
- Cache pollution
- HashMap rehashing (10x slowdown during resize)
Pre-allocation ensures:
- Deterministic latency (18ns consistently)
- No allocation in hot path
- Predictable memory layout
- Better cache utilization
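A minimal sketch of the pre-allocated pool idea, with the free list threaded through unused slots so that allocate and release are a couple of array operations and never touch the heap (type and method names such as `OrderPool` and `release` are illustrative, not the repository's):

```zig
const std = @import("std");

const INVALID: u32 = std.math.maxInt(u32);

const Order = struct {
    id: u64 = 0,
    quantity: u64 = 0,
    next_idx: u32 = INVALID, // doubles as the free-list link while the slot is unused
    prev_idx: u32 = INVALID,
};

const OrderPool = struct {
    slots: []Order,
    free_head: u32,

    fn init(allocator: std.mem.Allocator, capacity: u32) !OrderPool {
        const slots = try allocator.alloc(Order, capacity); // the only allocation, done at startup
        for (slots, 0..) |*slot, i| {
            const next = if (i + 1 < capacity) @as(u32, @intCast(i + 1)) else INVALID;
            slot.* = .{ .next_idx = next };
        }
        return .{ .slots = slots, .free_head = if (capacity > 0) 0 else INVALID };
    }

    fn allocate(self: *OrderPool) ?u32 {
        if (self.free_head == INVALID) return null; // pool exhausted: reject, never malloc
        const idx = self.free_head;
        self.free_head = self.slots[idx].next_idx;
        return idx;
    }

    fn release(self: *OrderPool, idx: u32) void {
        self.slots[idx] = .{ .next_idx = self.free_head };
        self.free_head = idx;
    }
};
```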
// For equity markets (stocks)
.n_ticks = 100_000, // $0.01 to $1,000.00 range
.tick_size = 1, // 1 cent increments
// For crypto markets
.n_ticks = 10_000_000, // Wide range for volatility
.tick_size = 1, // $0.01 increments
// For FX markets
.n_ticks = 1_000_000, // 6 decimal places
.tick_size = 1, // 0.000001 increments
- Memory: 64MB per symbol × number of symbols
- CPU: Single-threaded per symbol (no lock contention)
- Latency: Sub-microsecond matching, network is bottleneck
graph TB
subgraph "Gateway Layer"
GW1[Gateway 1]
GW2[Gateway 2]
GW3[Gateway N]
end
subgraph "Matching Layer"
subgraph "Server 1"
ME1[AAPL Engine]
ME2[GOOGL Engine]
end
subgraph "Server 2"
ME3[MSFT Engine]
ME4[AMZN Engine]
end
end
subgraph "Services"
BAL[Balance Service]
RISK[Risk Engine]
SETTLE[Settlement]
MD[Market Data]
end
GW1 --> ME1
GW2 --> ME2
ME1 --> BAL
ME2 --> RISK
ME3 --> SETTLE
ME4 --> MD
Required services to build around this engine:
- Authentication Service - Verify user identity
- Balance Service - Lock/unlock funds before/after trades
- Risk Management - Position limits, margin requirements
- Settlement Service - Clear and settle trades
- Market Data Distribution - WebSocket/FIX feed
- Audit/Compliance - Trade reporting, regulatory compliance
- Monitoring - Metrics, alerts, dashboards
Key metrics to track:
- Matching latency - P50, P95, P99, P99.9
- Throughput - Orders/sec, trades/sec
- Queue depth - Orders waiting at each price
- Memory usage - Pool utilization, HashMap load factor
- Event lag - Time from match to event emission
After evaluating multiple languages for this performance-critical application:
- C++: Template complexity, hidden allocations in STL, undefined behavior pitfalls
- Rust: Borrow checker friction with intrusive data structures, async runtime overhead
- C: Manual memory management overhead, lack of modern abstractions
- Go: GC pauses unacceptable for sub-microsecond latency requirements
Zig provides:
- No hidden allocations - explicit memory control
- Comptime metaprogramming - zero-cost abstractions
- First-class error handling - no exceptions
- Direct hardware access - CPU intrinsics when needed
- Simple, readable code - maintainability matters
All benchmarks follow consistent methodology:
- Warmup - 100K operations to stabilize caches
- Measurement - 10M+ operations for statistical significance
- Isolation - Single operation type per benchmark
- Environment - Release build with optimizations enabled
- Verification - Results validated against reference implementation
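The shape of such a microbenchmark, as a hypothetical harness (not the repository's bench code; `op` stands in for the single operation under test):

```zig
const std = @import("std");

fn bench(comptime name: []const u8, comptime op: fn (u64) u64) !void {
    const warmup = 100_000;
    const iters = 10_000_000;

    var sink: u64 = 0; // keep results live so the optimizer cannot delete the work
    var i: u64 = 0;
    while (i < warmup) : (i += 1) sink +%= op(i); // warmup: stabilize caches and branch predictors

    var timer = try std.time.Timer.start();
    i = 0;
    while (i < iters) : (i += 1) sink +%= op(i); // measurement window
    const elapsed_ns = timer.read();

    std.mem.doNotOptimizeAway(sink);
    std.debug.print("{s}: {d} ns/op ({d:.2}M ops/sec)\n", .{
        name,
        elapsed_ns / iters,
        @as(f64, iters) / (@as(f64, @floatFromInt(elapsed_ns)) / 1e9) / 1e6,
    });
}
```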
While current performance exceeds requirements, potential optimizations include:
- SIMD Aggregation - Vectorize quantity summation
- Prefetching - Explicit cache line prefetch hints
- Huge Pages - Reduce TLB misses for large arrays
- NUMA Awareness - Pin memory to local nodes
These remain unimplemented as the bottleneck is network I/O, not the matching engine.
Performance improvements welcome. Requirements:
- Benchmark demonstrating measurable improvement
- All existing tests pass
- No complexity increase without justification
- Clear documentation of trade-offs
MIT
Built with Zig for its performance and safety guarantees without sacrificing control.