Blobnom

Blobnom is a high-performance, sharded blob storage server written in Rust. It implements the Redis protocol for client compatibility and uses SQLite as the backend for each shard, providing a simple yet robust solution for storing and retrieving binary large objects (blobs).

Features

Sharding: Distributes data across multiple SQLite databases (shards) for improved concurrency and scalability. The shard for a given key is determined by an FNV hash of the key.
Redis Protocol: Implements Redis protocol for client compatibility with the following commands:
- GET key: Retrieve a blob.
- SET key value: Store or replace a blob.
- DEL key: Delete a blob.
- EXISTS key: Check if a blob exists.
- HGET namespace key: Retrieve a blob from a namespace.
- HSET namespace key value: Store or replace a blob in a namespace.
- HDEL namespace key: Delete a blob from a namespace.
- HEXISTS namespace key: Check if a blob exists in a namespace.
Asynchronous Operations: Leverages Tokio and async Rust for non-blocking I/O, ensuring efficient handling of concurrent requests.
Configurable:
- Number of shards.
- Data directory for storing SQLite files.
- Optional storage compression (not yet implemented but planned via config).
- Optional output compression (not yet implemented but planned via config).
- Asynchronous write mode for improved response times.
- Write batching for improved database throughput.
SQLite Backend: Each shard uses its own SQLite database, simplifying deployment and management.
Redis Cluster Compatibility: Full Redis cluster protocol support with automatic node discovery, hash slot distribution, and client redirection for horizontal scaling.
Namespacing: Hash-based namespacing using HGET/HSET commands for organizing data into logical groups.
Metadata Tracking: Each blob includes metadata such as creation time, update time, expiration time, and version number.
Graceful Shutdown: (Implicitly handled by Tokio)

Getting Started

Prerequisites

Rust toolchain (latest stable recommended)
just (a command runner, optional but Justfile is provided)
For cross-compilation (e.g., Linux MUSL builds): cross

Building and Running

Clone the repository:
```
git clone <repository-url>
cd blobmom
```
Configuration: Create a config.toml file in the root of the project. See the Configuration section below for details. A minimal example:
```
data_dir = "blob_data"
num_shards = 4
# storage_compression = false # Optional
# output_compression = false  # Optional
```

Build:

Using Cargo:
```
cargo build
```
Using Just (for convenience):
```
just build
```

For a release build:

cargo build --release
# or
just build-release

Run:
- Using Cargo:
```
cargo run
```
- Using Just:
```
just run
```
- To run the release build:
```
cargo run --release
# or
just run-release
```
The server will start by default on 0.0.0.0:6379 (standard Redis port).

Cross-compilation (Example: Linux MUSL)

The Justfile includes targets for cross-compiling, for example, to a static Linux MUSL binary:

just build-linux

This will produce a binary in target/x86_64-unknown-linux-musl/release/blobnom.

Configuration

Blobnom is configured via a config.toml file located in the project root.

Key configuration options:

data_dir (String, required): The directory where SQLite database files for each shard will be stored.
num_shards (usize, required): The number of shards to distribute data across. Must be greater than 0.
storage_compression (bool, optional): If true, enables compression for data at rest. (Default: false if not specified)
output_compression (bool, optional): If true, enables compression for HTTP responses. (Default: false if not specified)
async_write (bool, optional): If true, enables asynchronous write operations where the server responds immediately after queueing the operation instead of waiting for database completion. Uses an inflight cache to prevent race conditions where GET requests might not find recently SET data. (Default: false if not specified)
batch_size (usize, optional): Maximum number of operations to batch together in a single database transaction. Set to 1 to disable batching. Higher values improve throughput but increase latency. (Default: 1)
batch_timeout_ms (u64, optional): Maximum time in milliseconds to wait for additional operations before processing a batch. Only relevant when batch_size > 1. (Default: 0)

Example config.toml:

data_dir = "/var/data/blobnom"
num_shards = 8
storage_compression = true
output_compression = false
async_write = true
batch_size = 50
batch_timeout_ms = 10

Redis Commands

Blobnom implements a subset of Redis commands for blob operations:

Basic Commands

SET key value
- Description: Stores or replaces a blob.
- Parameters:
  - key: The unique identifier for the blob.
  - value: The binary data of the blob.
- Responses:
  - +OK: Blob stored successfully (or queued in async mode).
  - -ERR internal error: If an error occurs during the operation.
GET key
- Description: Retrieves a blob.
- Parameters:
  - key: The unique identifier for the blob.
- Responses:
  - Bulk string with blob data if the key exists.
  - $-1 (null bulk string): If the blob with the given key does not exist.
  - -ERR database error: If an error occurs during the operation.
DEL key
- Description: Deletes a blob.
- Parameters:
  - key: The unique identifier for the blob.
- Responses:
  - :1: If the key existed and was deleted.
  - :0: If the key did not exist.
  - -ERR internal error: If an error occurs during the operation.
EXISTS key
- Description: Checks if a blob exists.
- Parameters:
  - key: The unique identifier for the blob.
- Responses:
  - :1: If the key exists.
  - :0: If the key does not exist.
  - -ERR database error: If an error occurs during the operation.

Namespaced Commands

Blobnom supports Redis-style hash operations for namespacing data. Each namespace creates its own isolated table, allowing you to organize your data into logical groups.

HSET namespace key value
- Description: Stores or replaces a blob in a namespace.
- Parameters:
  - namespace: The namespace identifier (creates table blobs_namespace).
  - key: The unique identifier for the blob within the namespace.
  - value: The binary data of the blob.
- Responses:
  - +OK: Blob stored successfully (or queued in async mode).
  - -ERR internal error: If an error occurs during the operation.
HGET namespace key
- Description: Retrieves a blob from a namespace.
- Parameters:
  - namespace: The namespace identifier.
  - key: The unique identifier for the blob within the namespace.
- Responses:
  - Bulk string with blob data if the key exists in the namespace.
  - $-1 (null bulk string): If the blob does not exist in the namespace.
  - -ERR database error: If an error occurs during the operation.
HDEL namespace key
- Description: Deletes a blob from a namespace.
- Parameters:
  - namespace: The namespace identifier.
  - key: The unique identifier for the blob within the namespace.
- Responses:
  - :1: If the key existed in the namespace and was deleted.
  - :0: If the key did not exist in the namespace.
  - -ERR internal error: If an error occurs during the operation.
HEXISTS namespace key
- Description: Checks if a blob exists in a namespace.
- Parameters:
  - namespace: The namespace identifier.
  - key: The unique identifier for the blob within the namespace.
- Responses:
  - :1: If the key exists in the namespace.
  - :0: If the key does not exist in the namespace.
  - -ERR database error: If an error occurs during the operation.

Using Redis Clients

You can use any Redis client to interact with Blobnom:

# Using redis-cli - Basic operations
redis-cli -p 6379 SET mykey "Hello, World!"
redis-cli -p 6379 GET mykey
redis-cli -p 6379 DEL mykey
redis-cli -p 6379 EXISTS mykey

# Using redis-cli - Namespaced operations
redis-cli -p 6379 HSET users:123 name "John Doe"
redis-cli -p 6379 HSET users:123 email "[email protected]"
redis-cli -p 6379 HGET users:123 name
redis-cli -p 6379 HEXISTS users:123 email
redis-cli -p 6379 HDEL users:123 email

# Using Redis clients in various languages
# Python: redis-py
# Node.js: ioredis or node-redis
# Go: go-redis
# etc.

Example Use Cases for Namespacing

User Data Storage

HSET user:12345 profile '{"name": "John", "age": 30}'
HSET user:12345 preferences '{"theme": "dark", "lang": "en"}'
HGET user:12345 profile

Session Management

HSET session:abc123 user_id "12345"
HSET session:abc123 expires_at "1703980800"
HEXISTS session:abc123 user_id

Configuration Storage

HSET config:app database_url "sqlite:app.db"
HSET config:app log_level "info"
HGET config:app database_url

Analytics Data

HSET metrics:daily:2024-01-01 page_views "1000"
HSET metrics:daily:2024-01-01 unique_users "250"
HGET metrics:daily:2024-01-01 page_views

Project Structure

src/main.rs: Entry point, sets up tracing, configuration, Redis server, and spawns shard writer tasks.
src/config.rs: Handles loading and validation of the config.toml file.
src/app_state.rs: Defines the AppState structure and logic for determining the shard for a key.
src/server.rs: Implements the Redis protocol server and command handlers.
src/redis/: Redis protocol implementation module.
- src/redis/protocol.rs: Robust RESP protocol parser using nom, with comprehensive error handling.
- src/redis/integration_tests.rs: Extensive integration tests for Redis protocol parsing.
src/shard_manager.rs: Defines the ShardWriteOperation enum and the shard_writer_task responsible for handling write operations (Set, Delete) for each shard via a message queue.
config.toml: Configuration file (user-created).
Justfile: Contains just commands for common development tasks.
Cargo.toml: Rust package manifest.

Key Dependencies

Tokio: Asynchronous runtime for Rust.
SQLx: Asynchronous SQL toolkit for Rust, used here with SQLite.
Nom: High-performance parser combinator library for RESP protocol parsing.
Moka: High-performance concurrent caching library for inflight request tracking in async mode.
Serde: Framework for serializing and deserializing Rust data structures.
Config: Layered configuration system for Rust applications.
Miette: Fancy diagnostic reporting library.
Thiserror: Better error handling with derive macros.
Fnv: For FNV hashing to determine shard index.
Tracing: Application-level tracing framework.
Chrono: Date and time library for handling timestamps.

Performance Features

Write Batching

Blobnom supports batching multiple write operations into single database transactions for improved throughput:

Benefits: 2-10x write throughput improvement for high-volume workloads
Configuration: Set batch_size > 1 to enable batching
Timeout: Use batch_timeout_ms to prevent excessive latency
Tradeoffs: Slightly increased latency per operation, better overall throughput

Asynchronous Writes

When async_write = true, the server responds immediately after queueing operations:

Benefits: Lower response times, better user experience
Race Condition Prevention: Uses Moka cache to store inflight writes, ensuring GET requests return correct data even before database writes complete
Tradeoffs: Operations may fail after server responds (check logs)
Use case: Write-heavy workloads where eventual consistency is acceptable

Recommended Configuration

For high-throughput write workloads:

async_write = true
batch_size = 50
batch_timeout_ms = 10

For low-latency, consistency-focused workloads:

async_write = false
batch_size = 1

Race Condition Handling

In async write mode (async_write = true), Blobnom uses an inflight cache to prevent race conditions:

The Problem

# Without inflight cache, this sequence could fail:
SET mykey "value"     # Returns OK immediately
GET mykey             # Might return NULL if DB write hasn't completed

The Solution

Inflight Cache: Stores pending writes using Moka cache
GET Operations: Check inflight cache first, then database
Cache Cleanup: Entries removed after successful database commit
Memory Efficiency: Cache has configurable capacity (default: 10,000 entries)

Cache Behavior

SET key value in async mode → stores in inflight_cache
GET key → checks inflight_cache first, then database
HSET ns key value in async mode → stores in inflight_hcache as ns:key
HGET ns key → checks inflight_hcache first, then database
DEL key in async mode → removes from inflight_cache immediately
Database commit success → removes entries from inflight caches

This ensures read-after-write consistency even in async mode.

Database Schema

The SQLite database schema for each shard includes the following tables:

Default Table

CREATE TABLE blobs (
    key TEXT PRIMARY KEY,
    data BLOB,
    created_at INTEGER NOT NULL,    -- Unix timestamp
    updated_at INTEGER NOT NULL,    -- Unix timestamp  
    expires_at INTEGER,             -- Unix timestamp, NULL if no expiration
    version INTEGER NOT NULL DEFAULT 0  -- Incremented on each update
);

Namespaced Tables

Each namespace automatically creates its own table with the naming pattern blobs_{namespace}:

CREATE TABLE blobs_users (
    key TEXT PRIMARY KEY,
    data BLOB,
    created_at INTEGER NOT NULL,    -- Unix timestamp
    updated_at INTEGER NOT NULL,    -- Unix timestamp  
    expires_at INTEGER,             -- Unix timestamp, NULL if no expiration
    version INTEGER NOT NULL DEFAULT 0  -- Incremented on each update
);

Metadata Features

Automatic Timestamps: created_at and updated_at are automatically managed
Versioning: Each blob update increments the version number starting from 0
Expiration: Blobs with expires_at set will be automatically filtered out from GET requests
Dynamic Table Creation: Namespaced tables are created automatically when first accessed
Table Caching: Each shard maintains a cache of known tables to avoid database queries

Namespace Performance Optimizations

Table Existence Caching: Each shard maintains a HashSet of known tables in memory
Dynamic Creation: Tables are created only when first accessed
Batched Operations: Namespaced operations are batched with regular operations
Sharding: Namespaced data uses the same key-based sharding as regular operations

Migration Compatibility

Existing GET/SET operations continue to work with the default blobs table:

GET key → uses blobs table
SET key value → uses blobs table
HGET namespace key → uses blobs_namespace table
HSET namespace key value → uses blobs_namespace table

This allows for gradual migration to namespaced storage without breaking existing applications.

Testing

Blobnom includes comprehensive test coverage:

Unit Tests: 25 tests covering RESP protocol parsing, serialization, and command validation
Integration Tests: 19 tests covering command handling, binary data, and protocol compliance
Total Coverage: 44 tests ensuring robust Redis protocol implementation

Run tests with:

cargo test

For specific test suites:

cargo test redis::protocol      # Protocol parser tests
cargo test redis::integration   # Integration tests

Redis Cluster Compatibility

Blobnom implements full Redis cluster protocol compatibility with automatic node discovery via gossip protocol. This allows Redis cluster-aware clients to seamlessly connect and perform operations across a distributed cluster of nodes.

Key Features

Hash Slot Distribution: Uses Redis-compatible CRC16 hash slots (16384 total) with slot range configuration
Automatic Node Discovery: Gossip protocol (chitchat) for automatic cluster membership management
MOVED Redirection: Automatic client redirection for keys on different nodes
Cluster Commands: Full support for CLUSTER NODES, CLUSTER INFO, CLUSTER SLOTS, etc.
Cross-Node Operations: Seamless key operations across all cluster nodes with automatic routing

Basic Cluster Configuration

[cluster]
enabled = true
node_id = "node-1"
seeds = ["127.0.0.1:7002", "127.0.0.1:7003"]
advertise_addr = "127.0.0.1:7001"
port = 7001
gossip_interval_ms = 1000

# Hash slot ranges for this node
[[cluster.slot_ranges]]
start = 0
end = 5460

Usage with Redis Clients

# Redis CLI with cluster support
valkey-cli -c -p 6381

# Check cluster status
valkey-cli -c -p 6381 CLUSTER NODES
valkey-cli -c -p 6381 CLUSTER INFO

# Test cross-node operations
valkey-cli -c -p 6381 SET mykey "myvalue"
valkey-cli -c -p 6382 GET mykey  # Automatic redirection

For detailed setup instructions, see CLUSTERING.md.

Current Limitations

No replication support (no master-slave configuration)
No automatic failover (manual intervention required for node failures)
Static slot assignment (no online resharding or slot migration)

Future Enhancements (Ideas)

Full chitchat gossip protocol integration
Master-slave replication for high availability
Online slot migration and rebalancing
Implement actual data compression for storage_compression and output_compression.
Add more robust error handling and logging.
Implement authentication and authorization.
Add metrics and monitoring.
Support for different hashing algorithms for sharding.
Streaming support for very large blobs.
Replication or durability options beyond single SQLite files.
Blob expiration cleanup background task.
Blob listing and search capabilities.
Additional Redis commands relevant to blob storage (EXPIRE, TTL).

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github/workflows		.github/workflows
dev		dev
examples		examples
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
CLUSTERING.md		CLUSTERING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Justfile		Justfile
README.md		README.md
bench.js		bench.js
benchmark.sh		benchmark.sh
config.cluster.example.toml		config.cluster.example.toml
config.example.toml		config.example.toml

License

iamd3vil/blobasaur

Folders and files

Latest commit

History

Repository files navigation

Blobnom

Features

Getting Started

Prerequisites

Building and Running

Cross-compilation (Example: Linux MUSL)

Configuration

Redis Commands

Basic Commands

SET key value

GET key

DEL key

EXISTS key

Namespaced Commands

HSET namespace key value

HGET namespace key

HDEL namespace key

HEXISTS namespace key

Using Redis Clients

Example Use Cases for Namespacing

User Data Storage

Session Management

Configuration Storage

Analytics Data

Project Structure

Key Dependencies

Performance Features

Write Batching

Asynchronous Writes

Recommended Configuration

Race Condition Handling

The Problem

The Solution

Cache Behavior

Database Schema

Default Table

Namespaced Tables

Metadata Features

Namespace Performance Optimizations

Migration Compatibility

Testing

Redis Cluster Compatibility

Key Features

Basic Cluster Configuration

Usage with Redis Clients

Current Limitations

Future Enhancements (Ideas)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 6

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

`SET key value`

`GET key`

`DEL key`

`EXISTS key`

`HSET namespace key value`

`HGET namespace key`

`HDEL namespace key`

`HEXISTS namespace key`

Packages