OrcaS: Open Ready-to-Use Content Addressable Storage
OrcaS (Open Ready-to-Use Content Addressable Storage) is a lightweight, high-performance object storage system built around Content Addressable Storage (CAS). It provides enterprise-grade features such as instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression, all in a single binary that's ready to deploy.
- Open: Open source (MIT license), transparent, community-driven development
- Ready-to-Use: Production-ready out of the box, with data integrity checks and deduplication built in
- Content Addressable Storage: Data is stored by content hash, enabling automatic deduplication and integrity verification
- Instant Upload (Deduplication): Upload files in seconds, not minutes; identical content is detected before upload, so no data is transferred
- Zero-Knowledge Encryption: Your data, your keys. End-to-end encryption with industry-standard algorithms
- Production Ready: S3-compatible API, VFS mount support, and comprehensive documentation
- High Performance: Optimized for both small and large files through intelligent packaging and chunking
Instant Upload (Deduplication)
What it does: Upload identical files instantly, without transferring their data again.
How it works:
- Calculates multiple checksums (XXH3, SHA-256) for each file
- Before uploading, checks if identical content already exists
- If found, creates a reference to existing data instead of uploading
- Result: Upload time drops from minutes to milliseconds for duplicate files
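The check-before-upload steps above can be sketched in Go as follows. This is a minimal illustration, not OrcaS's implementation: `Store`, `NewStore`, and `Upload` are hypothetical names, the index lives in memory, and only SHA-256 is used because Go's standard library has no XXH3 implementation, so the fast pre-check hash is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store used only to
// illustrate instant upload; it is not OrcaS's API.
type Store struct {
	blobs map[string][]byte // content hash -> stored bytes (one copy)
	refs  map[string]int    // content hash -> number of uploads referencing it
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, refs: map[string]int{}}
}

// Upload hashes the content first and stores the bytes only when the hash is
// unknown. It returns the content hash and whether data was actually stored.
func (s *Store) Upload(data []byte) (string, bool) {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	s.refs[key]++
	if _, exists := s.blobs[key]; exists {
		return key, false // duplicate: record a reference, skip the transfer
	}
	s.blobs[key] = append([]byte(nil), data...) // copy so the caller can reuse its buffer
	return key, true
}

func main() {
	s := NewStore()
	_, first := s.Upload([]byte("backup.tar contents"))
	_, second := s.Upload([]byte("backup.tar contents"))
	fmt.Println(first, second, len(s.blobs)) // true false 1
}
```

In a client/server setup the client would send only the hash first and transfer the bytes only when the server reports a miss; the sketch folds both sides into one process.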
Use cases:
- Backup systems (same files across multiple backups)
- Version control systems (similar files across versions)
- Multi-user environments (shared files)
- CDN edge storage (cached content)
Benefits:
- 99%+ faster uploads for duplicate files
- Massive storage savings - store 1 copy, reference it N times
- Bandwidth savings - no redundant data transfer
- Automatic integrity verification - the content hash confirms data correctness
Small File Packaging
What it does: Efficiently stores many small files together.
How it works:
- Groups small files (< 64 KB) into packages
- Reduces metadata overhead and I/O operations
- Maintains individual file access while optimizing storage
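A minimal sketch of packaging, assuming a simple design in which small files are appended to one shared blob and an offset index keeps each file individually readable. `Package`, `Add`, `Get`, and the index layout are hypothetical; only the 64 KB limit comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// smallFileLimit mirrors the < 64 KB threshold described above.
const smallFileLimit = 64 * 1024

// entry locates one file inside a package.
type entry struct {
	offset, length int
}

// Package is a hypothetical container that concatenates small files into a
// single blob and keeps an offset index so each file stays individually readable.
type Package struct {
	data  []byte
	index map[string]entry
}

func NewPackage() *Package {
	return &Package{index: map[string]entry{}}
}

// Add appends a small file to the package; larger files are rejected so they
// can take a different (e.g. chunked) storage path instead.
func (p *Package) Add(name string, content []byte) error {
	if len(content) >= smallFileLimit {
		return errors.New("file too large for packaging")
	}
	p.index[name] = entry{offset: len(p.data), length: len(content)}
	p.data = append(p.data, content...)
	return nil
}

// Get returns one file's bytes by slicing the shared blob.
func (p *Package) Get(name string) ([]byte, bool) {
	e, ok := p.index[name]
	if !ok {
		return nil, false
	}
	return p.data[e.offset : e.offset+e.length], true
}

func main() {
	p := NewPackage()
	_ = p.Add("a.txt", []byte("hello"))
	_ = p.Add("b.txt", []byte("world"))
	b, _ := p.Get("b.txt")
	fmt.Println(string(b), len(p.data)) // world 10
}
```

Writing one package plus one index replaces many per-file writes, which is where the reduced metadata and I/O overhead comes from.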
Benefits:
- 10x+ performance improvement for small file operations
- Reduced storage costs - less metadata overhead
- Faster operations - batch metadata writes
Large File Chunking
What it does: Splits large files into manageable chunks.
How it works:
- Automatically chunks files larger than a configured threshold (default 10 MB)
- Each chunk stored independently with its own checksum
- Enables parallel upload/download and efficient updates
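The chunking steps above can be sketched as fixed-size splitting with a per-chunk SHA-256. This is an illustration under assumptions: `SplitChunks`, `needsChunking`, and `Chunk` are hypothetical names, the chunk size is a parameter because the actual chunk size is not stated, and SHA-256 stands in for whichever checksum OrcaS uses per chunk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkThreshold mirrors the 10 MB default threshold described above.
const chunkThreshold = 10 * 1024 * 1024

// Chunk is one independently stored piece of a large file.
type Chunk struct {
	Number   int
	Data     []byte
	Checksum string // hex SHA-256 of Data
}

// needsChunking reports whether a file of the given size exceeds the threshold.
func needsChunking(size int64) bool {
	return size > chunkThreshold
}

// SplitChunks cuts data into fixed-size chunks and checksums each one, so
// chunks can be transferred in parallel and retried or replaced individually.
func SplitChunks(data []byte, chunkSize int) []Chunk {
	if chunkSize <= 0 {
		return nil
	}
	var chunks []Chunk
	for n, off := 0, 0; off < len(data); n, off = n+1, off+chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		chunks = append(chunks, Chunk{Number: n, Data: data[off:end], Checksum: hex.EncodeToString(sum[:])})
	}
	return chunks
}

func main() {
	for _, c := range SplitChunks(make([]byte, 25), 10) {
		fmt.Println(c.Number, len(c.Data)) // 0 10, then 1 10, then 2 5
	}
	fmt.Println(needsChunking(20 * 1024 * 1024)) // true
}
```

Because each chunk carries its own checksum, an edit that touches one region changes only that chunk's checksum, so only that chunk needs to be re-uploaded.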
Benefits:
- Parallel processing - upload/download chunks concurrently
- Resumable transfers - retry failed chunks independently
- Efficient updates - only modified chunks need re-upload
- Better resource utilization - large files are processed one chunk at a time rather than as a single unit
Multi-Versioning
What it does: Automatically maintains file version history.
How it works:
- Each file modification creates a new version
- Old versions are preserved automatically
- Configurable retention policies
- Space-efficient through content deduplication
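A minimal sketch of how versioning and deduplication can combine, assuming each version simply points at a content hash and blobs are shared by hash. `History`, `Write`, `Read`, and the keep-last-N retention rule are hypothetical illustrations; the source does not describe OrcaS's actual version schema or retention policy format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// History is a hypothetical per-file version log: each version points at a
// content hash, and identical content is stored only once.
type History struct {
	blobs    map[string][]byte // content hash -> data, stored once
	versions []string          // content hash of each version, oldest first
}

func NewHistory() *History {
	return &History{blobs: map[string][]byte{}}
}

// Write records a new version; identical content reuses the existing blob.
func (h *History) Write(data []byte) int {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := h.blobs[key]; !ok {
		h.blobs[key] = append([]byte(nil), data...)
	}
	h.versions = append(h.versions, key)
	return len(h.versions) - 1 // position of the new version
}

// Read restores a previous version (point-in-time recovery).
func (h *History) Read(version int) ([]byte, bool) {
	if version < 0 || version >= len(h.versions) {
		return nil, false
	}
	return h.blobs[h.versions[version]], true
}

// KeepLast applies a simple retention policy: keep the newest n versions and
// drop blobs no remaining version references. Positions are re-based afterwards.
func (h *History) KeepLast(n int) {
	if n < 0 {
		n = 0
	}
	if len(h.versions) > n {
		h.versions = h.versions[len(h.versions)-n:]
	}
	live := map[string]bool{}
	for _, k := range h.versions {
		live[k] = true
	}
	for k := range h.blobs {
		if !live[k] {
			delete(h.blobs, k) // deleting during range is safe in Go
		}
	}
}

func main() {
	h := NewHistory()
	h.Write([]byte("v1"))
	h.Write([]byte("v2"))
	h.Write([]byte("v1")) // reverting shares the first blob
	fmt.Println(len(h.versions), len(h.blobs)) // 3 2
	old, _ := h.Read(0)
	fmt.Println(string(old)) // v1
}
```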
Benefits:
- Point-in-time recovery - restore any previous version
- Data protection - accidental deletions are recoverable
- Audit trail - track all changes over time
- Space efficient - unchanged data shared across versions
Zero-Knowledge Encryption
What it does: End-to-end encryption where only you hold the keys.
How it works:
- AES-256 encryption (industry standard)
- Encryption keys never leave your control
- Optional per-bucket encryption keys
- Transparent encryption/decryption
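The description names AES-256 but not a cipher mode, so the sketch below assumes AES-256-GCM from Go's standard library, with the key held by the caller and a random nonce prepended to the ciphertext. `Encrypt` and `Decrypt` are hypothetical helpers, not OrcaS functions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Encrypt seals plaintext with AES-256-GCM using a caller-held 32-byte key.
// The random nonce is prepended to the returned ciphertext.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the nonce and opens the ciphertext; GCM authentication
// makes it fail if the key is wrong or the data was tampered with.
func Decrypt(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // generated and kept by the data owner, never the server
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := Encrypt(key, []byte("secret"))
	if err != nil {
		panic(err)
	}
	plain, err := Decrypt(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain)) // secret
}
```

With a random nonce, identical plaintexts encrypt to different ciphertexts, so a design like this would need to compute deduplication hashes on the plaintext before encrypting.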
Benefits:
- Maximum security - even storage admins can't read your data
- Compliance ready - helps meet strict security requirements
- Data privacy - your data, your control
- Standards-based - AES-256 is an internationally standardized cipher
Smart Compression
What it does: Automatically compresses data to save space.
How it works:
- Configurable compression algorithms (zstd, gzip, etc.)
- Compression is applied before encryption, since encrypted data is effectively incompressible
- Automatic detection of already-compressed data
- Per-bucket compression settings
Benefits:
- Storage savings - typically 30-70% reduction
- Bandwidth savings - less data to transfer
- Smart defaults - works out of the box
- Configurable - adjust per your needs
OrcaS is built on Content Addressable Storage principles: data is stored and retrieved by its content hash rather than by its location.
Key Benefits of CAS:
- Automatic Deduplication: Identical content stored once, referenced many times
- Integrity Verification: Content hash ensures data hasn't been corrupted
- Efficient Versioning: New versions only store changed content
- Simplified Backup: Same content = same hash = no re-upload needed
Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── <bucket_id>/
        └── <hash_prefix>/
            └── <hash>/
                └── <dataID>_<chunk_number>
- Instant Upload: 99%+ faster for duplicate files (milliseconds vs minutes)
- Small Files: 10x+ performance improvement with packaging
- Large Files: Parallel chunk processing for optimal throughput
- Storage Efficiency: 30-70% space savings with compression + deduplication
- Concurrent Operations: Optimized for high concurrency
Performance Test Reports:
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details.
- Production Ready: Battle-tested, actively maintained
- High Performance: Optimized for real-world workloads
- Security First: Zero-knowledge encryption built-in
- Storage Efficient: Automatic deduplication saves space and costs
- Easy to Use: S3-compatible API, VFS mount, comprehensive docs
- Innovative: Content Addressable Storage with instant deduplication
- Actively Developed: Regular updates and improvements
- Open Source: MIT licensed, community-driven
Star us if you find this project useful! ⭐