OrcaS: Open Ready-to-Use Content Addressable Storage

English | 中文

🚀 What is OrcaS?

OrcaS (Open Ready-to-Use Content Addressable Storage) is a lightweight, high-performance object storage system built with Content Addressable Storage (CAS) at its core. It provides enterprise-grade features like instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression - all in a single binary that's ready to deploy.

Why OrcaS?

🌐 Open: Open source (MIT license), transparent, community-driven development
✅ Ready-to-Use: Production-ready out of the box, with data integrity checks and automatic deduplication provided by Content Addressable Storage
🎯 Content Addressable Storage: Data is stored by content hash, enabling automatic deduplication and integrity verification
⚡ Instant Upload (Deduplication): Upload files in seconds, not minutes - identical files are detected instantly without uploading
🔒 Zero-Knowledge Encryption: Your data, your keys - end-to-end encryption with industry-standard algorithms
📦 Production Ready: S3-compatible API, VFS mount support, and comprehensive documentation
🚀 High Performance: Optimized for both small and large files with intelligent packaging and chunking

✨ Key Features

⏱ Instant Upload (Object-level Deduplication)

What it does: Upload identical files instantly without transferring data.

How it works:

Calculates multiple checksums (XXH3, SHA-256) for each file
Before uploading, checks if identical content already exists
If found, creates a reference to existing data instead of uploading
Result: Upload time drops from minutes to milliseconds for duplicate files
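
The check-before-upload step can be illustrated with a minimal Go sketch. It assumes an in-memory store keyed by SHA-256 only; the `Store` type and its `Put` method are hypothetical names for illustration, not OrcaS's actual API, and the XXH3 pre-check is left out because it needs a third-party package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Store is a hypothetical in-memory content-addressed store (illustration only).
type Store struct {
	blobs map[string][]byte // content hash -> stored bytes
	refs  map[string]int    // content hash -> number of references
}

func NewStore() *Store {
	return &Store{blobs: map[string][]byte{}, refs: map[string]int{}}
}

// contentKey returns the hex-encoded SHA-256 digest of data.
func contentKey(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Put adds a reference to data. The bytes are only stored ("uploaded")
// when no identical content exists yet; uploaded reports which case occurred.
func (s *Store) Put(data []byte) (key string, uploaded bool) {
	key = contentKey(data)
	s.refs[key]++
	if _, exists := s.blobs[key]; exists {
		return key, false // duplicate: reference the existing copy
	}
	s.blobs[key] = append([]byte(nil), data...)
	return key, true
}

func main() {
	s := NewStore()
	_, first := s.Put([]byte("hello"))
	key, second := s.Put([]byte("hello"))
	fmt.Println(first, second, s.refs[key], len(s.blobs))
}
```

In a client/server setting, the same idea would let a client send only the hash first and skip the transfer whenever the server already holds that content.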

Use cases:

Backup systems (same files across multiple backups)
Version control systems (unchanged files across versions)
Multi-user environments (shared files)
CDN edge storage (cached content)

Benefits:

🚀 99%+ faster uploads for duplicate files
💾 Massive storage savings - store 1 copy, reference it N times
⚡ Bandwidth savings - no redundant data transfer
🔍 Automatic integrity verification - content hash ensures data correctness

📦 Small Object Packaging

What it does: Efficiently stores many small files together.

How it works:

Groups small files (< 64KB) into packages
Reduces metadata overhead and I/O operations
Maintains individual file access while optimizing storage
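
As a rough sketch of the packaging idea, the Go example below appends small files to one shared buffer and keeps an offset index so each file can still be read individually. The `Package` type, its methods, and the in-memory layout are assumptions made for illustration; only the 64KB limit comes from the description above.

```go
package main

import (
	"bytes"
	"fmt"
)

// smallLimit mirrors the "< 64KB" threshold mentioned above.
const smallLimit = 64 * 1024

// entry records where one file lives inside the package.
type entry struct {
	Offset, Length int
}

// Package is a hypothetical container for many small files (illustration only).
type Package struct {
	buf   bytes.Buffer
	index map[string]entry
}

func NewPackage() *Package {
	return &Package{index: map[string]entry{}}
}

// Add appends a small file to the package and records its position.
// It reports false when the file is too large to be packed.
func (p *Package) Add(name string, data []byte) bool {
	if len(data) >= smallLimit {
		return false
	}
	p.index[name] = entry{Offset: p.buf.Len(), Length: len(data)}
	p.buf.Write(data)
	return true
}

// Get returns the bytes of a single file without unpacking the others.
func (p *Package) Get(name string) ([]byte, bool) {
	e, ok := p.index[name]
	if !ok {
		return nil, false
	}
	return p.buf.Bytes()[e.Offset : e.Offset+e.Length], true
}

func main() {
	p := NewPackage()
	p.Add("a.txt", []byte("alpha"))
	p.Add("b.txt", []byte("beta"))
	data, _ := p.Get("b.txt")
	fmt.Println(string(data), len(p.index))
}
```

One package write replaces many tiny writes, which is where the reduced metadata and I/O overhead would come from.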

Benefits:

📈 10x+ performance improvement for small file operations
💰 Reduced storage costs - less metadata overhead
⚡ Faster operations - batch metadata writes

🔪 Large Object Chunking

What it does: Splits large files into manageable chunks.

How it works:

Automatically chunks files larger than the configured threshold (default 10MB)
Each chunk stored independently with its own checksum
Enables parallel upload/download and efficient updates
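
The following Go sketch shows fixed-size chunking with a checksum per chunk. The `Chunk` type and `splitChunks` function are hypothetical, SHA-256 is assumed as the per-chunk checksum, and a tiny chunk size is used in `main` so the behaviour is easy to see.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Chunk is a hypothetical record for one independently stored piece of a file.
type Chunk struct {
	Number   int
	Data     []byte
	Checksum string // hex-encoded SHA-256 of Data
}

// splitChunks cuts data into pieces of at most chunkSize bytes and
// computes a checksum for each piece. A non-positive size yields no chunks.
func splitChunks(data []byte, chunkSize int) []Chunk {
	if chunkSize <= 0 {
		return nil
	}
	var chunks []Chunk
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[start:end])
		chunks = append(chunks, Chunk{
			Number:   len(chunks),
			Data:     data[start:end],
			Checksum: hex.EncodeToString(sum[:]),
		})
	}
	return chunks
}

func main() {
	for _, c := range splitChunks(make([]byte, 25), 10) {
		fmt.Printf("chunk %d: %d bytes, checksum %s...\n", c.Number, len(c.Data), c.Checksum[:8])
	}
}
```

Because every chunk carries its own checksum, a failed or modified chunk can be retried or re-uploaded without touching the rest of the file.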

Benefits:

🔄 Parallel processing - upload/download chunks concurrently
🛡️ Resumable transfers - retry failed chunks independently
✏️ Efficient updates - only modified chunks need re-upload
📊 Better resource utilization - process large files efficiently

🗂 Object Multi-versioning

What it does: Automatically maintains file version history.

How it works:

Each file modification creates a new version
Old versions preserved automatically
Configurable retention policies
Space-efficient through content deduplication
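
A minimal Go sketch of version history backed by content deduplication might look like this. The `History` type and its methods are hypothetical, and retention policies are not modelled; the point is only that each write adds a version entry while identical content is stored once.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// History is a hypothetical in-memory model of versioned objects (illustration only).
type History struct {
	versions map[string][]string // object name -> content keys, oldest first
	blobs    map[string][]byte   // content key -> bytes, stored once
}

func NewHistory() *History {
	return &History{versions: map[string][]string{}, blobs: map[string][]byte{}}
}

// Write records a new version of name and returns its 1-based version number.
// Content that was already stored is referenced instead of stored again.
func (h *History) Write(name string, data []byte) int {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := h.blobs[key]; !ok {
		h.blobs[key] = append([]byte(nil), data...)
	}
	h.versions[name] = append(h.versions[name], key)
	return len(h.versions[name])
}

// Read returns the content of a specific version, if it exists.
func (h *History) Read(name string, version int) ([]byte, bool) {
	keys := h.versions[name]
	if version < 1 || version > len(keys) {
		return nil, false
	}
	return h.blobs[keys[version-1]], true
}

func main() {
	h := NewHistory()
	h.Write("notes.txt", []byte("draft"))
	h.Write("notes.txt", []byte("final"))
	h.Write("notes.txt", []byte("draft")) // reverting reuses the first blob
	old, _ := h.Read("notes.txt", 1)
	fmt.Println(string(old), len(h.versions["notes.txt"]), len(h.blobs))
}
```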

Benefits:

🔙 Point-in-time recovery - restore any previous version
🛡️ Data protection - accidental deletions are recoverable
📚 Audit trail - track all changes over time
💾 Space efficient - unchanged data shared across versions

🔐 Zero-Knowledge Encryption

What it does: End-to-end encryption where only you hold the keys.

How it works:

AES-256 encryption (industry standard)
Encryption keys never leave your control
Optional per-bucket encryption keys
Transparent encryption/decryption
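
To make the client-side encryption idea concrete, here is a small Go sketch using AES-256 from the standard library. The GCM mode, the nonce-prefixed layout, and the `encrypt`/`decrypt` helper names are assumptions for illustration; the text above only specifies AES-256, not the mode or key management OrcaS uses.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM cipher; the key must be exactly 32 bytes.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 requires a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext so the nonce travels with the data.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and authenticates and decrypts the rest.
func decrypt(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := encrypt(key, []byte("secret"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```

Since the key is generated and held by the caller, whoever stores `sealed` sees only ciphertext, which is the property "zero-knowledge" refers to here.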

Benefits:

🔒 Maximum security - even storage admins can't read your data
✅ Compliance ready - meets strict security requirements
🛡️ Data privacy - your data, your control
🌍 International standards - AES-256 encryption

🗜 Smart Compression

What it does: Automatically compresses data to save space.

How it works:

Configurable compression algorithms (zstd, gzip, etc.)
Compression applied before encryption
Automatic detection of already-compressed data
Per-bucket compression settings
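
The sketch below illustrates one way to skip already-compressed input before compressing. It uses gzip from the Go standard library and a magic-byte check for gzip, zstd, and zip headers; how OrcaS actually detects compressed data is not described above, so `looksCompressed` and `maybeCompress` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// looksCompressed checks for well-known magic bytes at the start of data.
func looksCompressed(data []byte) bool {
	magics := [][]byte{
		{0x1f, 0x8b},             // gzip
		{0x28, 0xb5, 0x2f, 0xfd}, // zstd frame
		{'P', 'K', 0x03, 0x04},   // zip local file header
	}
	for _, m := range magics {
		if bytes.HasPrefix(data, m) {
			return true
		}
	}
	return false
}

// maybeCompress gzips data unless it already looks compressed or
// compression would not make it smaller. The bool reports whether
// the returned bytes are compressed.
func maybeCompress(data []byte) ([]byte, bool, error) {
	if looksCompressed(data) {
		return data, false, nil
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, false, err
	}
	if err := zw.Close(); err != nil {
		return nil, false, err
	}
	if buf.Len() >= len(data) {
		return data, false, nil
	}
	return buf.Bytes(), true, nil
}

func main() {
	input := bytes.Repeat([]byte("orcas "), 1000)
	out, compressed, err := maybeCompress(input)
	fmt.Println(compressed, len(out) < len(input), err)
}
```

Compression has to run before encryption, as noted above, because well-encrypted data is indistinguishable from random bytes and does not compress.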

Benefits:

💾 Storage savings - typically 30-70% reduction
⚡ Bandwidth savings - less data to transfer
🎯 Smart defaults - works out of the box
⚙️ Configurable - adjust per your needs

🏗️ Architecture & Design

Content Addressable Storage (CAS) Core

OrcaS is built on Content Addressable Storage principles, where data is stored and retrieved by its content hash rather than location.

Key Benefits of CAS:

Automatic Deduplication: Identical content stored once, referenced many times
Integrity Verification: Recomputing the content hash on read detects corrupted data
Efficient Versioning: New versions only store changed content
Simplified Backup: Same content = same hash = no re-upload needed
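
Integrity verification follows directly from addressing data by its hash. The short Go sketch below assumes SHA-256 as the content hash; `keyOf` and `verify` are hypothetical helper names used only for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrCorrupt is returned when data no longer matches the hash it is stored under.
var ErrCorrupt = errors.New("content does not match its hash")

// keyOf returns the hex-encoded SHA-256 digest used as the content address.
func keyOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// verify recomputes the hash of data and compares it with the expected key.
func verify(key string, data []byte) error {
	if keyOf(data) != key {
		return ErrCorrupt
	}
	return nil
}

func main() {
	original := []byte("payload")
	key := keyOf(original)

	fmt.Println(verify(key, original))          // intact data
	fmt.Println(verify(key, []byte("paylOad"))) // one changed byte
}
```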

System Architecture

Instant Upload Flow

Data Storage Structure

Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── <bucket_id>/
        └── <hash_prefix>/
            └── <hash>/
                └── <dataID>_<chunk_number>
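
As an illustration of the layout above, this Go sketch builds the path of a single data block. The two-character hash prefix, the integer types for the bucket and data IDs, and the `blockPath` function itself are assumptions; the tree above does not specify them.

```go
package main

import (
	"fmt"
	"path"
)

// prefixLen is an assumed length for <hash_prefix>; the real value is not documented here.
const prefixLen = 2

// blockPath builds <root>/<bucket_id>/<hash_prefix>/<hash>/<dataID>_<chunk_number>
// using forward slashes.
func blockPath(root string, bucketID int64, hash string, dataID int64, chunk int) string {
	prefix := hash
	if len(prefix) > prefixLen {
		prefix = hash[:prefixLen]
	}
	return path.Join(
		root,
		fmt.Sprint(bucketID),
		prefix,
		hash,
		fmt.Sprintf("%d_%d", dataID, chunk),
	)
}

func main() {
	fmt.Println(blockPath("data", 7, "abcdef", 42, 3))
}
```

Spreading blocks across prefix directories keeps any single directory from accumulating too many entries, which is the usual reason for this kind of fan-out.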

📊 Performance Highlights

Instant Upload: 99%+ faster for duplicate files (milliseconds vs minutes)
Small Files: 10x+ performance improvement with packaging
Large Files: Parallel chunk processing for optimal throughput
Storage Efficiency: 30-70% space savings with compression + deduplication
Concurrent Operations: Optimized for high concurrency

Performance Test Reports:

📚 Documentation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

⭐ Why Star This Project?

🎯 Production Ready: Battle-tested, actively maintained
🚀 High Performance: Optimized for real-world workloads
🔒 Security First: Zero-knowledge encryption built-in
💾 Storage Efficient: Automatic deduplication saves space and costs
🛠️ Easy to Use: S3-compatible API, VFS mount, comprehensive docs
🌟 Innovative: Content Addressable Storage with instant deduplication
📈 Actively Developed: Regular updates and improvements
🤝 Open Source: MIT licensed, community-driven

Star us if you find this project useful! ⭐
