๐Ÿ—„๏ธใ€ๅผ€ๆ”พๅผ€็ฎฑๅณ็”จๅ†…ๅฎนๅฏปๅ€ๅฏน่ฑกๅญ˜ๅ‚จใ€‘ๆ”ฏๆŒไธปๆตๆ“ไฝœ็ณป็ปŸๅ’Œๅป‰ไปทไฝŽๅŠŸ่€—่ฎพๅค‡ [OrcaS] Open Ready-to-use Content Addressable Storage - for popular OS & cheap and low power devices.

orcastor/orcas

OrcaS: Open Ready-to-Use Content Addressable Storage

🚀 What is OrcaS?

OrcaS (Open Ready-to-Use Content Addressable Storage) is a lightweight, high-performance object storage system built with Content Addressable Storage (CAS) at its core. It provides enterprise-grade features like instant deduplication, multi-versioning, zero-knowledge encryption, and smart compression - all in a single binary that's ready to deploy.

Why OrcaS?

  • ๐ŸŒ Open: Open source (MIT license), transparent, community-driven development
  • โœ… Ready-to-Use: Content Addressable Storage ensures data integrity and automatic deduplication, production-ready out of the box
  • ๐ŸŽฏ Content Addressable Storage: Data is stored by content hash, enabling automatic deduplication and integrity verification
  • โšก Instant Upload (Deduplication): Upload files in seconds, not minutes - identical files are detected instantly without uploading
  • ๐Ÿ”’ Zero-Knowledge Encryption: Your data, your keys - end-to-end encryption with industry-standard algorithms
  • ๐Ÿ“ฆ Production Ready: S3-compatible API, VFS mount support, and comprehensive documentation
  • ๐Ÿš€ High Performance: Optimized for both small and large files with intelligent packaging and chunking

✨ Key Features

⏱ Instant Upload (Object-level Deduplication)

What it does: Upload identical files instantly without transferring data.

How it works:

  • Calculates multiple checksums (XXH3, SHA-256) for each file
  • Before uploading, checks if identical content already exists
  • If found, creates a reference to existing data instead of uploading
  • Result: Upload time drops from minutes to milliseconds for duplicate files
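The steps above can be sketched as a toy in-memory store. This is a minimal illustration, not OrcaS's actual implementation: the `DedupStore` class and its method names are hypothetical, and only SHA-256 is used because XXH3 is not in the Python standard library. In a real client the hash would be computed locally and sent ahead of the data, so the transfer can be skipped entirely.

```python
import hashlib


class DedupStore:
    """Toy in-memory store: one copy per content hash, plus a reference count."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes
        self.refcount = {}   # content hash -> number of objects pointing at it
        self.objects = {}    # object name -> content hash

    def has(self, digest: str) -> bool:
        """The pre-upload check: is this content already stored?"""
        return digest in self.blobs

    def put(self, name: str, data: bytes) -> bool:
        """Store an object; return True if the transfer was skipped."""
        digest = hashlib.sha256(data).hexdigest()
        instant = self.has(digest)
        if not instant:
            self.blobs[digest] = data  # only new content is actually stored
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        self.objects[name] = digest
        return instant


store = DedupStore()
first = store.put("backup-1/report.pdf", b"same bytes")
second = store.put("backup-2/report.pdf", b"same bytes")
print(first, second, len(store.blobs))  # False True 1
```

The second `put` only adds a reference, which is why duplicate uploads cost a metadata write rather than a data transfer.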

Use cases:

  • Backup systems (same files across multiple backups)
  • Version control systems (similar files across versions)
  • Multi-user environments (shared files)
  • CDN edge storage (cached content)

Benefits:

  • 🚀 99%+ faster uploads for duplicate files
  • 💾 Massive storage savings - store 1 copy, reference it N times
  • ⚡ Bandwidth savings - no redundant data transfer
  • 🔍 Automatic integrity verification - content hash ensures data correctness

[Figure: Deduplication Benefits]

📦 Small Object Packaging

What it does: Efficiently stores many small files together.

How it works:

  • Groups small files (< 64KB) into packages
  • Reduces metadata overhead and I/O operations
  • Maintains individual file access while optimizing storage
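One way to picture packaging is a single concatenated blob plus an offset index. The sketch below assumes that layout for illustration only; the on-disk package format OrcaS uses is not described here, and `pack`, `read`, and `SMALL_FILE_LIMIT` are hypothetical names.

```python
SMALL_FILE_LIMIT = 64 * 1024  # 64 KB threshold from the text above


def pack(files: dict) -> tuple:
    """Concatenate small files into one package and build an offset index."""
    package = bytearray()
    index = {}  # name -> (offset, length) inside the package
    for name, data in files.items():
        if len(data) >= SMALL_FILE_LIMIT:
            raise ValueError(f"{name} is too large to package")
        index[name] = (len(package), len(data))
        package += data
    return bytes(package), index


def read(package: bytes, index: dict, name: str) -> bytes:
    """Fetch one file from the package without touching the others."""
    offset, length = index[name]
    return package[offset:offset + length]


package, index = pack({"a.txt": b"hello", "b.txt": b"world!"})
print(index)                          # {'a.txt': (0, 5), 'b.txt': (5, 6)}
print(read(package, index, "b.txt"))  # b'world!'
```

Writing one package and one index replaces many tiny writes, while the index keeps each file individually addressable.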

Benefits:

  • 📈 10x+ performance improvement for small file operations
  • 💰 Reduced storage costs - less metadata overhead
  • ⚡ Faster operations - batch metadata writes

🔪 Large Object Chunking

What it does: Splits large files into manageable chunks.

How it works:

  • Automatically chunks files larger than the configured threshold (default 10 MB)
  • Each chunk stored independently with its own checksum
  • Enables parallel upload/download and efficient updates
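A rough sketch of fixed-size chunking with per-chunk checksums follows. It assumes the chunk size equals the 10 MB threshold, which the text does not state, and uses SHA-256 as the checksum; `split_chunks` and `changed_chunks` are illustrative names, not OrcaS functions.

```python
import hashlib

CHUNK_SIZE = 10 * 1024 * 1024  # assumption: chunk size equals the 10 MB default threshold


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks, each paired with its SHA-256 checksum."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def changed_chunks(old: list, new: list) -> list:
    """Return indexes of chunks in `new` whose checksum differs from `old`."""
    return [
        i for i, (digest, _) in enumerate(new)
        if i >= len(old) or old[i][0] != digest
    ]


# A tiny chunk size keeps the example instant to run.
v1 = split_chunks(b"aaaabbbbcccc", chunk_size=4)
v2 = split_chunks(b"aaaaXXXXcccc", chunk_size=4)
print(len(v1), changed_chunks(v1, v2))  # 3 [1]
```

Because each chunk has its own checksum, only chunk 1 would need to be re-uploaded after the edit, and a failed chunk can be retried on its own.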

Benefits:

  • 🔄 Parallel processing - upload/download chunks concurrently
  • 🛡️ Resumable transfers - retry failed chunks independently
  • ✏️ Efficient updates - only modified chunks need re-upload
  • 📊 Better resource utilization - process large files efficiently

🗂 Object Multi-versioning

What it does: Automatically maintains file version history.

How it works:

  • Each file modification creates a new version
  • Old versions preserved automatically
  • Configurable retention policies
  • Space-efficient through content deduplication
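To show how versioning and deduplication combine, here is a minimal sketch in which each version is just a pointer to a content hash. The `VersionedStore` class is hypothetical, and retention policies are left out for brevity.

```python
import hashlib


class VersionedStore:
    """Toy version history: versions are pointers into a shared blob table."""

    def __init__(self):
        self.blobs = {}     # content hash -> bytes, shared by all versions
        self.versions = {}  # object name -> list of content hashes, oldest first

    def write(self, name: str, data: bytes) -> int:
        """Record a new version and return its 1-based version number."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # unchanged content is not stored twice
        self.versions.setdefault(name, []).append(digest)
        return len(self.versions[name])

    def read(self, name: str, version: int = -1) -> bytes:
        """Read a given version; -1 means the latest."""
        history = self.versions[name]
        digest = history[-1] if version == -1 else history[version - 1]
        return self.blobs[digest]


store = VersionedStore()
store.write("notes.txt", b"draft")
store.write("notes.txt", b"final")
store.write("notes.txt", b"draft")  # reverting reuses the stored blob
print(store.read("notes.txt", version=2), len(store.blobs))  # b'final' 2
```

Three versions exist, but only two blobs are stored, because the first and third versions share the same content hash.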

Benefits:

  • 🔙 Point-in-time recovery - restore any previous version
  • 🛡️ Data protection - accidental deletions are recoverable
  • 📚 Audit trail - track all changes over time
  • 💾 Space efficient - unchanged data shared across versions

🔐 Zero-Knowledge Encryption

What it does: End-to-end encryption where only you hold the keys.

How it works:

  • AES-256 encryption (industry standard)
  • Encryption keys never leave your control
  • Optional per-bucket encryption keys
  • Transparent encryption/decryption

Benefits:

  • 🔒 Maximum security - even storage admins can't read your data
  • ✅ Compliance ready - meets strict security requirements
  • 🛡️ Data privacy - your data, your control
  • 🌍 International standards - AES-256 encryption

🗜 Smart Compression

What it does: Automatically compresses data to save space.

How it works:

  • Configurable compression algorithms (zstd, gzip, etc.)
  • Compression applied before encryption
  • Automatic detection of already-compressed data
  • Per-bucket compression settings
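One simple way to detect already-compressed data is to compress and keep the result only if it is smaller. The sketch below assumes that heuristic, which may differ from what OrcaS does, and swaps in the standard-library `zlib` (DEFLATE) for zstd; `maybe_compress` and `restore` are illustrative names.

```python
import os
import zlib


def maybe_compress(data: bytes) -> tuple:
    """Return (flag, payload); flag is True only if compression saved space."""
    packed = zlib.compress(data, 6)
    if len(packed) < len(data):
        return True, packed
    return False, data  # already-compressed or random data is stored as-is


def restore(flag: bool, payload: bytes) -> bytes:
    """Undo maybe_compress using the stored flag."""
    return zlib.decompress(payload) if flag else payload


text = b"log line\n" * 1000
noise = os.urandom(4096)  # stands in for already-compressed content
flag_text, stored_text = maybe_compress(text)
flag_noise, stored_noise = maybe_compress(noise)
print(flag_text, len(stored_text) < len(text), flag_noise)
```

The flag has to be stored alongside the payload so the reader knows whether to decompress. Compressing before encrypting matters because well-encrypted data looks random and does not compress.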

Benefits:

  • 💾 Storage savings - typically 30-70% reduction
  • ⚡ Bandwidth savings - less data to transfer
  • 🎯 Smart defaults - works out of the box
  • ⚙️ Configurable - adjust per your needs

🏗️ Architecture & Design

Content Addressable Storage (CAS) Core

OrcaS is built on Content Addressable Storage principles, where data is stored and retrieved by its content hash rather than location.

[Figure: Content Addressable Storage Architecture]

Key Benefits of CAS:

  1. Automatic Deduplication: Identical content stored once, referenced many times
  2. Integrity Verification: Content hash ensures data hasn't been corrupted
  3. Efficient Versioning: New versions only store changed content
  4. Simplified Backup: Same content = same hash = no re-upload needed
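The integrity property can be illustrated with a small content store whose key is the hash of the value, so every read can re-check the data. This is a sketch under the assumption that SHA-256 is the addressing hash; `ContentStore` is a hypothetical class, not part of OrcaS.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the value."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.blobs[key]
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha256(data).hexdigest() != key:
            raise IOError(f"content for {key[:12]} is corrupted")
        return data


cas = ContentStore()
key = cas.put(b"hello")
assert cas.get(key) == b"hello"

cas.blobs[key] = b"hellp"  # simulate corruption of the stored bytes
try:
    cas.get(key)
    corruption_detected = False
except IOError:
    corruption_detected = True
print(corruption_detected)  # True
```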

System Architecture

[Figure: System Architecture]

Instant Upload Flow

[Figure: Instant Upload Flow]

Data Storage Structure

Storage Layout:
├── Metadata (SQLite)
│   ├── Objects (files, directories)
│   ├── DataInfo (content metadata)
│   ├── Versions (version history)
│   └── References (deduplication)
│
└── Data Blocks (File System)
    └── <bucket_id>/
        └── <hash_prefix>/
            └── <hash>/
                └── <dataID>_<chunk_number>
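
As a rough illustration of the data block layout, the helper below builds a path of the form `<bucket_id>/<hash_prefix>/<hash>/<dataID>_<chunk_number>`. Several details are assumptions made for the example: that `<hash>` is a SHA-256 hex digest of the content, that `<hash_prefix>` is its first two characters, and that IDs are integers. `block_path` and `PREFIX_LEN` are hypothetical names.

```python
import hashlib
from pathlib import PurePosixPath

PREFIX_LEN = 2  # hypothetical: number of hex characters used for <hash_prefix>


def block_path(bucket_id: int, data: bytes, data_id: int, chunk_number: int) -> PurePosixPath:
    """Build <bucket_id>/<hash_prefix>/<hash>/<dataID>_<chunk_number>."""
    digest = hashlib.sha256(data).hexdigest()  # assumption: <hash> is SHA-256 of the content
    return PurePosixPath(
        str(bucket_id), digest[:PREFIX_LEN], digest, f"{data_id}_{chunk_number}"
    )


path = block_path(bucket_id=1, data=b"hello", data_id=42, chunk_number=0)
print(path)
```

A short prefix directory is a common way to avoid placing a very large number of entries in a single directory.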

📊 Performance Highlights

  • Instant Upload: 99%+ faster for duplicate files (milliseconds vs minutes)
  • Small Files: 10x+ performance improvement with packaging
  • Large Files: Parallel chunk processing for optimal throughput
  • Storage Efficiency: 30-70% space savings with compression + deduplication
  • Concurrent Operations: Optimized for high concurrency

Performance Test Reports:

📚 Documentation

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

โญ Why Star This Project?

  • ๐ŸŽฏ Production Ready: Battle-tested, actively maintained
  • ๐Ÿš€ High Performance: Optimized for real-world workloads
  • ๐Ÿ”’ Security First: Zero-knowledge encryption built-in
  • ๐Ÿ’พ Storage Efficient: Automatic deduplication saves space and costs
  • ๐Ÿ› ๏ธ Easy to Use: S3-compatible API, VFS mount, comprehensive docs
  • ๐ŸŒŸ Innovative: Content Addressable Storage with instant deduplication
  • ๐Ÿ“ˆ Actively Developed: Regular updates and improvements
  • ๐Ÿค Open Source: MIT licensed, community-driven

Star us if you find this project useful! โญ

