Next-gen vector database delivering lightning-fast performance at billion-vector scale.
- Overview
- Getting Started
- Client SDKs
- Features
- Benchmarks
- Documentation
- Contributing
- Contacts & Community
- Show Your Support
Cosdata is a next-generation retrieval infrastructure engineered for AI-native applications that demand relevance beyond simple vector similarity. Built with immutability and version control at its core, Cosdata delivers exceptional performance for modern semantic search and retrieval workloads. Cosdata advances retrieval technology through a relevance-first architecture that combines multiple search modalities:
- Multi-Modal Retrieval: Seamlessly integrate BM25 full-text search, HNSW dense vectors, SPLADE learned sparse embeddings, and metadata-rich sparse vectors in a unified platform (a toy score-fusion sketch follows this list)
- Context-Aware Capabilities: Leverage geofencing, hierarchical document organization, and explainable ranking that understands user intent and real-world complexity
- Enterprise-Grade Architecture: Benefit from colocated storage, streaming ingestion, transactional versioning, and comprehensive security features
- Relevance Optimization: Move beyond cosine similarity with sophisticated ranking algorithms that optimize for actual user satisfaction, not just mathematical proximity
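As a rough illustration of what fusing these modalities can look like, here is a generic weighted-sum sketch in Python. The weights, normalization, and function name are assumptions for illustration only, not Cosdata's actual ranking algorithm:

```python
def hybrid_score(bm25: float, dense_sim: float, sparse_sim: float,
                 w_bm25: float = 0.3, w_dense: float = 0.5,
                 w_sparse: float = 0.2) -> float:
    """Weighted fusion of lexical, dense-semantic, and learned-sparse scores.

    Assumes each input score has already been normalized to [0, 1].
    """
    return w_bm25 * bm25 + w_dense * dense_sim + w_sparse * sparse_sim

# A document that matches lexically (0.8), semantically (0.6), and via
# learned sparse terms (0.7) gets a fused score of 0.68.
print(hybrid_score(bm25=0.8, dense_sim=0.6, sparse_sim=0.7))
```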
Cosdata is designed to meet the demands of production AI applications, boosting compute efficiency by 60-120% while improving retrieval quality by 20-50% (NDCG@10).
Prerequisites:

- Linux: `curl`
- macOS & Windows: Docker (v20.10+)

Run this one-liner to install Cosdata and all dependencies:

```bash
curl -sL https://cosdata.io/install.sh | bash
```
✅ Installs the latest Cosdata CLI
On macOS & Windows (or any system with Docker), run Cosdata in a container:

- Verify Docker is running:

  ```bash
  docker --version
  ```

- Pull the latest Cosdata image:

  ```bash
  docker pull cosdataio/cosdata:latest
  ```

- Run the container:

  ```bash
  docker run -it \
    --name cosdata-server \
    -p 8443:8443 \
    -p 50051:50051 \
    cosdataio/cosdata:latest
  ```

✅ The server will be available at http://localhost:8443.
Perfect for contributors and power users who want to customize or extend Cosdata.
Prerequisites:

- Git (v2.0+)
- Rust (v1.81.0+) & Cargo
- C++ compiler: GCC ≥ 4.8 or Clang ≥ 3.4
Tip: On Ubuntu/Debian you can install everything with:

```bash
sudo apt update && sudo apt install -y git build-essential curl \
  clang lld rustc cargo
```
- Clone the repo:

  ```bash
  git clone https://github.com/cosdata/cosdata.git
  cd cosdata
  ```

- Compile in release mode:

  ```bash
  cargo build --release
  ```

- Start the server:

  ```bash
  ./target/release/cosdata --admin-key YOUR_ADMIN_KEY
  ```
You should see logs like:

```
[2025-02-21T02:30:29Z INFO  cosdata::web_server] starting HTTP server at http://127.0.0.1:8443
[2025-02-21T02:30:29Z INFO  actix_server::builder] starting 20 workers
[2025-02-21T02:30:29Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
[2025-02-21T02:30:29Z INFO  actix_server::server] starting service: "actix-web-service-127.0.0.1:8443"
[2025-02-21T02:30:29Z INFO  cosdata::grpc::server] gRPC server listening on [::1]:50051
```
Use the `test.py` script in the `tests/` directory to validate your Cosdata server setup. This script will:
- Create a test collection and a Dense HNSW index.
- Insert batches of random vectors in a single transaction.
- Generate query vectors by perturbing ~10% of the inserted vectors.
- Search the server for nearest neighbors using its HNSW index.
- Verify results by comparing against a local brute‑force cosine distance search.
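For intuition, here is a minimal sketch of that brute-force verification idea, using NumPy. The function name and structure are illustrative, not the actual internals of `test.py`:

```python
import numpy as np

def brute_force_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 5):
    """Exact O(n) ranking of stored vectors by cosine similarity to the query."""
    # Cosine similarity = dot product of L2-normalized vectors
    sims = (vectors @ query) / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query) + 1e-12
    )
    top = np.argsort(-sims)[:k]  # indices of the k most similar vectors
    return [(int(i), float(sims[i])) for i in top]

# The server's approximate HNSW results can be compared against this
# exact ranking to confirm the index returns the right neighbors.
```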
Prerequisites:

- Python 3.8+
- The `uv` CLI for virtual-env & dependency management
- A running Cosdata server at `http://127.0.0.1:8443`

Run the following from the `tests/` directory:
- Install dependencies:

  ```bash
  cd tests
  uv sync
  ```
This will:
- Create a Python virtual environment
- Install packages listed in `pyproject.toml`
- Run the test script:

  ```bash
  uv run test.py
  ```
- Review the output. The script prints a summary, including:
  - Number of vectors inserted
  - Queries executed
  - Pass/fail status for each comparison
Tip: If any test fails, check your server logs under `~/.cosdata/logs/` or review the console output for errors.
Use the `test-dataset.py` script to benchmark Cosdata against real-world datasets:
- Download or mount the dataset (e.g., SIFT, GloVe embeddings).
- Index the dataset using your chosen index type (HNSW, IVF, etc.).
- Query sample vectors and record accuracy & latency metrics (see the recall sketch after this list).
- Compare Cosdata's performance against baseline implementations.
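For a concrete sense of the accuracy metric, here is a hedged sketch of recall@k, the standard measure for comparing approximate results against ground-truth neighbors. The helper below is illustrative and not part of `test-dataset.py`:

```python
def recall_at_k(retrieved_ids: list, ground_truth_ids: list, k: int = 10) -> float:
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    return len(set(retrieved_ids[:k]) & set(ground_truth_ids[:k])) / k

# Example: 8 of the 10 true neighbors were retrieved -> recall@10 = 0.8
print(recall_at_k(
    ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"],
    ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "x", "y"],
))
```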
TODO: Add download links, configuration flags, and step‑by‑step instructions for each dataset.
By default, Cosdata runs over HTTP, but we strongly recommend enabling HTTPS in production.
If you just want to spin up the server quickly without TLS, edit your `config.toml`:

```toml
[server]
mode = "http"
```
⚠️ Warning: HTTP mode is not secure—only use this for local development or testing.
To run Cosdata over HTTPS, you need:
- TLS certificates (self-signed OK for testing)
- A valid `config.toml` pointing at your certs
- Proper file permissions
- Create a new RSA key and self-signed cert (valid 1 year):

  ```bash
  openssl req -newkey rsa:2048 -nodes -keyout private_key.pem \
    -x509 -days 365 -out self_signed_certificate.crt
  ```

- Convert the private key to PKCS#8 format:

  ```bash
  openssl pkcs8 -topk8 -inform PEM -outform PEM \
    -in private_key.pem -out private_key_pkcs8.pem -nocrypt
  ```

- Set your cert directory (choose a secure path):

  ```bash
  export SSL_CERT_DIR="/etc/ssl"
  ```

- Create subdirectories:

  ```bash
  sudo mkdir -p $SSL_CERT_DIR/{certs,private}
  ```

- Move certs into place:

  ```bash
  sudo mv self_signed_certificate.crt $SSL_CERT_DIR/certs/cosdata.crt
  sudo mv private_key_pkcs8.pem $SSL_CERT_DIR/private/cosdata.key
  ```

- Secure the private key:

  ```bash
  sudo groupadd ssl-cert || true
  sudo chgrp ssl-cert $SSL_CERT_DIR/private/cosdata.key
  sudo chmod 640 $SSL_CERT_DIR/private/cosdata.key
  sudo chmod 750 $SSL_CERT_DIR/private
  sudo usermod -aG ssl-cert $USER  # you may need to log out/in or run `newgrp ssl-cert`
  ```
In your `config.toml`, update the `[server]` section:

```toml
[server]
mode = "https"
tls_cert = "/etc/ssl/certs/cosdata.crt"
tls_key = "/etc/ssl/private/cosdata.key"
```
If running directly:

```bash
./target/release/cosdata --admin-key YOUR_ADMIN_KEY
```

If using Docker, mount your cert directory:

```bash
docker run -it --rm \
  -v "/etc/ssl/certs:/etc/ssl/certs:ro" \
  -v "/etc/ssl/private:/etc/ssl/private:ro" \
  cosdataio/cosdata:latest \
  cosdata --admin-key YOUR_ADMIN_KEY
```
Open your browser or run:

```bash
curl -kv https://localhost:8443/health
```
You should see a successful TLS handshake and a healthy status response.
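To script the same check, here is a small Python sketch using the `requests` library. The `/health` path matches the curl command above; `verify=False` is only for the self-signed test cert:

```python
import requests

# Self-signed certs fail standard CA validation, so skip it for this
# smoke test. In production, point `verify` at your cert instead, e.g.
# verify="/etc/ssl/certs/cosdata.crt".
resp = requests.get("https://localhost:8443/health", verify=False, timeout=5)
print(resp.status_code, resp.text)
```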
Cosdata provides an officially maintained Python SDK for seamless integration into your projects.
Install
```bash
pip install cosdata-sdk
```
Quickstart Example
```python
from cosdata import Client
import numpy as np

# Initialize the client with your server details
client = Client(
    host="http://127.0.0.1:8443",  # Default host
    username="admin",              # Default username
    password="admin",              # Default password
    verify=False                   # SSL verification
)

# Create a collection for storing 768-dimensional vectors
collection = client.create_collection(
    name="my_collection",
    dimension=768,  # Vector dimension
    description="My vector collection"
)

# Create an index with custom parameters
index = collection.create_index(
    distance_metric="cosine",   # Default: cosine
    num_layers=10,              # Default: 10
    max_cache_size=1000,        # Default: 1000
    ef_construction=128,        # Default: 128
    ef_search=64,               # Default: 64
    neighbors_count=32,         # Default: 32
    level_0_neighbors_count=64  # Default: 64
)

# Helper to generate random vectors grouped into documents
def generate_random_vector(id: int, dimension: int) -> dict:
    values = np.random.uniform(-1, 1, dimension).tolist()
    return {
        "id": f"vec_{id}",
        "dense_values": values,
        "document_id": f"doc_{id//10}",  # Group vectors into documents
        "metadata": {                    # Optional metadata
            "created_at": "2024-03-20",
            "category": "example"
        }
    }

# Generate and insert vectors
vectors = [generate_random_vector(i, 768) for i in range(100)]

# Add vectors using a transaction
with collection.transaction() as txn:
    # Single vector upsert
    txn.upsert_vector(vectors[0])
    # Batch upsert for remaining vectors
    txn.batch_upsert_vectors(vectors[1:])

# Search for similar vectors
results = collection.search.dense(
    query_vector=vectors[0]["dense_values"],  # Use first vector as query
    top_k=5,                                  # Number of nearest neighbors
    return_raw_text=True
)
```
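The exact shape of `results` may vary by SDK version; a quick way to see what came back is simply to print each match (field names are not assumed here, so check the SDK docs for the exact schema):

```python
# Inspect the returned neighbors; exact fields depend on the SDK version.
for match in results:
    print(match)
```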
Learn More
- 📦 Cosdata Python SDK Documentation: cosdata-sdk-python
Install
```bash
npm install cosdata-sdk
```
Quickstart Example
```typescript
import { createClient } from 'cosdata-sdk';

// Initialize the client (all parameters are optional)
const client = createClient({
  host: 'http://127.0.0.1:8443',  // Default host
  username: 'admin',              // Default username
  password: 'test_key',           // Default password
  verifySSL: false                // SSL verification
});

// Create a collection
const collection = await client.createCollection({
  name: 'my_collection',
  dimension: 128,
  dense_vector: {
    enabled: true,
    dimension: 128,
    auto_create_index: false
  }
});

// Create an index
const index = await collection.createIndex({
  name: 'my_collection_dense_index',
  distance_metric: 'cosine',
  quantization_type: 'auto',
  sample_threshold: 100,
  num_layers: 16,
  max_cache_size: 1024,
  ef_construction: 128,
  ef_search: 64,
  neighbors_count: 10,
  level_0_neighbors_count: 20
});

// Generate some vectors
function generateRandomVector(dimension: number): number[] {
  return Array.from({ length: dimension }, () => Math.random());
}

const vectors = Array.from({ length: 100 }, (_, i) => ({
  id: `vec_${i}`,
  dense_values: generateRandomVector(128),
  document_id: `doc_${i}`
}));

// Add vectors using a transaction
const txn = collection.transaction();
await txn.batch_upsert_vectors(vectors);
await txn.commit();

// Search for similar vectors
const results = await collection.getSearch().dense({
  query_vector: generateRandomVector(128),
  top_k: 5,
  return_raw_text: true
});
```
Learn More
- 📦 GitHub: cosdata-sdk-node
- Hybrid Multi-Modal Search: Combine BM25 full-text search with dense vector similarity (HNSW) and SPLADE learned sparse representations to deliver highly relevant results that balance lexical matching, semantic understanding, and context
- Metadata-Rich Retrieval: Search across hierarchical document structures with inherited metadata, enabling complex boolean queries, temporal filtering, and geospatial ranking without requiring embedding models
- Explainable Results: Understand exactly why results were surfaced with transparent scoring that decomposes semantic similarity, metadata matches, geographic relevance, and hierarchical relationships
- Blazing-Fast Indexing: Achieve up to 12× faster indexing than Elasticsearch on large datasets with optimized implementations for both full-text and vector workloads
- Ultra-Low Latency: Power applications with sub-100ms response times—our BM25 implementation delivers up to 151× higher QPS than Elasticsearch on benchmark datasets, while dense vector search achieves 1758+ QPS on million-record datasets
- Massive Throughput: Handle thousands of concurrent requests per second with an architecture designed for optimal performance under heavy loads, outperforming competitors by 42-146% on standard benchmarks
- Configurability: Gain precise control over your setup with manual configuration of all indexing and querying hyperparameters, enabling you to optimize performance and resource utilization and to tailor results to your exact specifications.
- Dense Vector Indexing: Achieve efficient and precise indexing with an optimized HNSW (Hierarchical Navigable Small World) algorithm, designed to enhance search performance and accuracy on large-scale datasets.
- Sparse Vectors: Designed to work seamlessly with SPLADE-generated sparse vectors, offering superior performance compared to BM25 indices for more precise and meaningful insights.
- Unbounded Scalability: Engineered to grow alongside your data and query demands. Whether you're handling millions of records or scaling up to massive datasets, enjoy consistent, high-speed performance without compromise.
- Near-Linear Query Performance: Achieve predictable and efficient query performance, with near-linear scalability that ensures fast results even as your data expands.
- Resource Utilization: Efficiency is at the core of Cosdata, where provably efficient data structures and algorithms ensure outstanding performance while providing increasingly relevant search results.
- Scalar Quantization: Configure finer quantization resolutions, including quarter-nary (2-bit) and octal (3-bit), for enhanced compression and improved recall trade-offs, giving you more control over data efficiency and performance (see the sketch after this list).
- Product Quantization: A pioneering approach that not only compresses data more effectively but also enhances recall beyond what scalar quantization offers, optimizing both data efficiency and retrieval recall.
- Colocated Storage: Eliminate architectural complexity by storing document chunks, embeddings, and metadata together—retrieve complete content in a single operation without external database calls
- Versioning & Time Travel: Query historical data states with immutable, append-only architecture supporting transactional indexing, A/B testing, and full audit trails
- Streaming Ingestion: Process real-time data feeds with immediate queryability while maintaining consistency guarantees and enterprise-grade durability
- Production-Ready Security: Deploy with end-to-end encryption, optional client-side encryption for zero-trust environments, and fine-grained RBAC for multi-tenant use cases
- Auto-Configuration of Hyperparameters: Achieve peak performance with insights-driven auto-configuration of hyperparameters that automatically fine-tunes your system for the best results, no manual adjustments needed.
- Intuitive API: Elegantly crafted HTTP RESTful APIs featuring "transactions as a resource", letting you manage every function of the database effortlessly.
- Client SDKs in your favourite language: Access our vector database effortlessly with client SDKs available in multiple programming languages.
- Real-Time Updates: Supports real-time querying and dynamic index updates, ensuring that new multi-modal data (text, images, audio, etc.) is immediately searchable without downtime or delays.
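To make the quantization options concrete, here is a hedged Python sketch of uniform 2-bit ("quarter-nary") scalar quantization. It illustrates the general technique only, not Cosdata's internal implementation:

```python
import numpy as np

def scalar_quantize(v: np.ndarray, bits: int = 2):
    """Uniformly map each component to one of 2**bits levels (illustrative)."""
    levels = 2 ** bits                           # 4 levels for 2-bit codes
    lo, hi = float(v.min()), float(v.max())
    step = (hi - lo) / (levels - 1) or 1.0       # guard against constant vectors
    codes = np.round((v - lo) / step).astype(np.uint8)  # compact per-component codes
    return codes, lo, step

def dequantize(codes: np.ndarray, lo: float, step: float) -> np.ndarray:
    return lo + codes * step                     # approximate reconstruction

v = np.random.uniform(-1, 1, 8).astype(np.float32)
codes, lo, step = scalar_quantize(v, bits=2)
print(v)
print(dequantize(codes, lo, step))               # coarse 2-bit approximation of v
```

The trade-off this illustrates: fewer bits per component means better compression and cache locality, at the cost of reconstruction error that can affect recall.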
We welcome contributions from the community! Whether it's fixing a bug, improving documentation, or building new features—every bit helps.
For full guidelines (coding standards, commit messages, CI checks), please read our CONTRIBUTING.md to get started. If you have any questions, feel free to open an issue or join the discussion on Discord. We can't wait to collaborate with you!
Have questions, ideas, or want to contribute? We'd love to hear from you!
🔗 Discord: Chat, collaborate, and get support — Join now
📨 Email: Partnerships & business inquiries — [email protected]
🐛 Issues: Report bugs or suggest features — Open an issue
💡 Discussions: Share ideas and ask questions — Join Discussion
Let's collaborate and build the future of vector search—together! 💡
If Cosdata has empowered your projects, please consider giving us a star on GitHub! ⭐️
Your endorsement helps attract new contributors and fuels ongoing improvements.
Thank you for your support! 🙏