samyama-ai/samyama-graph


Samyama

The graph database that queried 1 billion edges for $2.50



We loaded the entire PubMed corpus — every article published since 1966 — plus ClinicalTrials.gov, Reactome pathways, and DrugBank into one graph. Then we asked:

"What drugs are most tested in cancer clinical trials?"

MATCH (m:MeSHTerm)<-[:ANNOTATED_WITH]-(a:Article)
      -[:REFERENCED_IN]->(t:ClinicalTrial)-[:TESTS]->(i:Intervention)
WHERE m.name = 'Neoplasms'
RETURN i.name, count(DISTINCT t) AS trials
ORDER BY trials DESC LIMIT 5
Intervention      Trials
Placebo              521
Pembrolizumab        137
Carboplatin          106
Paclitaxel           106
Cyclophosphamide      98

5.2 seconds. One query. One graph built from four databases. 74 million nodes. 1 billion edges. A single machine.

See all 100 benchmark queries →


What is Samyama?

A graph-vector database written in Rust. OpenCypher queries, Redis protocol, vector search, graph algorithms — one binary, no JVM, no GC pauses.

# Install and run (30 seconds)
git clone https://github.com/samyama-ai/samyama-graph && cd samyama-graph
cargo build --release
./target/release/samyama    # RESP on :6379, HTTP on :8080; keep it running
# In a second terminal, connect with any Redis client
redis-cli -p 6379 GRAPH.QUERY mydb "CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})"
redis-cli -p 6379 GRAPH.QUERY mydb "MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name"
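Under the hood, redis-cli sends every command as a RESP array of bulk strings. The Rust sketch below frames a GRAPH.QUERY command that way. It assumes Samyama's RESP3 server accepts the standard array-of-bulk-strings request framing that redis-cli uses; `encode_resp_command` and `send_resp_command` are hypothetical helpers, not part of Samyama's API.

```rust
use std::io::{self, Write};

/// Frames one command as a RESP array of bulk strings:
/// "*<count>\r\n", then "$<byte length>\r\n<bytes>\r\n" per argument.
pub fn encode_resp_command(args: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(format!("*{}\r\n", args.len()).as_bytes());
    for arg in args {
        // Bulk-string lengths count bytes, not characters (matters for UTF-8).
        out.extend_from_slice(format!("${}\r\n", arg.len()).as_bytes());
        out.extend_from_slice(arg.as_bytes());
        out.extend_from_slice(b"\r\n");
    }
    out
}

/// Writes a framed command to any writer, e.g. a std::net::TcpStream
/// connected to 127.0.0.1:6379. Hypothetical helper; reply parsing is omitted.
pub fn send_resp_command<W: Write>(w: &mut W, args: &[&str]) -> io::Result<()> {
    w.write_all(&encode_resp_command(args))?;
    w.flush()
}
```

Passing a `TcpStream` as the writer sends the same bytes redis-cli would; reading and decoding the reply is left out of this sketch.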

Why Samyama?

If your data has relationships, you need a graph database. If your graph database can't handle a billion edges on a single machine, you need Samyama.

Highlight              Details
74M nodes, 1B edges    Loaded PubMed + ClinicalTrials.gov + Reactome + DrugBank on one r6a.8xlarge ($2.50 spot)
96/100 queries pass    Point lookups, multi-hop traversals, and cross-KG aggregations, all verified
Parallel everything    Rayon speedups: PageRank 3.1x, LCC 9.1x, Triangle Count 6x; parallel scan, filter, and compaction
975 QPS concurrent     16-client read workload, p99 < 25 ms, zero errors across 67K queries
LDBC compliant         SNB Interactive 21/21, FinBench 40/40, Graphalytics 12/12
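The README does not say how the p99 latency was computed. One common definition is the nearest-rank percentile, sketched below in Rust as an assumption for readers who want to reproduce latency figures from their own client-side samples. `percentile` is a hypothetical helper.

```rust
/// Nearest-rank percentile: the smallest sample x such that at least p% of
/// all samples are <= x. Returns None for empty input or p outside 0..=100.
pub fn percentile(samples_ms: &[f64], p: f64) -> Option<f64> {
    if samples_ms.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based rank = ceil(p/100 * n), at least 1 so that p = 0 maps to the minimum.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}
```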

The 30-Second Tour

Cypher queries — ~90% OpenCypher. MATCH, CREATE, MERGE, aggregations, path finding, 30+ functions.

MATCH (a:Person)-[:KNOWS*1..3]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name, length(shortestPath(a, b))
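A variable-length pattern such as [:KNOWS*1..3] can be pictured as a bounded breadth-first search from Alice. The Rust sketch below (a hypothetical `within_hops` helper over a plain adjacency map, not Samyama's executor) returns each node reachable in 1 to 3 hops together with its shortest hop count, the quantity the length(shortestPath(...)) column reports. Cypher itself enumerates paths, so the real query can return one row per distinct path rather than one per node; the start node is excluded here.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first search from `start` along directed edges, stopping at
/// `max_hops`. Maps every node reached in 1..=max_hops hops to its
/// shortest hop count; `start` itself is never included.
pub fn within_hops<'a>(
    adj: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    max_hops: usize,
) -> HashMap<&'a str, usize> {
    let mut dist: HashMap<&'a str, usize> = HashMap::new();
    let mut seen: HashSet<&'a str> = HashSet::new();
    let mut queue: VecDeque<(&'a str, usize)> = VecDeque::new();
    seen.insert(start);
    queue.push_back((start, 0));
    while let Some((node, d)) = queue.pop_front() {
        if d == max_hops {
            continue; // do not expand past the hop limit
        }
        if let Some(neighbors) = adj.get(node) {
            for &next in neighbors {
                // BFS order guarantees the first visit is the shortest one.
                if seen.insert(next) {
                    dist.insert(next, d + 1);
                    queue.push_back((next, d + 1));
                }
            }
        }
    }
    dist
}
```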

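Graph algorithms — PageRank, WCC (weakly connected components), SCC (strongly connected components), BFS, Dijkstra, LCC (local clustering coefficient), CDLP (community detection via label propagation), and Triangle Count. All parallelized with Rayon.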

CALL pagerank('social') YIELD nodeId, score
RETURN nodeId, score ORDER BY score DESC LIMIT 10
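For reference, the standard power-iteration form of PageRank is sketched below in Rust. It is sequential and uses only std; the damping factor 0.85 is the conventional default, not a documented default of pagerank('social'), and how Samyama distributes the work with Rayon is not shown. Dangling nodes spread their rank uniformly, so the scores keep summing to 1.

```rust
/// Power-iteration PageRank. `out[u]` lists the targets of u's outgoing edges.
/// Dangling nodes (no out-edges) redistribute their rank to all nodes.
pub fn pagerank(out: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Total rank currently held by dangling nodes.
        let dangling: f64 = (0..n).filter(|&u| out[u].is_empty()).map(|u| rank[u]).sum();
        // Teleport term plus the dangling share, identical for every node.
        let base = (1.0 - damping) / n as f64 + damping * dangling / n as f64;
        let mut next = vec![base; n];
        for (u, targets) in out.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            // Each out-edge carries an equal share of u's damped rank.
            let share = damping * rank[u] / targets.len() as f64;
            for &v in targets {
                next[v] += share;
            }
        }
        rank = next;
    }
    rank
}
```

The inner loop over nodes is the natural place to parallelize; a scatter like next[v] += share needs per-thread buffers or a pull-based formulation to do that safely.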

Vector search — HNSW indexing for semantic search and Graph RAG.

CREATE VECTOR INDEX ON :Paper(embedding) OPTIONS {dimensions: 384, similarity: 'cosine'}
CALL vector.search('Paper', 'embedding', [0.1, 0.2, ...], 10) YIELD node, score
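HNSW is an approximate index: it avoids comparing the query against every stored vector. The exact baseline it approximates is a brute-force cosine scan, sketched below in Rust. The `cosine` and `top_k` helpers are hypothetical; this is not HNSW and not Samyama's API, but a scan like this is useful for measuring the recall of an approximate index on a small sample.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return 0.0;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Exact top-k: scores every stored (id, vector) pair against the query and
/// keeps the k highest, best first.
pub fn top_k(query: &[f32], items: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> =
        items.iter().map(|(id, v)| (*id, cosine(query, v))).collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```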

Natural language — Ask questions in English; a configured LLM (OpenAI, Gemini, Ollama, or Claude) translates them to Cypher.

NLQ "Who are Alice's friends of friends that work at Google?"
→ MATCH (a:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)-[:WORKS_AT]->(c:Company {name:'Google'}) RETURN fof.name

AI agents — Auto-generated MCP servers from your graph schema.

pip install samyama[mcp]
samyama-mcp-serve --demo cricket    # Instant AI agent tools for any graph

Benchmarks

Scale: 74M Nodes, 1 Billion Edges

KG                  Source                      Nodes    Edges
PubMed/MEDLINE      NLM                         66.2M    1.04B
Clinical Trials     ClinicalTrials.gov           7.8M      27M
Pathways            Reactome                     119K     835K
Drug Interactions   DrugBank + ChEMBL + SIDER    245K     388K

Loaded in 31 minutes from snapshots. 96 of 100 queries return real data across all four KGs. Full results →
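Adding up the table rows reproduces the headline figures. Dividing the edge total by the 31-minute load time gives an average ingest rate, assuming that time covers all four KGs (the rate is derived here, not reported by the benchmark):

```latex
\begin{aligned}
\text{nodes} &= 66.2\,\text{M} + 7.8\,\text{M} + 0.119\,\text{M} + 0.245\,\text{M} \approx 74.4\,\text{M} \\
\text{edges} &= 1040\,\text{M} + 27\,\text{M} + 0.835\,\text{M} + 0.388\,\text{M} \approx 1068\,\text{M} \approx 1.07\,\text{B} \\
\text{edge rate} &\approx \frac{1.068 \times 10^{9}}{31 \times 60\ \text{s}} \approx 5.7 \times 10^{5}\ \text{edges/s}
\end{aligned}
```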

Cross-KG Query Highlights

Query                              Time   Result
Cancer → Trial interventions       5.2s   Pembrolizumab #1 (137 trials)
Diabetes → Trial interventions     2.4s   Metformin #1 (70 trials)
Metformin → Trial adverse events   2.1s   Diarrhoea (185 trials), confirming a known side effect
Cancer trial sites by country      3.8s   US 4,062 · China 1,170 · France 827
NCI-funded → Trial drugs          19.4s   Cyclophosphamide (517) · Radiation (362)
Aspirin articles → Trials          1.5s   NCT00000491 "Aspirin MI study"

LDBC Compliance

Benchmark          Pass rate      Dataset
SNB Interactive    21/21 (100%)   SF1: 3.18M nodes, 17.26M edges
SNB BI             16/16 (100%)   SF1
Graphalytics       12/12 (100%)   XS reference graphs
FinBench           40/40 (100%)   7.7K nodes, 42.2K edges

Concurrent Performance

Workload       1 client    16 clients    Scaling
Pure read       145 QPS       975 QPS       6.7x
Mixed 80/20     181 QPS       722 QPS       4.0x
Write-heavy     279 QPS       482 QPS       1.7x
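The Scaling column is consistent with 16-client throughput divided by single-client throughput. Dividing once more by 16 gives a per-client efficiency, a derived figure the benchmark itself does not report:

```latex
\text{scaling} = \frac{\mathrm{QPS}_{16}}{\mathrm{QPS}_{1}}:\quad
\frac{975}{145} \approx 6.72,\quad \frac{722}{181} \approx 3.99,\quad \frac{482}{279} \approx 1.73
\qquad
\text{efficiency} = \frac{\text{scaling}}{16}:\quad 0.42,\ 0.25,\ 0.11
```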

Demo

Cricket KG — 36K nodes, 1.4M edges, live graph simulation

Samyama Graph Simulation: full demo video (1:56)


Examples

Domain Knowledge Graphs

Domain            Command                                       Highlights
Banking & Fraud   cargo run --example banking_demo              Fraud patterns, money laundering, OFAC
Clinical Trials   cargo run --example clinical_trials_demo      Patient-trial matching, drug interactions
Supply Chain      cargo run --example supply_chain_demo         Disruption analysis, port optimization
Manufacturing     cargo run --example smart_manufacturing_demo  Digital twin, failure cascades
Social Network    cargo run --example social_network_demo       Influence, communities, recommendations
Enterprise SOC    cargo run --example enterprise_soc_demo       MITRE ATT&CK, attack paths, threat intel

Data Loaders

Dataset            Command                                                Scale
LDBC SNB SF1       cargo run --example ldbc_loader                        3.2M nodes, 17.3M edges
Clinical Trials    cargo run --release --example aact_loader              7.8M nodes, 27M edges
Drug Interactions  cargo run --release --example druginteractions_loader  245K nodes, 388K edges
Cricket            cargo run --release --example cricket_loader           36K nodes, 1.4M edges
FinBench           cargo run --example finbench_loader                    7.7K nodes, 42K edges

Related Repositories

samyama-graph is the engine. Per-domain KGs and companion projects live separately and can be loaded into it:


Architecture

samyama
├── graph/         Property graph model (Node, Edge, GraphStore, CSR adjacency)
├── query/         OpenCypher engine
│   ├── cypher.pest    PEG grammar
│   ├── executor/      Volcano iterator + WCO LeapFrog TrieJoin
│   └── planner.rs     Cost-based graph-native query planner
├── protocol/      RESP3 server (Redis-compatible, Tokio async)
├── persistence/   RocksDB + WAL + multi-tenancy
├── vector/        HNSW vector index
├── snapshot/      Portable .sgsnap v2 (CSR + ColumnStore)
├── raft/          Distributed consensus (openraft)
└── nlq/           Natural language → Cypher (OpenAI, Gemini, Ollama, Claude)
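The graph/ and snapshot/ modules both mention CSR (compressed sparse row) adjacency. The Rust sketch below shows the general layout: each node's out-neighbors sit in one contiguous slice of a targets array, located through an offsets array, so one traversal step is a single slice lookup. The `Csr` type and its methods are hypothetical and only illustrate the idea; they are not Samyama's GraphStore.

```rust
/// CSR adjacency: the out-neighbors of node u are
/// targets[offsets[u]..offsets[u + 1]], stored contiguously.
pub struct Csr {
    offsets: Vec<usize>, // length = num_nodes + 1
    targets: Vec<u32>,   // length = num_edges
}

impl Csr {
    /// Builds a CSR from an unsorted (src, dst) edge list over nodes 0..num_nodes.
    pub fn from_edges(num_nodes: usize, edges: &[(u32, u32)]) -> Self {
        // 1. Count each node's out-degree (shifted by one slot).
        let mut offsets = vec![0usize; num_nodes + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // 2. Prefix sum turns degrees into start offsets.
        for i in 0..num_nodes {
            offsets[i + 1] += offsets[i];
        }
        // 3. Scatter each edge into the next free slot of its source's range.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u32; edges.len()];
        for &(src, dst) in edges {
            let pos = &mut cursor[src as usize];
            targets[*pos] = dst;
            *pos += 1;
        }
        Csr { offsets, targets }
    }

    /// Out-neighbors of `u` as a contiguous slice.
    pub fn neighbors(&self, u: u32) -> &[u32] {
        let u = u as usize;
        &self.targets[self.offsets[u]..self.offsets[u + 1]]
    }

    pub fn out_degree(&self, u: u32) -> usize {
        self.neighbors(u).len()
    }
}
```

The trade-off is that inserting an edge means shifting the arrays, which is why CSR is usually built in bulk rather than edited in place.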

Companion crates:


Documentation

Resource                 Link
The Book                 graph.samyama.cloud/book
Biomedical Benchmark     100 queries, 96 pass
Cypher Compatibility     docs/CYPHER_COMPATIBILITY.md
LDBC Results             docs/ldbc/
Architecture Decisions   docs/ADR/
API Spec                 api/openapi.yaml

Enterprise Edition

Everything above is open source (Apache 2.0). Samyama Enterprise adds:

  • GPU acceleration (wgpu + CUDA)
  • OpenTelemetry OTLP metrics
  • Prometheus + Grafana monitoring
  • Backup & disaster recovery
  • ADMIN commands + audit trail
  • Ed25519 signed license tokens

Contact us →


License

Apache License 2.0 — use it in production, contribute back if you'd like.

Samyama (Sanskrit: संयम) — the union of focused query, sustained analysis, and unified insight.

About

Graph-vector database that queried 1 billion edges for $2.50. Rust, OpenCypher, vector search, 14 graph algorithms. 74M nodes / 1B edges on a single machine.
