This document outlines the development journey of Samyama, from its inception as a property graph engine to its current state as a distributed, AI-native Graph Vector Database.
Goal: Build the fundamental data structures for nodes, edges, and properties.
- Features:
- In-memory `GraphStore` using HashMaps.
- Support for multiple labels and property types (String, Int, Float, Bool, etc.).
- Adjacency lists for O(1) traversal lookups.
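
As a rough illustration of the structures listed above, here is a minimal sketch of a HashMap-backed store with per-node adjacency lists. The type and method names (`PropertyValue`, `add_node`, `neighbors`, and so on) are assumptions made for this example and are not taken from the Samyama codebase.

```rust
use std::collections::HashMap;

pub type NodeId = u64;

/// Illustrative property value type; the real set of variants may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

#[derive(Debug, Default)]
pub struct Node {
    pub labels: Vec<String>,
    pub properties: HashMap<String, PropertyValue>,
}

/// Hypothetical in-memory store: nodes and outgoing edges keyed by node id.
#[derive(Debug, Default)]
pub struct GraphStore {
    nodes: HashMap<NodeId, Node>,
    outgoing: HashMap<NodeId, Vec<(String, NodeId)>>, // (edge type, target)
    next_id: NodeId,
}

impl GraphStore {
    /// Creates a node with any number of labels and returns its id.
    pub fn add_node(&mut self, labels: &[&str]) -> NodeId {
        let id = self.next_id;
        self.next_id += 1;
        let node = Node {
            labels: labels.iter().map(|l| l.to_string()).collect(),
            properties: HashMap::new(),
        };
        self.nodes.insert(id, node);
        id
    }

    /// Sets a property; returns false if the node does not exist.
    pub fn set_property(&mut self, id: NodeId, key: &str, value: PropertyValue) -> bool {
        match self.nodes.get_mut(&id) {
            Some(node) => {
                node.properties.insert(key.to_string(), value);
                true
            }
            None => false,
        }
    }

    /// Appends to the source node's adjacency list (endpoints are not validated here).
    pub fn add_edge(&mut self, from: NodeId, edge_type: &str, to: NodeId) {
        self.outgoing
            .entry(from)
            .or_default()
            .push((edge_type.to_string(), to));
    }

    /// Finding the adjacency list is an average O(1) hash lookup;
    /// iterating it is proportional to the node's out-degree.
    pub fn neighbors(&self, id: NodeId) -> &[(String, NodeId)] {
        self.outgoing.get(&id).map(|v| v.as_slice()).unwrap_or(&[])
    }

    pub fn node(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(&id)
    }
}
```

Storing outgoing edges per source node is what keeps the neighbor lookup independent of the total graph size.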
Goal: Enable interaction via standard tools.
- Features:
- OpenCypher Parser: `MATCH`, `WHERE`, `RETURN`, `CREATE`, `ORDER BY`, `LIMIT`.
- Volcano Executor: Iterator-based query execution pipeline.
- RESP Server: Compatibility with Redis clients (`redis-cli`, Python/JS drivers).
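
To make the iterator-based (Volcano) execution model concrete, the sketch below chains three pull-based operators, roughly what a `MATCH ... WHERE ... LIMIT` plan might compile to. The `Operator` trait, the operator names, and the single-integer `Row` are simplifications assumed for illustration, not the engine's actual interfaces.

```rust
/// A row is reduced to a single integer to keep the example small.
type Row = i64;

/// Volcano-style operator: each call to `next` pulls one row from upstream.
trait Operator {
    fn next(&mut self) -> Option<Row>;
}

/// Leaf operator that yields rows from an in-memory source.
struct Scan {
    rows: std::vec::IntoIter<Row>,
}

impl Operator for Scan {
    fn next(&mut self) -> Option<Row> {
        self.rows.next()
    }
}

/// Passes through only the rows that satisfy the predicate (a WHERE clause).
struct Filter<P: Fn(&Row) -> bool> {
    child: Box<dyn Operator>,
    predicate: P,
}

impl<P: Fn(&Row) -> bool> Operator for Filter<P> {
    fn next(&mut self) -> Option<Row> {
        while let Some(row) = self.child.next() {
            if (self.predicate)(&row) {
                return Some(row);
            }
        }
        None
    }
}

/// Stops pulling from its child after `remaining` rows (a LIMIT clause).
struct Limit {
    child: Box<dyn Operator>,
    remaining: usize,
}

impl Operator for Limit {
    fn next(&mut self) -> Option<Row> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        self.child.next()
    }
}

/// Drains the root operator; rows flow one at a time through the pipeline.
fn run(mut root: Box<dyn Operator>) -> Vec<Row> {
    let mut out = Vec::new();
    while let Some(row) = root.next() {
        out.push(row);
    }
    out
}
```

Because `Limit` stops pulling once it has enough rows, upstream operators never process the rest of the input.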
Goal: Enterprise-grade durability and isolation.
- Features:
- RocksDB Storage: Persistent storage with column families for Nodes/Edges/Indices.
- WAL (Write-Ahead Log): Crash recovery and durability.
- Multi-Tenancy: Logical namespace isolation with resource quotas.
Goal: Distributed consensus and failover.
- Features:
- Raft Consensus: Leader election, log replication, and quorum safety via `openraft`.
- Cluster Management: Dynamic membership changes (add/remove nodes).
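
The quorum arithmetic behind the safety property can be sketched in a few lines. This is plain majority math, not the `openraft` API, and the function names are hypothetical.

```rust
/// Smallest number of voters that forms a majority of `cluster_size`.
fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// Number of voters that can be unavailable while a majority is still reachable.
fn tolerated_failures(cluster_size: usize) -> usize {
    cluster_size.saturating_sub(quorum(cluster_size))
}

/// Simplified commit rule: an entry counts as replicated by a quorum
/// once `acks` voters (leader included) have stored it.
fn has_quorum(acks: usize, cluster_size: usize) -> bool {
    acks >= quorum(cluster_size)
}
```

For example, a 5-node cluster needs 3 acknowledgements and tolerates 2 failures, while a 4-node cluster also needs 3 but tolerates only 1, which is why odd cluster sizes are the usual choice.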
Goal: Interoperability with knowledge graphs.
- Features:
- Triple Store: RDF data model support.
- Serialization: Turtle, N-Triples, RDF/XML support.
Goal: Native AI support for RAG applications.
- Features:
- Vector Type: Native `Vec<f32>` property support.
- HNSW Indexing: High-performance Approximate Nearest Neighbor search.
- Graph RAG: Hybrid queries combining vector similarity + graph traversal.
- Cypher: `CALL db.index.vector.queryNodes(...)`.
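
The hybrid pattern (vector similarity first, graph traversal second) can be sketched as follows. Note the substitution: this example uses an exact linear scan with cosine similarity in place of an HNSW index, and all function names are assumptions for illustration.

```rust
use std::collections::HashMap;

type NodeId = u64;

/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k by linear scan. An HNSW index would return approximate
/// results for the same question without visiting every vector.
fn top_k(embeddings: &HashMap<NodeId, Vec<f32>>, query: &[f32], k: usize) -> Vec<NodeId> {
    let mut scored: Vec<(NodeId, f32)> = embeddings
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(v, query)))
        .collect();
    // Highest similarity first; ties broken by node id for determinism.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

/// Hybrid query: vector search picks the seed nodes, then a one-hop
/// traversal collects each seed's neighbors as additional context.
fn hybrid_query(
    embeddings: &HashMap<NodeId, Vec<f32>>,
    adjacency: &HashMap<NodeId, Vec<NodeId>>,
    query: &[f32],
    k: usize,
) -> Vec<(NodeId, Vec<NodeId>)> {
    top_k(embeddings, query, k)
        .into_iter()
        .map(|seed| (seed, adjacency.get(&seed).cloned().unwrap_or_default()))
        .collect()
}
```

In a RAG setting, the seeds and their neighbors together would form the context passed to the language model.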
Goal: In-database analytics.
- Features:
- PageRank: Node centrality scoring.
- BFS/Dijkstra: Shortest path algorithms.
- WCC (Weakly Connected Components): Component-based community detection.
- GraphView: Optimized CSR-like projection for analytics speed.
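
A CSR (Compressed Sparse Row) projection stores all neighbor lists in one contiguous array, which is why it suits scan-heavy analytics. The sketch below builds such a projection and runs BFS over it; the `Csr` type is a stand-in assumed for illustration, not the actual `GraphView` layout.

```rust
use std::collections::VecDeque;

/// Compressed Sparse Row layout: the neighbors of node `v` are
/// `targets[offsets[v]..offsets[v + 1]]`.
struct Csr {
    offsets: Vec<usize>,
    targets: Vec<usize>,
}

impl Csr {
    /// Builds the projection from an edge list. Node ids must be < `node_count`.
    fn from_edges(node_count: usize, edges: &[(usize, usize)]) -> Self {
        // Count out-degrees, then prefix-sum them into offsets.
        let mut offsets = vec![0usize; node_count + 1];
        for &(src, _) in edges {
            offsets[src + 1] += 1;
        }
        for i in 0..node_count {
            offsets[i + 1] += offsets[i];
        }
        // Place each target at the next free slot of its source's range.
        let mut cursor = offsets.clone();
        let mut targets = vec![0usize; edges.len()];
        for &(src, dst) in edges {
            targets[cursor[src]] = dst;
            cursor[src] += 1;
        }
        Csr { offsets, targets }
    }

    fn node_count(&self) -> usize {
        self.offsets.len() - 1
    }

    fn neighbors(&self, v: usize) -> &[usize] {
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }
}

/// Hop distance from `source` to every node; `None` means unreachable.
fn bfs_distances(graph: &Csr, source: usize) -> Vec<Option<usize>> {
    let mut dist: Vec<Option<usize>> = vec![None; graph.node_count()];
    let mut queue = VecDeque::new();
    dist[source] = Some(0);
    queue.push_back((source, 0usize));
    while let Some((v, d)) = queue.pop_front() {
        for &w in graph.neighbors(v) {
            if dist[w].is_none() {
                dist[w] = Some(d + 1);
                queue.push_back((w, d + 1));
            }
        }
    }
    dist
}
```

The same projection could feed PageRank or WCC, since both repeatedly iterate over each node's neighbor slice.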
Goal: Solve performance bottlenecks.
- Features:
- B-Tree Indices: O(log n) property lookups.
- Cost-Based Optimizer (CBO): Automatically selects indices over scans.
- Performance: Improved lookup speed by 5,800x (115k QPS).
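
The idea of an ordered property index plus a cost comparison can be sketched with the standard library's `BTreeMap`. The cost formula here is a deliberately crude assumption (roughly log2(n) plus the expected matches for a seek, versus n for a scan); the actual optimizer's cost model is not described in this document.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical index over one integer property: value -> node ids.
struct PropertyIndex {
    entries: BTreeMap<i64, BTreeSet<u64>>,
}

impl PropertyIndex {
    fn new() -> Self {
        PropertyIndex { entries: BTreeMap::new() }
    }

    fn insert(&mut self, value: i64, node_id: u64) {
        self.entries.entry(value).or_default().insert(node_id);
    }

    /// Point lookup: O(log n) in the number of distinct values.
    fn lookup(&self, value: i64) -> Vec<u64> {
        self.entries
            .get(&value)
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default()
    }

    /// Inclusive range lookup; results come back ordered by property value.
    fn range(&self, low: i64, high: i64) -> Vec<u64> {
        if low > high {
            return Vec::new(); // BTreeMap::range panics on an inverted range
        }
        self.entries
            .range(low..=high)
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect()
    }
}

#[derive(Debug, PartialEq)]
enum Plan {
    IndexSeek,
    FullScan,
}

/// Toy cost comparison: prefer the index when its estimated cost is lower.
fn choose_plan(total_rows: usize, estimated_matches: usize, index_available: bool) -> Plan {
    let scan_cost = total_rows as f64;
    let seek_cost = (total_rows.max(2) as f64).log2() + estimated_matches as f64;
    if index_available && seek_cost < scan_cost {
        Plan::IndexSeek
    } else {
        Plan::FullScan
    }
}
```

With a selective predicate on a large table the seek wins by orders of magnitude, while a predicate that matches nearly every row falls back to a scan.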
Goal: Maximize write throughput.
- Features:
- Decoupled Architecture: Writes are acked immediately; indexing happens in background.
- Performance: Restored ingestion throughput to ~870k nodes/sec in the async benchmark; synchronous ingestion benchmarks at ~230k–360k nodes/sec depending on workload.
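
The decoupling can be pictured as a write path that enqueues index work on a channel and returns immediately, with a background thread draining the queue. This is a minimal sketch using `std::sync::mpsc`; the names (`IndexJob`, `spawn_indexer`, `write_node`) and the use of a plain thread are assumptions, and the real pipeline may use a different runtime.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Work item handed from the write path to the background indexer.
struct IndexJob {
    node_id: u64,
    name: String,
}

/// Shared secondary index: name -> node ids.
type NameIndex = Arc<Mutex<HashMap<String, Vec<u64>>>>;

/// Starts the background indexer and returns the queue's sending side.
fn spawn_indexer(index: NameIndex) -> (mpsc::Sender<IndexJob>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<IndexJob>();
    let handle = thread::spawn(move || {
        // The loop ends once every Sender has been dropped.
        for job in rx {
            index
                .lock()
                .unwrap()
                .entry(job.name)
                .or_default()
                .push(job.node_id);
        }
    });
    (tx, handle)
}

/// Write path: store the node, enqueue the index update, return right away.
/// Sending on an unbounded channel does not wait for the indexer.
fn write_node(primary: &mut Vec<String>, jobs: &mpsc::Sender<IndexJob>, name: &str) -> u64 {
    let node_id = primary.len() as u64;
    primary.push(name.to_string());
    jobs.send(IndexJob { node_id, name: name.to_string() })
        .expect("indexer thread has stopped");
    node_id
}
```

The trade-off is that the index lags the primary store briefly, so a lookup issued right after a write may not see it until the queue drains.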
Goal: Horizontal scalability.
- Features:
- Request Router: Distributes tenants across different Raft groups.
- Proxy Layer: Forwards requests to correct shards transparently.
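
One plausible way to distribute tenants across Raft groups is a stable hash of the tenant id, with an override table for tenants that need explicit placement. The document does not say which strategy the router uses, so the hashing scheme (FNV-1a), the `pin` override, and all names below are assumptions for illustration.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: a simple hash whose output is stable across processes,
/// unlike the standard library's randomly seeded default hasher.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical router mapping tenants to Raft group ids.
struct Router {
    group_count: u32,
    pinned: HashMap<String, u32>,
}

impl Router {
    fn new(group_count: u32) -> Self {
        assert!(group_count > 0, "at least one Raft group is required");
        Router { group_count, pinned: HashMap::new() }
    }

    /// Explicitly places a tenant on a group, overriding the hash.
    fn pin(&mut self, tenant: &str, group: u32) {
        assert!(group < self.group_count, "unknown Raft group");
        self.pinned.insert(tenant.to_string(), group);
    }

    /// Returns the group that should serve this tenant's requests.
    fn route(&self, tenant: &str) -> u32 {
        if let Some(group) = self.pinned.get(tenant) {
            return *group;
        }
        (fnv1a(tenant.as_bytes()) % self.group_count as u64) as u32
    }
}
```

A proxy layer would call `route` on each incoming request and forward it to the leader of the returned group. Note that plain modulo hashing reshuffles most tenants when the group count changes, which is one reason an explicit placement table may be preferable.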
Goal: Developer Experience.
- Features:
- Embedded Web UI: Served directly from binary at port 8080.
- Force-Directed Graph: Interactive visualization.
- Query Workbench: Run Cypher directly in the browser.
Goal: Native AI support for automatic data processing.
- Features:
- Tenant-Level Config: Each tenant can have its own LLM provider and embedding policy.
- Externalized LLMs: Support for OpenAI, Ollama, and Gemini.
- Automatic Embedding: Background tasks automatically generate embeddings when text properties matching policies are updated.
- Native Integration: Built directly into the async indexing pipeline.
Goal: Query the graph using plain English.
- Features:
- Text-to-Cypher: LLM-powered translation of user questions into valid Cypher queries.
- Schema-Aware: Injects tenant-specific schema into prompts for accuracy.
- Safe Execution: Defaults to read-only queries to prevent data loss.
- Opt-In: Configurable per tenant.
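
A read-only default implies some check between the LLM's output and the executor. The sketch below is a conservative keyword filter, shown only to illustrate the idea: the keyword list is an assumption and not exhaustive (procedure calls via `CALL` can also write), and a production guard would more likely inspect the parsed query plan than the raw text.

```rust
/// Clause keywords that modify the graph (illustrative, not exhaustive).
const WRITE_KEYWORDS: &[&str] = &[
    "CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP",
];

/// Returns true if no token in the query is a write keyword.
/// Deliberately conservative: a write keyword inside a string literal
/// also causes rejection.
fn is_read_only(query: &str) -> bool {
    query
        .split(|c: char| !c.is_ascii_alphanumeric() && c != '_')
        .filter(|token| !token.is_empty())
        .all(|token| !WRITE_KEYWORDS.iter().any(|kw| kw.eq_ignore_ascii_case(token)))
}

/// Gate for LLM-generated Cypher: writes pass only if the tenant opted in.
fn check_generated_query(query: &str, allow_writes: bool) -> Result<(), String> {
    if allow_writes || is_read_only(query) {
        Ok(())
    } else {
        Err("generated query contains a write clause; rejected in read-only mode".to_string())
    }
}
```

Tokenizing on non-identifier characters keeps property names such as `created_at` or `offset` from being mistaken for `CREATE` or `SET`.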
Goal: Autonomous agents that maintain and enrich the graph.
- Features:
- Event-Driven: Agents trigger on `NodeCreated` or `PropertySet` events based on policy.
- Tool Use: Agents can use tools (e.g., Web Search) to gather information.
- Autonomous Updates: Agents can write back to the graph to enrich properties.
- Mock Tooling: Initial support for mocked tools for testing and development.
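
The trigger, tool, and write-back flow might look like the sketch below. Only the event names `NodeCreated` and `PropertySet` come from this document; the event fields, the `Trigger` policy type, the `Tool` trait, and the mocked search tool are all hypothetical.

```rust
/// Graph events an agent can react to (fields are assumed for illustration).
#[derive(Debug, Clone, PartialEq)]
enum GraphEvent {
    NodeCreated { node_id: u64, label: String },
    PropertySet { node_id: u64, key: String, value: String },
}

/// Policy describing which events wake the agent up.
enum Trigger {
    NodeCreated { label: String },
    PropertySet { key: String },
}

/// A tool the agent can call to gather information.
trait Tool {
    fn call(&self, input: &str) -> String;
}

/// Mocked web search for tests and development; performs no network access.
struct MockSearch;

impl Tool for MockSearch {
    fn call(&self, input: &str) -> String {
        format!("mock result for {}", input)
    }
}

struct Agent {
    trigger: Trigger,
    tool: Box<dyn Tool>,
    output_key: String,
}

impl Agent {
    /// Checks the event against the agent's trigger policy.
    fn matches(&self, event: &GraphEvent) -> bool {
        match (&self.trigger, event) {
            (Trigger::NodeCreated { label }, GraphEvent::NodeCreated { label: l, .. }) => label == l,
            (Trigger::PropertySet { key }, GraphEvent::PropertySet { key: k, .. }) => key == k,
            _ => false,
        }
    }

    /// Runs the tool for a matching event and returns the property write
    /// `(node_id, key, value)` to apply to the graph, or `None` if the
    /// event does not match.
    fn handle(&self, event: &GraphEvent) -> Option<(u64, String, String)> {
        if !self.matches(event) {
            return None;
        }
        let (node_id, input) = match event {
            GraphEvent::NodeCreated { node_id, label } => (*node_id, label.as_str()),
            GraphEvent::PropertySet { node_id, value, .. } => (*node_id, value.as_str()),
        };
        Some((node_id, self.output_key.clone(), self.tool.call(input)))
    }
}
```

Since a write-back is itself a `PropertySet` event, a trigger on the same key the agent writes would loop; keeping `output_key` distinct from the trigger key avoids that in this sketch.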
Goal: Massive scale for single graphs.
- Plan: Partition single large graphs across nodes using min-cut partitioning algorithms (METIS), enabling trillion-edge scale (complexity: High).
Detailed Backlog: For a comprehensive, prioritized list of all planned work (~100 items across 13 categories), see `samyama-cloud/docs/BACKLOG.md`.