ROADMAP.md

Samyama Graph Database Roadmap

This document outlines the development journey of Samyama, from its inception as a property graph engine to its current state as a distributed, AI-native Graph Vector Database.

✅ Completed Phases

Phase 1: Core Property Graph Engine

Goal: Build the fundamental data structures for nodes, edges, and properties.

Features:
- In-memory GraphStore using HashMaps.
- Support for multiple labels and property types (String, Int, Float, Bool, etc.).
- Adjacency lists for average-case O(1) lookup of a node's neighbor list.
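
A minimal sketch of what a HashMap-backed store with adjacency lists could look like. The type and method names below (`Value`, `Node`, `add_node`, `neighbors`, and so on) are illustrative assumptions, not Samyama's actual API.

```rust
use std::collections::HashMap;

/// Property values; this variant set is illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

#[derive(Debug, Default)]
pub struct Node {
    pub labels: Vec<String>,
    pub props: HashMap<String, Value>,
}

/// Hypothetical store: nodes keyed by id, plus one outgoing adjacency list per node.
#[derive(Debug, Default)]
pub struct GraphStore {
    nodes: HashMap<u64, Node>,
    out_edges: HashMap<u64, Vec<u64>>,
    next_id: u64,
}

impl GraphStore {
    /// Creates a node with the given labels and returns its id.
    pub fn add_node(&mut self, labels: &[&str]) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let node = Node {
            labels: labels.iter().map(|l| l.to_string()).collect(),
            props: HashMap::new(),
        };
        self.nodes.insert(id, node);
        id
    }

    /// Sets a property; returns false if the node does not exist.
    pub fn set_prop(&mut self, id: u64, key: &str, value: Value) -> bool {
        match self.nodes.get_mut(&id) {
            Some(node) => {
                node.props.insert(key.to_string(), value);
                true
            }
            None => false,
        }
    }

    pub fn get_prop(&self, id: u64, key: &str) -> Option<&Value> {
        self.nodes.get(&id).and_then(|n| n.props.get(key))
    }

    /// Adds a directed edge. This sketch does not check that both endpoints exist.
    pub fn add_edge(&mut self, from: u64, to: u64) {
        self.out_edges.entry(from).or_default().push(to);
    }

    /// One hash lookup (average-case O(1)) finds the list; walking it is O(degree).
    pub fn neighbors(&self, id: u64) -> &[u64] {
        self.out_edges.get(&id).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

The O(1) claim applies to locating a node's neighbor list; a traversal still pays for each neighbor it visits.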

Phase 2: Query Engine & RESP Protocol

Goal: Enable interaction via standard tools.

Features:
- OpenCypher Parser: MATCH, WHERE, RETURN, CREATE, ORDER BY, LIMIT.
- Volcano Executor: Iterator-based query execution pipeline.
- RESP Server: Compatibility with Redis clients (redis-cli, Python/JS drivers).
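
The Volcano model can be sketched as a chain of operators that each pull one row at a time from their child. The operator names (`Scan`, `Filter`, `Limit`) and the integer-only `Row` type are assumptions made for illustration; the real executor's operators are not described in this document.

```rust
/// A row is a list of integer columns in this sketch.
pub type Row = Vec<i64>;

/// Volcano-style operator: each call to `next` yields at most one row.
pub trait Operator {
    fn next(&mut self) -> Option<Row>;
}

/// Leaf operator that emits rows from an in-memory table.
pub struct Scan {
    rows: std::vec::IntoIter<Row>,
}

impl Scan {
    pub fn new(rows: Vec<Row>) -> Self {
        Scan { rows: rows.into_iter() }
    }
}

impl Operator for Scan {
    fn next(&mut self) -> Option<Row> {
        self.rows.next()
    }
}

/// Pulls from the child until a row satisfies the predicate (a WHERE clause).
pub struct Filter {
    pub child: Box<dyn Operator>,
    pub pred: fn(&Row) -> bool,
}

impl Operator for Filter {
    fn next(&mut self) -> Option<Row> {
        while let Some(row) = self.child.next() {
            if (self.pred)(&row) {
                return Some(row);
            }
        }
        None
    }
}

/// Stops pulling from the child after `remaining` rows (a LIMIT clause).
pub struct Limit {
    pub child: Box<dyn Operator>,
    pub remaining: usize,
}

impl Operator for Limit {
    fn next(&mut self) -> Option<Row> {
        if self.remaining == 0 {
            return None;
        }
        let row = self.child.next()?;
        self.remaining -= 1;
        Some(row)
    }
}

/// Drains the root operator, which in turn drives the whole pipeline.
pub fn collect(mut root: Box<dyn Operator>) -> Vec<Row> {
    let mut out = Vec::new();
    while let Some(row) = root.next() {
        out.push(row);
    }
    out
}
```

Because `Limit` stops calling its child once it has enough rows, upstream operators do no extra work, which is the main appeal of the pull-based design.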

Phase 3: Persistence & Multi-Tenancy

Goal: Enterprise-grade durability and isolation.

Features:
- RocksDB Storage: Persistent storage with column families for Nodes/Edges/Indices.
- WAL (Write-Ahead Log): Crash recovery and durability.
- Multi-Tenancy: Logical namespace isolation with resource quotas.

Phase 4: High Availability (Raft)

Goal: Distributed consensus and failover.

Features:
- Raft Consensus: Leader election, log replication, and quorum safety via openraft.
- Cluster Management: Dynamic membership changes (add/remove nodes).
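
The quorum arithmetic behind Raft's safety guarantee is small enough to write out. This is the general majority rule, not the openraft API, and the function names are hypothetical.

```rust
/// Smallest number of nodes that forms a majority of the cluster.
pub fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// How many nodes can fail while a majority is still reachable.
pub fn tolerated_failures(cluster_size: usize) -> usize {
    cluster_size.saturating_sub(quorum(cluster_size))
}

/// A log entry counts as committed once a majority has acknowledged it.
pub fn is_committed(acks: usize, cluster_size: usize) -> bool {
    acks >= quorum(cluster_size)
}
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice.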

Phase 5: RDF & Semantic Web

Goal: Interoperability with knowledge graphs.

Features:
- Triple Store: RDF data model support.
- Serialization: Turtle, N-Triples, RDF/XML support.
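
As an illustration of the simplest of these formats, the sketch below writes one triple as an N-Triples line. The `Triple` and `Object` types are hypothetical, IRIs are assumed to be valid already, and blank nodes, datatypes, and language tags are left out.

```rust
/// Object position of a triple: either an IRI or a plain string literal.
pub enum Object {
    Iri(String),
    Literal(String),
}

pub struct Triple {
    pub subject: String,
    pub predicate: String,
    pub object: Object,
}

/// Escapes the characters that may not appear raw inside a quoted literal.
pub fn escape_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            other => out.push(other),
        }
    }
    out
}

/// Formats one triple as a single N-Triples line, terminated by " .".
pub fn to_ntriples(t: &Triple) -> String {
    let obj = match &t.object {
        Object::Iri(iri) => format!("<{}>", iri),
        Object::Literal(text) => format!("\"{}\"", escape_literal(text)),
    };
    format!("<{}> <{}> {} .", t.subject, t.predicate, obj)
}
```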

Phase 6: Vector Search & AI Integration

Goal: Native AI support for RAG applications.

Features:
- Vector Type: Native Vec<f32> property support.
- HNSW Indexing: High-performance Approximate Nearest Neighbor search.
- Graph RAG: Hybrid queries combining vector similarity + graph traversal.
- Cypher: CALL db.index.vector.queryNodes(...).
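
To show what a vector query computes, here is an exact brute-force nearest-neighbor search using cosine similarity. It is deliberately not HNSW: HNSW approximates this result with a layered graph index to avoid scanning every vector. The function names and the choice of cosine similarity as the metric are assumptions.

```rust
/// Cosine similarity of two vectors; None if lengths differ or a vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Scores every (node id, embedding) pair and returns the k most similar, best first.
pub fn top_k(query: &[f32], items: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter_map(|(id, v)| cosine_similarity(query, v).map(|s| (*id, s)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

In a hybrid Graph RAG query, the node ids returned by a search like this would seed a graph traversal.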

Phase 7: Native Graph Algorithms

Goal: In-database analytics.

Features:
- PageRank: Node centrality scoring.
- BFS/Dijkstra: Shortest path algorithms.
- WCC: Weakly connected components, a coarse form of community detection.
- GraphView: Optimized CSR-like projection for analytics speed.
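
A sketch of why a CSR (compressed sparse row) layout suits analytics: all neighbor lists live in one contiguous array, and a BFS walks slices of it. The `Csr` type is a hypothetical stand-in for GraphView, whose actual layout is not described here, and node ids are assumed to be dense indices in `0..node_count`.

```rust
use std::collections::VecDeque;

/// CSR adjacency: neighbors of node i are targets[offsets[i]..offsets[i + 1]].
pub struct Csr {
    offsets: Vec<usize>,
    targets: Vec<usize>,
}

impl Csr {
    /// Builds the projection from a directed edge list. Panics on out-of-range ids.
    pub fn from_edges(node_count: usize, edges: &[(usize, usize)]) -> Self {
        // Count the out-degree of each node, shifted by one slot.
        let mut offsets = vec![0usize; node_count + 1];
        for &(src, _) in edges {
            offsets[src + 1] += 1;
        }
        // Prefix sum turns the counts into start positions.
        for i in 0..node_count {
            offsets[i + 1] += offsets[i];
        }
        // Fill each node's slice, tracking the next free slot per node.
        let mut cursor = offsets.clone();
        let mut targets = vec![0usize; edges.len()];
        for &(src, dst) in edges {
            targets[cursor[src]] = dst;
            cursor[src] += 1;
        }
        Csr { offsets, targets }
    }

    pub fn neighbors(&self, node: usize) -> &[usize] {
        &self.targets[self.offsets[node]..self.offsets[node + 1]]
    }

    /// Unweighted shortest-path distances from `start`; None means unreachable.
    pub fn bfs_distances(&self, start: usize) -> Vec<Option<usize>> {
        let node_count = self.offsets.len() - 1;
        let mut dist = vec![None; node_count];
        let mut queue = VecDeque::new();
        dist[start] = Some(0);
        queue.push_back(start);
        while let Some(u) = queue.pop_front() {
            let d = dist[u].unwrap();
            for &v in self.neighbors(u) {
                if dist[v].is_none() {
                    dist[v] = Some(d + 1);
                    queue.push_back(v);
                }
            }
        }
        dist
    }
}
```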

Phase 8: Query Optimization

Goal: Solve performance bottlenecks.

Features:
- B-Tree Indices: O(log n) property lookups.
- Cost-Based Optimizer (CBO): Automatically selects indices over scans.
- Performance: Improved lookup speed by 5,800x (115k QPS).
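
The two ideas above can be sketched together: an ordered index over a property, and a toy cost comparison that prefers the index when it is expected to touch fewer entries than a scan. `PropertyIndex`, `Plan`, `choose_plan`, and the cost formula are all assumptions for illustration; the actual optimizer's cost model is not described in this document.

```rust
use std::collections::BTreeMap;

/// Ordered index from an integer property value to the node ids that hold it.
#[derive(Default)]
pub struct PropertyIndex {
    entries: BTreeMap<i64, Vec<u64>>,
}

impl PropertyIndex {
    pub fn insert(&mut self, value: i64, node_id: u64) {
        self.entries.entry(value).or_default().push(node_id);
    }

    /// Point lookup in O(log n) comparisons.
    pub fn lookup(&self, value: i64) -> &[u64] {
        self.entries.get(&value).map(|v| v.as_slice()).unwrap_or(&[])
    }

    /// Inclusive range lookup; an empty result for an inverted range.
    pub fn range(&self, lo: i64, hi: i64) -> Vec<u64> {
        if lo > hi {
            return Vec::new();
        }
        self.entries
            .range(lo..=hi)
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect()
    }
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    IndexSeek,
    FullScan,
}

/// Toy cost model: a scan touches every node, a seek pays log2(n) plus the matches.
/// `indexed_matches` is None when no index covers the predicate.
pub fn choose_plan(total_nodes: usize, indexed_matches: Option<usize>) -> Plan {
    let scan_cost = total_nodes as f64;
    match indexed_matches {
        Some(matches) => {
            let seek_cost = (total_nodes.max(2) as f64).log2() + matches as f64;
            if seek_cost < scan_cost {
                Plan::IndexSeek
            } else {
                Plan::FullScan
            }
        }
        None => Plan::FullScan,
    }
}
```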

Phase 9: Async Ingestion

Goal: Maximize write throughput.

Features:
- Decoupled Architecture: Writes are acked immediately; indexing happens in background.
- Performance: Restored ingestion to ~870k nodes/sec in the async benchmark; synchronous ingestion benchmarks at ~230k–360k nodes/sec depending on workload.
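
The decoupling can be sketched with a channel between the write path and a background indexer thread. This is an assumption about the general shape only; the `ingest` function, the name-to-id index, and the use of `std::sync::mpsc` are illustrative, not the actual pipeline.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Accepts writes, acknowledges each one immediately, and lets a background
/// thread build a name -> id index. Returns the acked ids and the finished index.
pub fn ingest(names: Vec<String>) -> (Vec<u64>, HashMap<String, u64>) {
    let (tx, rx) = mpsc::channel::<(u64, String)>();

    // Background indexer: drains the queue until every sender is dropped.
    let indexer = thread::spawn(move || {
        let mut index = HashMap::new();
        for (id, name) in rx {
            index.insert(name, id);
        }
        index
    });

    let mut acked = Vec::new();
    for (i, name) in names.into_iter().enumerate() {
        let id = i as u64;
        // A real write path would persist the node here before acknowledging.
        tx.send((id, name)).expect("indexer thread is alive");
        // The ack does not wait for the index update.
        acked.push(id);
    }

    // Closing the channel lets the indexer loop finish.
    drop(tx);
    let index = indexer.join().expect("indexer thread panicked");
    (acked, index)
}
```

The trade-off of this design is that an index lookup issued right after an ack may not yet see the new node.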

Phase 10: Tenant Sharding

Goal: Horizontal scalability.

Features:
- Request Router: Distributes tenants across different Raft groups.
- Proxy Layer: Forwards requests to correct shards transparently.
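
One plausible way to map tenants to Raft groups is a stable hash of the tenant id, with explicit pins for tenants that need a dedicated group. The `Router` type, the `pin` override, and the choice of FNV-1a are assumptions; the document does not say how the request router assigns tenants.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a, written out so the mapping is stable across builds and hosts.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical router: hash-based placement with optional per-tenant pins.
pub struct Router {
    group_count: usize,
    pinned: HashMap<String, usize>,
}

impl Router {
    pub fn new(group_count: usize) -> Self {
        assert!(group_count > 0, "at least one Raft group is required");
        Router { group_count, pinned: HashMap::new() }
    }

    /// Forces a tenant onto a specific group.
    pub fn pin(&mut self, tenant: &str, group: usize) {
        assert!(group < self.group_count, "group index out of range");
        self.pinned.insert(tenant.to_string(), group);
    }

    /// Returns the index of the Raft group that owns this tenant.
    pub fn route(&self, tenant: &str) -> usize {
        if let Some(group) = self.pinned.get(tenant) {
            return *group;
        }
        (fnv1a(tenant.as_bytes()) % self.group_count as u64) as usize
    }
}
```

Plain modulo placement reshuffles most tenants when the group count changes; a production router would likely need a migration-friendly scheme such as consistent hashing or a stored assignment table.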

Phase 11: Native Visualizer

Goal: Developer Experience.

Features:
- Embedded Web UI: Served directly from the binary on port 8080.
- Force-Directed Graph: Interactive visualization.
- Query Workbench: Run Cypher directly in the browser.

Phase 12: "Auto-Embed" Pipelines (formerly Auto-RAG)

Goal: Native AI support for automatic data processing.

Features:
- Tenant-Level Config: Each tenant can have its own LLM provider and embedding policy.
- Externalized LLMs: Support for OpenAI, Ollama, and Gemini.
- Automatic Embedding: Background tasks automatically generate embeddings when text properties matching policies are updated.
- Native Integration: Built directly into the async indexing pipeline.

Phase 13: Natural Language Querying (NLQ)

Goal: Query the graph using plain English.

Features:
- Text-to-Cypher: LLM-powered translation of user questions into valid Cypher queries.
- Schema-Aware: Injects tenant-specific schema into prompts for accuracy.
- Safe Execution: Defaults to read-only queries to prevent data loss.
- Opt-In: Configurable per tenant.
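
A conservative way to enforce read-only execution is to reject any generated query that contains a write clause. The keyword list and the `is_read_only` function below are assumptions; a real guard would more likely inspect the parsed query rather than raw tokens, since this token check also rejects queries that merely mention a write keyword inside a string literal.

```rust
/// Cypher clauses that modify the graph or schema (illustrative list).
const WRITE_KEYWORDS: [&str; 7] = [
    "CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP",
];

/// Returns true only if no token of the query is a write keyword.
/// Tokens are runs of ASCII letters, digits, and underscores, compared case-insensitively.
pub fn is_read_only(cypher: &str) -> bool {
    cypher
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .filter(|tok| !tok.is_empty())
        .all(|tok| !WRITE_KEYWORDS.contains(&tok.to_ascii_uppercase().as_str()))
}
```

Matching whole tokens rather than substrings keeps a property such as `n.offset` from being mistaken for a `SET` clause.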

Phase 14: Agentic Enrichment

Goal: Autonomous agents that maintain and enrich the graph.

Features:
- Event-Driven: Agents trigger on NodeCreated or PropertySet events based on policy.
- Tool Use: Agents can use tools (e.g., Web Search) to gather information.
- Autonomous Updates: Agents can write back to the graph to enrich properties.
- Mock Tooling: Initial support for mocked tools for testing and development.
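
The event-to-tool-to-write-back loop can be sketched as follows. Only the event names `NodeCreated` and `PropertySet` come from this document; the `Tool` trait, `MockWebSearch`, `Policy`, and `on_event` are hypothetical, and a flat map stands in for the graph.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum Event {
    NodeCreated { node_id: u64, label: String },
    PropertySet { node_id: u64, key: String },
}

/// Anything an agent can call to gather information.
pub trait Tool {
    fn run(&self, input: &str) -> String;
}

/// Mocked web search: returns a canned string instead of calling the network.
pub struct MockWebSearch;

impl Tool for MockWebSearch {
    fn run(&self, input: &str) -> String {
        format!("mock summary for {}", input)
    }
}

/// Hypothetical policy: when a node with `label` is created, fill in `target_key`.
pub struct Policy {
    pub label: String,
    pub target_key: String,
}

/// Properties keyed by (node id, property name); stands in for the graph.
pub type Props = HashMap<(u64, String), String>;

/// Runs the tool and writes the result back if the event matches the policy.
/// Returns true when an enrichment was written.
pub fn on_event(event: &Event, policy: &Policy, tool: &dyn Tool, props: &mut Props) -> bool {
    match event {
        Event::NodeCreated { node_id, label } if *label == policy.label => {
            let result = tool.run(label);
            props.insert((*node_id, policy.target_key.clone()), result);
            true
        }
        // This sketch's policy ignores every other event.
        _ => false,
    }
}
```

Putting the tool behind a trait is what makes the mock swappable for a real search client without touching the agent logic.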

🔮 Future Roadmap

1. Time-Travel / Temporal Queries ⏳

3. Graph-Level Sharding

Goal: Massive scale for single graphs.

Plan: Partition single large graphs across nodes using min-cut algorithms (METIS), enabling trillion-edge scale. Complexity: High.
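
The quantity a min-cut partitioner tries to minimize is the number of edges that cross partitions, since each one becomes a network hop at query time. The sketch below only measures that edge cut for a given assignment; it is not METIS and does not compute a partition. The function names are hypothetical.

```rust
/// Counts edges whose endpoints are assigned to different partitions.
/// `part[i]` is the partition of node i; panics if an edge references a missing node.
pub fn edge_cut(edges: &[(usize, usize)], part: &[usize]) -> usize {
    edges.iter().filter(|&&(u, v)| part[u] != part[v]).count()
}

/// Number of nodes per partition, to check that the split is balanced.
pub fn partition_sizes(part: &[usize], partition_count: usize) -> Vec<usize> {
    let mut sizes = vec![0usize; partition_count];
    for &p in part {
        sizes[p] += 1;
    }
    sizes
}
```

For a 4-node cycle, placing adjacent nodes together cuts 2 edges, while alternating the assignment cuts all 4, even though both splits are equally balanced.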

Detailed Backlog: For a comprehensive, prioritized list of all planned work (~100 items across 13 categories), see samyama-cloud/docs/BACKLOG.md.
