Production-Ready Knowledge Graph Question Answering with Hybrid Retrieval
Quick Start • Documentation • Architecture • Contributing • Demo
SubgraphRAG+ is an advanced knowledge graph-powered question answering system that combines structured graph traversal with semantic vector search. It provides contextual answers with real-time visualizations through a production-ready REST API.
- Hybrid Retrieval: Combines Neo4j graph traversal with FAISS vector search
- Real-time Ingestion: Dynamic knowledge graph updates with validation
- Streaming API: Server-sent events with live citations and graph data
- Interactive Visualization: D3.js-compatible graph data with relevance scoring
- Multi-LLM Support: OpenAI, HuggingFace, Anthropic, MLX (Apple Silicon)
- High Performance: Optimized with caching, indexing, and MLP scoring
- Production Ready: Docker deployment, monitoring, health checks
- Easy Demo: One-command setup with progress indicators
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    FastAPI      │     │     Hybrid      │     │   Knowledge     │
│    REST API     │────▶│    Retriever    │────▶│  Graph (Neo4j)  │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         │                       ▼                       │
         │              ┌─────────────────┐              │
         │              │  Vector Index   │              │
         ▼              │     (FAISS)     │              ▼
┌─────────────────┐     └─────────────────┘     ┌─────────────────┐
│   LLM Backend   │                             │  MLP Scoring    │
│ (OpenAI/HF/MLX) │                             │     Model       │
└─────────────────┘                             └─────────────────┘
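To make the diagram concrete, here is a minimal sketch of how a hybrid retriever might fuse graph and vector relevance into a single ranking. The function names, weighting scheme, and sample data are illustrative assumptions, not the actual SubgraphRAG+ implementation (which uses an MLP scorer).

```python
# Illustrative score fusion for hybrid retrieval (hypothetical, not the
# project's actual scorer): each candidate triple carries a graph score
# (e.g. inverse hop distance from the query entities) and a vector score
# (e.g. cosine similarity), combined with a tunable weight alpha.

def fuse_scores(graph_score: float, vector_score: float, alpha: float = 0.5) -> float:
    """Linear fusion of graph-based and vector-based relevance."""
    return alpha * graph_score + (1 - alpha) * vector_score

def rank_candidates(candidates, alpha=0.5, top_k=3):
    """candidates: list of (triple, graph_score, vector_score) tuples."""
    scored = [(t, fuse_scores(g, v, alpha)) for t, g, v in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

candidates = [
    (("Alan Turing", "born_in", "London"), 1.0, 0.9),
    (("Alan Turing", "field", "Computer Science"), 1.0, 0.6),
    (("London", "capital_of", "UK"), 0.5, 0.4),
]
top = rank_candidates(candidates, alpha=0.5, top_k=2)
```

In the real system the fused candidates would then be trimmed to the token budget and handed to the LLM backend.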
# Clone the repository
git clone <repository-url>
cd SubGraphPlus
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

SubgraphRAG+ uses a clean two-tier configuration system that separates secrets from application settings:
- `.env`: Secrets and environment-specific values (never commit to git)
- `config/config.json`: Application settings, models, and parameters (version controlled)
This separation follows security best practices and makes deployment across environments simple.
# Copy the example and customize with your secrets
cp .env.example .env
nano .env  # Add your actual credentials and API keys

The main configuration file controls all application behavior:
{
"models": {
"backend": "mlx",
"llm": {
"mlx": {
"model": "mlx-community/Qwen3-14B-8bit",
"max_tokens": 512,
"temperature": 0.1
}
},
"embeddings": {
"model": "Alibaba-NLP/gte-large-en-v1.5",
"backend": "transformers"
}
},
"retrieval": {
"token_budget": 4000,
"max_dde_hops": 2,
"similarity_threshold": 0.7
}
}

Contains only secrets and environment-specific values:
# === Database Credentials ===
NEO4J_URI=neo4j://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_secure_password
# === API Security ===
API_KEY_SECRET=your_secret_key
# === API Keys ===
OPENAI_API_KEY=sk-your-key # Required for OpenAI backend
HF_TOKEN=hf_your-token # Optional for private HF models
# === Environment ===
ENVIRONMENT=development
LOG_LEVEL=INFO
DEBUG=false

Key Principles:
- Secrets in `.env`: never commit credentials to version control
- Settings in `config.json`: application configuration is version controlled
- No duplication: each setting has one clear location
- Environment overrides: `.env` can override `config.json` defaults when needed
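The two-tier idea can be sketched in a few lines. This hypothetical loader is not the project's actual config module; it only illustrates how a variable set in `.env` can override a `config.json` default while preserving the original value's type.

```python
# Sketch of two-tier configuration (hypothetical loader): settings come
# from config.json, and a matching environment variable overrides them,
# coerced to the type of the JSON default.
import json

def load_config(json_text: str, env: dict) -> dict:
    cfg = json.loads(json_text)
    # e.g. RETRIEVAL_TOKEN_BUDGET overrides retrieval.token_budget
    for key, value in env.items():
        parts = key.lower().split("_", 1)
        section = parts[0]
        if section in cfg and len(parts) == 2 and parts[1] in cfg[section]:
            cfg[section][parts[1]] = type(cfg[section][parts[1]])(value)
    return cfg

cfg = load_config('{"retrieval": {"token_budget": 4000}}',
                  {"RETRIEVAL_TOKEN_BUDGET": "2000"})
```

Note the type coercion: the override arrives as a string from the environment but is stored as an `int`, matching the JSON default.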
Start Neo4j database:
# Using Docker
docker run \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-d \
-e NEO4J_AUTH=neo4j/your_password \
neo4j:latest
# Or use Neo4j Desktop/AuraDB and update NEO4J_URI in .env

# Start the FastAPI server
python -m uvicorn src.app.api:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000
The fastest way to see SubgraphRAG+ in action:
# Clone the repository
git clone https://github.com/your-username/SubgraphRAGPlus.git
cd SubgraphRAGPlus
# Setup and run demo (one command!)
./bin/setup_dev.sh --run-demo
# Or run the demo script directly
python examples/demo_quickstart.py --help

- Progress Indicators: Clear step-by-step feedback (Step 1/6, completed)
- Smart Performance: Skips data ingestion if already present
- Flexible Options: `--skip-neo4j`, `--skip-data`, custom ports
- Fast Startup: Optimized health checks and server startup
- Helpful Errors: Clear guidance when things go wrong
# Full demo with all components
python examples/demo_quickstart.py
# Skip data ingestion if already present
python examples/demo_quickstart.py --skip-data
# Skip Neo4j for CI/testing environments
python examples/demo_quickstart.py --skip-neo4j
# Custom port
python examples/demo_quickstart.py --port 8080
# Minimal demo for quick testing
python examples/demo_quickstart.py --skip-neo4j --skip-data --port 8001

- Environment Setup: Automatic dependency and configuration setup
- Database Connection: Neo4j connectivity and schema migration
- Model Loading: MLP model detection and validation
- Data Ingestion: Sample knowledge graph population (if needed)
- API Server: FastAPI server startup with health checks
- Live Query: Demonstration query with real-time response
The demo provides a complete end-to-end experience in under 2 minutes!
- `GET /health` - Health check
- `GET /ready` - Readiness check (includes model status)
- `GET /metrics` - Prometheus metrics

- `POST /query` - Ask questions using RAG
- `POST /ingest` - Add documents to knowledge base
- `POST /feedback` - Provide feedback on responses
- `GET /graph/browse` - Browse knowledge graph
All endpoints (except health/metrics) require API key authentication:
curl -H "X-API-Key: your_api_key" http://localhost:8000/query

SubgraphRAG+ includes a modern Next.js frontend with shadcn/ui components for interactive knowledge graph exploration and chat-based querying.
- Interactive Chat Interface: Real-time conversation with the knowledge graph
- Knowledge Graph Visualization: Interactive D3.js-powered graph exploration
- Analytics Dashboard: Query performance and system metrics
- Document Management: Upload and manage knowledge base documents
- Real-time Updates: Server-sent events for live response streaming
- Modern UI: Built with Next.js 15, React 19, and Tailwind CSS
# Navigate to frontend directory
cd frontend
# Install dependencies
npm install
# Create environment configuration
cp .env.local.example .env.local
# Configure API endpoint
echo "NEXT_PUBLIC_API_URL=http://localhost:8000" >> .env.local
echo "NEXT_PUBLIC_API_KEY=your_api_key" >> .env.local
# Start development server
npm run dev

The frontend will be available at http://localhost:3000
frontend/
├── src/
│   ├── app/                  # Next.js App Router pages
│   ├── components/           # Reusable UI components
│   │   ├── ui/               # shadcn/ui base components
│   │   ├── chat-support.tsx  # Chat interface
│   │   ├── data-table.tsx    # Knowledge graph browser
│   │   └── ...
│   ├── hooks/                # Custom React hooks
│   ├── lib/                  # Utility functions
│   └── util/                 # Helper utilities
├── public/                   # Static assets
└── package.json              # Dependencies and scripts
- Chat Interface (`chat-support.tsx`): Real-time chat with SSE streaming
- Graph Visualization (`data-table.tsx`): Interactive knowledge graph browser
- Navigation (`app-sidebar.tsx`): Application navigation and user management
- Analytics (`chart-area-interactive.tsx`): Performance metrics and insights
# Install dependencies
npm install
# Run development server with hot reload
npm run dev
# Build for production
npm run build
# Start production server
npm start
# Run linting
npm run lint

Create `.env.local` with your API configuration:
# API Configuration
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_API_KEY=your_api_key
# Optional: Analytics and monitoring
NEXT_PUBLIC_ANALYTICS_ID=your_analytics_id

The frontend can be deployed to any platform supporting Next.js:
# Build for production
npm run build
# Deploy to Vercel (recommended)
npx vercel
# Or deploy to other platforms
npm start  # Runs production server on port 3000
SubgraphRAG+ features a high-performance test suite with 99.9% faster execution through optimized model loading and comprehensive mocking.
# Fast tests (recommended for development) - ~0.16s
make test-fast
# Full test suite with verbose logging
make test-verbose
# Standard test run with optimizations
make test
# Single test with debugging
TESTING=1 LOG_LEVEL=DEBUG python -m pytest tests/test_api.py::TestHealthEndpoints::test_readiness_check_success -v -s

| Test Type | Before | After | Improvement |
|---|---|---|---|
| Fast Tests | ~5+ minutes | ~0.16s | 99.9% faster |
| Individual API Test | ~30+ seconds | ~0.5s | 98% faster |
| Test Collection | Segmentation fault | Instant | Fixed |
- `TESTING=1`: Enables testing mode with model loading disabled
- `DISABLE_MODELS=1`: Explicitly disables all model loading
- `LOG_LEVEL=DEBUG`: Enables detailed logging for debugging
- `FAST_TEST_MODE=1`: Optimizes for fastest possible test execution
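The speedup behind `TESTING=1` can be sketched with a simple factory guard. The names below (`StubEmbedder`, `get_embedder`) are hypothetical; the idea is just that test mode returns a lightweight stub instead of loading a real model, which is why collection and execution become near-instant.

```python
# Sketch of the testing-mode pattern (hypothetical names): when TESTING=1,
# skip expensive model loading and return a cheap stub instead.
import os

class StubEmbedder:
    """Stand-in for a real embedding model during tests."""
    def embed(self, text: str):
        return [0.0] * 8  # fixed-size dummy vector

def get_embedder():
    if os.environ.get("TESTING") == "1":
        return StubEmbedder()
    # In the real code path, the heavy model would be loaded here.
    raise RuntimeError("real model loading would happen here")

os.environ["TESTING"] = "1"
vec = get_embedder().embed("hello")
```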
# API endpoint tests
python -m pytest tests/test_api.py -v
# Core functionality tests
python -m pytest tests/test_basic.py -v
# ML model tests (with mocking)
python -m pytest tests/test_mlp_model.py -v
# Embedding tests
python -m pytest tests/test_embedder.py -v
# Run with coverage
python -m pytest --cov=src tests/

See the Testing Improvements Guide for detailed technical information about the performance optimizations.
- API Layer (`src/app/api.py`): FastAPI application with endpoints
- Configuration (`src/app/config.py`): Centralized configuration management
- Database (`src/app/database.py`): Neo4j and SQLite database interfaces
- ML Models (`src/app/ml/`): LLM and embedding model abstractions
- Retrieval (`src/app/retriever.py`): RAG retrieval logic
- Utils (`src/app/utils.py`): Shared utilities
- Ingestion: Documents → Embeddings → Neo4j Graph + Vector Index
- Query: Question → Embedding → Graph Retrieval → LLM → Response
- Feedback: User feedback → SQLite → Model improvement
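The ingestion flow can be sketched end to end with toy stand-ins. Everything here is illustrative: the comma-split "extractor" stands in for the real IE step, the list for the Neo4j write, and the dict for the FAISS index.

```python
# Minimal sketch of the ingestion data flow (hypothetical functions):
# a document becomes triples, each triple gets an embedding, and the
# graph store and vector index are updated together.
def extract_triples(doc: str):
    # Toy stand-in for the real information-extraction step:
    # each "h, r, t" line becomes one triple.
    return [tuple(part.strip() for part in line.split(","))
            for line in doc.splitlines() if line.count(",") == 2]

def ingest(doc: str, graph: list, index: dict, embed):
    for triple in extract_triples(doc):
        graph.append(triple)                      # Neo4j write in the real system
        index[triple] = embed(" ".join(triple))   # FAISS add in the real system

graph, index = [], {}
ingest("Moses, parted, Red Sea\nRed Sea, located_in, Egypt",
       graph, index, embed=lambda s: [float(len(s))])
```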
- Create a new class in `src/app/ml/llm.py` implementing the `LLMInterface`
- Add backend configuration to `config/config.json`
- Update the factory function in `get_llm_model()`
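A minimal sketch of those three steps, with hypothetical class and method names (the real `LLMInterface` in `src/app/ml/llm.py` may define a different signature):

```python
# Illustrative backend registration (hypothetical names): an abstract
# interface, a trivial backend, and a factory keyed by config value.
from abc import ABC, abstractmethod

class LLMInterface(ABC):
    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 512) -> str: ...

class EchoLLM(LLMInterface):
    """Trivial backend, useful only for wiring tests."""
    def generate(self, prompt: str, max_tokens: int = 512) -> str:
        return prompt[:max_tokens]

def get_llm_model(backend: str) -> LLMInterface:
    # The real factory would read the backend name from config/config.json.
    backends = {"echo": EchoLLM}
    return backends[backend]()

out = get_llm_model("echo").generate("hello world", max_tokens=5)
```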
- Create a new class in `src/app/ml/embedder.py` implementing the `EmbedderInterface`
- Add backend configuration to `config/config.json`
- Update the factory function in `get_embedder()`
The configuration system supports:
- Type validation: Automatic type checking and conversion
- Environment overrides: Override any config value via environment variables
- Nested configurations: Hierarchical settings with dot notation
- Default values: Fallback values for optional settings
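Dot-notation lookup with defaults can be illustrated with a small helper. This is an assumption about the pattern, not the project's actual config API:

```python
# Illustrative dot-notation lookup over nested settings (hypothetical
# helper; the real config module may differ).
def get_setting(cfg: dict, path: str, default=None):
    node = cfg
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

cfg = {"models": {"llm": {"mlx": {"max_tokens": 512}}}}
val = get_setting(cfg, "models.llm.mlx.max_tokens")
missing = get_setting(cfg, "models.llm.openai.model", default="gpt-3.5-turbo")
```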
Available at /metrics:
- HTTP request metrics
- Response times
- Error rates
- Custom application metrics
Structured logging with configurable levels:
LOG_LEVEL=DEBUG # DEBUG, INFO, WARNING, ERROR
LOG_FILE=logs/app.log  # Optional file output

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "src.app.api:app", "--host", "0.0.0.0", "--port", "8000"]

Create different config files for each environment:
- `config/config.json` (default)
- `config/config.production.json`
- `config/config.staging.json`
Set CONFIG_FILE environment variable to override:
CONFIG_FILE=config/config.production.json python -m uvicorn src.app.api:app

- API Key Authentication: All endpoints protected
- Input Validation: Pydantic models for request validation
- Rate Limiting: Built-in FastAPI rate limiting
- CORS: Configurable cross-origin resource sharing
- Lazy Loading: Models loaded only when needed
- Connection Pooling: Efficient database connections
- Caching: Response and embedding caching
- Apple Silicon: MLX backend for M1/M2/M3 optimization
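The embedding-caching idea above can be sketched with a standard LRU cache. This is an illustration of the technique, not the project's actual caching layer; the hashing trick stands in for an expensive model call.

```python
# Sketch of embedding caching via functools.lru_cache (illustrative):
# repeated queries for the same text skip the expensive model call.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=1000)
def cached_embedding(text: str) -> tuple:
    CALLS["count"] += 1  # stands in for an expensive embedding-model call
    return tuple(float(ord(c)) for c in text[:4])

cached_embedding("machine learning")
cached_embedding("machine learning")  # served from cache, no second call
```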
- Cold Start: ~2-3 seconds (with model loading)
- Query Response: ~200-500ms (cached embeddings)
- Ingestion: ~100-200 docs/minute
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass: `TESTING=1 python -m pytest`
- Submit a pull request
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
Import Errors: Ensure all dependencies are installed and virtual environment is activated
Database Connection: Verify Neo4j is running and credentials in .env are correct
Model Loading: Check model names in config/config.json and ensure sufficient disk space
API Key Issues: Verify API_KEY_SECRET is set and using correct header format
- Check the logs: `tail -f logs/app.log`
- Run health checks: `curl http://localhost:8000/health`
- Test configuration: `TESTING=1 python -c "from src.app.config import config; print(config)"`
| Document | Description | Audience |
|---|---|---|
| Documentation Hub | Complete documentation index | All users |
| Installation Guide | Detailed setup instructions | New users |
| Architecture Guide | System design and components | Developers, Architects |
| Development Guide | Contributing and local dev | Contributors |
| Deployment Guide | Production deployment | DevOps, SysAdmins |
| API Reference | Complete API documentation | Integrators |
| Configuration | Settings and environment vars | All users |
| Troubleshooting | Common issues and solutions | All users |
For optimized performance on M1/M2/M3 Macs:
- See MLX Integration Guide for native Apple Silicon acceleration
- Use `./bin/setup_dev.sh`, which auto-detects and configures MLX
- OS: Linux, macOS, or Windows with WSL2
- Python: 3.9+ (tested up to 3.13)
- Memory: 4GB RAM
- Storage: 10GB free space
- Docker: 20.10+ with Compose v2 (for Docker setup)
- CPU: 4+ cores
- Memory: 8GB+ RAM
- Storage: 50GB+ SSD
- Network: Stable internet connection for LLM APIs
import json
import requests
# Query with graph visualization
response = requests.post(
"http://localhost:8000/query",
headers={"X-API-Key": "your-api-key"},
json={
"question": "What is machine learning?",
"visualize_graph": True,
"max_context_triples": 50
    },
    stream=True,  # required so iter_lines() yields events as they arrive
)
# Stream the response
for line in response.iter_lines():
    if line:
        data = json.loads(line.decode('utf-8'))
        print(f"Type: {data['type']}, Content: {data['content']}")

# Basic health check
curl http://localhost:8000/healthz
# Comprehensive readiness check
curl http://localhost:8000/readyz

# Browse the knowledge graph
curl "http://localhost:8000/graph/browse?limit=100&search_term=AI" \
-H "X-API-KEY: your-api-key"# Start development server
make serve # or: python src/main.py --reload
# Run tests
make test # Run full test suite
make test-smoke # Quick smoke tests
make test-api # API integration tests
# Code quality
make lint # Check code style
make format # Auto-format code
# Database operations
make neo4j-start # Start Neo4j container
make migrate-schema # Apply database migrations
make ingest-sample  # Load sample data

SubgraphRAGPlus/
├── src/                     # Application source code
│   ├── main.py              # Application entry point
│   └── app/                 # Core application modules
│       ├── api.py           # FastAPI REST endpoints
│       ├── retriever.py     # Hybrid retrieval engine
│       ├── database.py      # Neo4j & SQLite connections
│       └── ml/              # ML models (LLM, embeddings, MLP)
├── bin/                     # Setup and utility scripts
├── scripts/                 # Python utilities and tools
├── tests/                   # Comprehensive test suite
├── docs/                    # Documentation
├── config/                  # Configuration files
├── deployment/              # Docker and infrastructure
├── Makefile                 # Development commands
└── requirements.txt         # Python dependencies
# Production deployment
cd deployment/
docker-compose -f docker-compose.prod.yml up -d
# Scale API instances
docker-compose -f docker-compose.prod.yml up -d --scale api=3
# Monitor services
docker-compose -f docker-compose.prod.yml logs -f

SubgraphRAG+ uses a hybrid configuration approach following security best practices:
- `.env`: Secrets and environment-specific values
- `config/config.json`: Application settings and model configurations
Essential production environment variables for secrets and environment-specific settings:
# === Database Credentials ===
NEO4J_URI=neo4j://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-secure-production-password
# === API Security ===
API_KEY_SECRET=your-secure-api-key
# === API Keys ===
OPENAI_API_KEY=your-openai-api-key # Required if using OpenAI backend
HF_TOKEN=your-hf-token # Optional, for private HuggingFace models
# === Environment Settings ===
ENVIRONMENT=production # development|staging|production
LOG_LEVEL=INFO # DEBUG|INFO|WARNING|ERROR|CRITICAL
DEBUG=false
# === Optional: Custom Model Paths ===
# MLX_LLM_MODEL_PATH=/path/to/custom/mlx/model
# HF_MODEL_PATH=/path/to/custom/hf/model

Model settings and application configuration:
{
"models": {
"backend": "mlx",
"llm": {
"mlx": {
"model": "mlx-community/Qwen3-14B-8bit",
"max_tokens": 512,
"temperature": 0.1,
"top_p": 0.9
},
"openai": {
"model": "gpt-3.5-turbo",
"max_tokens": 512,
"temperature": 0.1,
"top_p": 0.9
}
},
"embeddings": {
"model": "Alibaba-NLP/gte-large-en-v1.5",
"backend": "transformers"
}
},
"retrieval": {
"token_budget": 4000,
"max_dde_hops": 2,
"similarity_threshold": 0.7
},
"performance": {
"cache_size": 1000,
"api_rate_limit": 60,
"timeout_seconds": 30
}
- Never commit secrets: Keep `.env` in `.gitignore`
- Use environment overrides: Local `.env` can override `config.json` defaults
- Embedding consistency: Always use the `transformers` backend for embeddings (never MLX)
- Backend separation: MLX for LLM only, `transformers` for embeddings only
For optimal performance on M1/M2/M3 Macs:
# In .env
LOG_LEVEL=DEBUG  # To see MLX initialization logs

In `config/config.json`:
{
"models": {
"backend": "mlx",
"llm": {
"mlx": {
"model": "mlx-community/Qwen3-14B-8bit",
"max_tokens": 1024,
"temperature": 0.1
}
},
"embeddings": {
"model": "Alibaba-NLP/gte-large-en-v1.5",
"backend": "transformers"
}
}
}

- Health Check: `GET /healthz` - Basic liveness probe
- Readiness Check: `GET /readyz` - Dependency health with detailed status
- Metrics: `GET /metrics` - Prometheus-compatible metrics
- API Docs: `GET /docs` - Interactive OpenAPI documentation
- Neo4j Browser: http://localhost:7474 - Database management interface
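A deployment script often needs to wait for the readiness probe before sending traffic. Here is a generic polling sketch; the `check_ready` callable is hypothetical, and in practice it would issue a `GET /readyz` and return `True` on HTTP 200 (the fake below simulates a service that becomes ready on its third poll).

```python
# Illustrative readiness-polling loop (hypothetical check function).
import time

def wait_until_ready(check_ready, timeout_s=30.0, interval_s=0.01):
    """Poll check_ready() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check_ready():
            return True
        time.sleep(interval_s)
    return False

attempts = {"n": 0}
def fake_check():
    attempts["n"] += 1
    return attempts["n"] >= 3  # "ready" on the third poll

ok = wait_until_ready(fake_check, timeout_s=5.0)
```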
# Full test suite
make test
# Specific test categories
python -m pytest tests/test_api.py -v # API tests
python -m pytest tests/test_retriever.py -v # Retrieval tests
python -m pytest tests/test_mlp_model.py -v # MLP model tests
# With coverage report
python -m pytest --cov=src tests/ --cov-report=html

- Unit Tests: Individual component testing
- Integration Tests: Multi-component workflows
- API Tests: REST endpoint validation
- Smoke Tests: Basic system functionality
- Performance Tests: Benchmarking and load testing
We welcome contributions! Here's how to get started:
# 1. Fork and clone
git clone https://github.com/your-username/SubgraphRAGPlus.git
cd SubgraphRAGPlus
# 2. Setup development environment
./bin/setup_dev.sh --run-tests
# 3. Create a feature branch
git checkout -b feature/your-feature-name
# 4. Make changes and test
make test
make lint
# 5. Submit a pull request

- Code Style: Follow PEP 8 with Black formatting
- Testing: Add tests for new features
- Documentation: Update docs for user-facing changes
- Commits: Use conventional commit messages
See Contributing Guide for detailed information.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Documentation Issues: Create an Issue
- General Questions: GitHub Discussions
- Original SubgraphRAG research by Microsoft Research
- Neo4j and FAISS communities for graph and vector database technologies
- FastAPI, PyTorch, and Python ecosystem contributors
- Contributors and users of this project
⭐ Star this repository if you find it useful!
Made with ❤️ for the Knowledge Graph community
- REBEL IE Service: Uses Babelscape/rebel-large for proper triple extraction from raw text
- Schema-Driven Entity Typing: Replaces naive string heuristics with authoritative type mappings
- Domain Adaptability: Works with Biblical text, legal documents, scientific papers, etc.
- Offline Operation: No external API dependencies, fully self-contained
Key Distinction:
- REBEL: Extracts relations (`Jesus → place of birth → Bethlehem`)
- Entity Typing: Classifies entity types (`Jesus:Person`, `Bethlehem:Location`)
- Combined: `(Jesus:Person) --[place of birth]--> (Bethlehem:Location)`
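The combined representation can be captured in a small data structure. The class and field names below are hypothetical illustrations, not the project's actual schema:

```python
# Illustrative typed triple: REBEL-style relation plus schema-assigned
# entity types (hypothetical dataclass, not the project's real model).
from dataclasses import dataclass

@dataclass(frozen=True)
class TypedTriple:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str

    def render(self) -> str:
        return (f"({self.head}:{self.head_type}) "
                f"--[{self.relation}]--> "
                f"({self.tail}:{self.tail_type})")

t = TypedTriple("Jesus", "Person", "place of birth", "Bethlehem", "Location")
```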
- Live Ingestion Pipeline: Build KGs from any text corpus in real-time
- Incremental Updates: Add new content without rebuilding entire graph
- Quality Control: Deduplication, validation, and error handling
- Hybrid Retrieval: Combines graph traversal with dense vector search
- MLP-Based Scoring: Uses original SubgraphRAG MLP (no retraining needed)
- Budget-Aware Assembly: Optimizes subgraph size for LLM context windows
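Budget-aware assembly can be sketched as a greedy selection over scored triples. This is an illustration only: the word-count token cost and the greedy strategy are assumptions, not the actual assembler (which scores triples with the SubgraphRAG MLP and a real tokenizer).

```python
# Sketch of budget-aware subgraph assembly (illustrative): greedily keep
# the highest-scored triples until the token budget is exhausted.
def assemble_subgraph(scored_triples, token_budget):
    """scored_triples: list of (triple_text, score). Token cost here is
    approximated by whitespace word count -- an assumption, not the
    project's actual tokenizer."""
    chosen, used = [], 0
    for text, score in sorted(scored_triples, key=lambda x: x[1], reverse=True):
        cost = len(text.split())
        if used + cost <= token_budget:
            chosen.append(text)
            used += cost
    return chosen

triples = [("Moses parted Red Sea", 0.9),
           ("Red Sea located_in Egypt", 0.7),
           ("Egypt borders Libya", 0.2)]
picked = assemble_subgraph(triples, token_budget=8)
```

With a budget of 8 "tokens", the two highest-scored triples (4 words each) fit exactly and the third is dropped.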
- Microservices: Containerized IE service, API layer, database components
- Monitoring: Health checks, metrics, logging, alerting
- Scalability: Horizontal scaling, caching, batch processing
# 1. Start the unified API (includes IE functionality)
uvicorn src.app.api:app --host 0.0.0.0 --port 8000
# 2. Ingest Biblical text (IE is integrated)
python scripts/ingest_with_ie.py data/genesis.txt --api-key your-api-key
# 3. Process staged triples
python scripts/ingest_worker.py --process-all
# 4. Query the knowledge graph
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-H "X-API-KEY: your-api-key" \
-d '{"question": "Who parted the Red Sea?"}'The unified system will:
- Extract triples using integrated REBEL: `(Moses, parted, Red Sea)`
- Type entities using schema: `Moses → Person`, `Red Sea → Location`
- Build knowledge graph with proper relationships
- Answer queries with precise citations and subgraph evidence