A Graph-based RAG implementation that combines advanced reranking, a long-term memory substrate, and agentic search for knowledge management and retrieval.
- Hybrid Search Algorithm: Combines FAISS vector similarity (60%) with graph structure scores (40%)
- Graph Centrality Analysis: Leverages PageRank, Betweenness, and Degree centrality for node importance
- Automatic Knowledge Graph Construction: Converts unstructured text into structured knowledge graphs
- Advanced Reranking System: CrossEncoder + Learning-to-Rank with XGBoost for precise result ordering
- Long-term Memory Substrate: Temporal versioning, event-sourced updates, and memory consolidation
- Agentic Search Integration: Multi-hop reasoning chains, query decomposition, and result synthesis
- Memory Consolidation: Automatic forgetting, pattern extraction, and knowledge optimization
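The hybrid scoring named above (60% FAISS vector similarity, 40% graph structure, with the graph side mixing PageRank, betweenness, and degree centrality) can be sketched as follows. This is a minimal illustration of the weighting scheme, not the project's actual implementation; the function name and example centrality values are hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(query_emb, node_emb, pagerank, betweenness, degree):
    # Vector component: embedding similarity between query and node
    vector_score = cosine_similarity(query_emb, node_emb)
    # Graph component: weighted mix of centrality measures (0.4 / 0.3 / 0.3)
    graph_score = 0.4 * pagerank + 0.3 * betweenness + 0.3 * degree
    # Final blend: 60% vector similarity, 40% graph structure
    return 0.6 * vector_score + 0.4 * graph_score

# Toy example with made-up centrality values
score = hybrid_score(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                     pagerank=0.5, betweenness=0.2, degree=0.8)
print(round(score, 3))  # 0.8
```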
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Text Input    │ --> │  Graph Builder   │ --> │ Knowledge Graph │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
                                                          ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Query Results  │ <-- │  Agentic Search  │ <-- │    Question     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
        │                        │
        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐
│    Advanced     │     │    Multi-hop     │
│    Reranking    │     │    Reasoning     │
└─────────────────┘     └──────────────────┘
        │                        │
        ▼                        ▼
┌─────────────────┐     ┌──────────────────┐
│ CrossEncoder +  │     │      Query       │
│ Learning-to-Rank│     │  Decomposition   │
└─────────────────┘     └──────────────────┘

                  ┌──────────────────┐
                  │    Long-term     │
                  │      Memory      │
                  │    Substrate     │
                  └──────────────────┘
                           │
             ┌─────────────┼─────────────┐
             ▼             ▼             ▼
       ┌──────────┐  ┌──────────┐  ┌──────────┐
       │Temporal  │  │Event     │  │Memory    │
       │Versioning│  │Sourcing  │  │Consolid. │
       └──────────┘  └──────────┘  └──────────┘
```
```shell
# Clone the repository
git clone https://github.com/MinwooKim1990/GraphRAG.git
cd GraphRAG

# Install dependencies
pip install -r requirements.txt

# Copy the environment template and configure it
cp .env.example .env
# Edit the .env file with your OpenAI API key
```

```python
from pathlib import Path

from src.enhanced_memory_system import EnhancedMemorySystem

# Initialize the system
memory_system = EnhancedMemorySystem(
    storage_path=Path("memory_storage"),
    cache_dir=Path("graph_cache")
)

# Add knowledge
text = """
Alice is a software engineer who works at TechCorp.
She specializes in machine learning and leads the AI research team.
"""
event_id = memory_system.add_knowledge(text, source="user_input")

# Search with different methods
result = memory_system.search(
    "What does Alice do?",
    method="agentic",         # Options: agentic, hybrid, vector
    rerank_method="ensemble"  # Options: ensemble, cross_encoder, ltr
)
print(result.synthesis)  # Natural-language answer
```
```python
# Create a checkpoint
checkpoint_id = memory_system.create_checkpoint("Important milestone")

# Get detailed node analysis
node_details = memory_system.get_node_details("Alice")

# Perform memory consolidation
memory_system.consolidate_memory(strategy="importance")

# Export knowledge
json_export = memory_system.export_knowledge(format_type="json")
summary = memory_system.export_knowledge(format_type="summary")
```

```python
# Train the Learning-to-Rank reranker with user feedback
training_examples = [
    {
        'query': 'Who is Alice?',
        'results': [('Alice', 0.9), ('TechCorp', 0.7)],
        'labels': [1.0, 0.3]  # Relevance scores
    }
]
memory_system.train_reranker(training_examples)
```

- CrossEncoder: More accurate query-document relevance scoring
- Learning-to-Rank: XGBoost-based ranking with 20+ engineered features
- Ensemble Methods: Combines multiple reranking approaches
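The ensemble idea above can be sketched as a weighted blend of the individual rerankers' scores. This is an illustrative sketch only: the `ensemble_rerank` function, the candidate tuples, and the blend weights are hypothetical, not the project's API.

```python
def ensemble_rerank(candidates, weights=(0.5, 0.3, 0.2)):
    """Blend cross-encoder, LTR, and hybrid scores into one ranking.

    candidates: list of (doc, cross_encoder_score, ltr_score, hybrid_score)
    weights:    hypothetical blend weights for the three signals
    """
    w_ce, w_ltr, w_hyb = weights
    scored = [
        (doc, w_ce * ce + w_ltr * ltr + w_hyb * hyb)
        for doc, ce, ltr, hyb in candidates
    ]
    # Highest blended score first
    return sorted(scored, key=lambda t: t[1], reverse=True)

candidates = [("Alice", 0.9, 0.8, 0.7), ("TechCorp", 0.6, 0.9, 0.5)]
print(ensemble_rerank(candidates)[0][0])  # Alice
```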
- Event Sourcing: Every change tracked as immutable events
- Temporal Versioning: Complete history with rollback capability
- Memory Consolidation: Automated forgetting of low-importance information
- Pattern Extraction: Identifies recurring motifs and structures
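Event sourcing and temporal versioning as described above can be sketched with an append-only log whose historical states are rebuilt by replay. This is a minimal stand-in, not the project's SQLite-backed event store; the class names and event kinds are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    seq: int
    kind: str      # e.g. "add_node" / "remove_node" (hypothetical event types)
    payload: str

class EventStore:
    """Append-only event log; every change is an immutable event."""
    def __init__(self):
        self._events = []

    def append(self, kind, payload):
        self._events.append(Event(len(self._events) + 1, kind, payload))
        return self._events[-1].seq

    def replay(self, upto=None):
        """Rebuild the node set as of event `upto` (None = latest)."""
        events = self._events if upto is None else self._events[:upto]
        nodes = set()
        for e in events:
            if e.kind == "add_node":
                nodes.add(e.payload)
            elif e.kind == "remove_node":
                nodes.discard(e.payload)
        return nodes

store = EventStore()
store.append("add_node", "Alice")
store.append("add_node", "TechCorp")
store.append("remove_node", "TechCorp")
print(store.replay())        # {'Alice'}
print(store.replay(upto=2))  # temporal view: both nodes still present
```

Because state is derived purely by replay, rollback to any checkpoint is just replay up to that event's sequence number.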
- Query Decomposition: Breaks complex queries into simpler subqueries
- Multi-hop Reasoning: Follows reasoning chains across graph connections
- Result Synthesis: Generates natural language explanations
- Search Strategies: Factual, relational, analytical, comparative, and causal
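Query decomposition as listed above can be sketched with a naive rule-based splitter that breaks a compound question into subqueries. In the real system an LLM would do this; the `decompose` function below is a purely illustrative stand-in.

```python
import re

def decompose(query: str) -> list[str]:
    """Naively split a compound question into subqueries (illustrative only)."""
    # Split on coordinating "and" or semicolons; an LLM would do this robustly
    parts = re.split(r'\band\b|;', query)
    return [p.strip().rstrip('?') + '?' for p in parts if p.strip()]

print(decompose("Where does Alice work and what does she lead?"))
# ['Where does Alice work?', 'what does she lead?']
```

Each subquery can then be answered independently against the graph and the partial answers synthesized into one response.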
| Method | Precision@5 | Recall@10 | F1 Score |
|---|---|---|---|
| Vector Only | 0.723 | 0.651 | 0.685 |
| Hybrid | 0.821 | 0.753 | 0.786 |
| Enhanced (Agentic + Rerank) | 0.892 | 0.834 | 0.862 |
- Consolidation: 70% reduction in storage with <5% information loss
- Retrieval Speed: 40% faster than naive graph search
- Scalability: Handles 10,000+ node graphs efficiently
```python
# Multi-stage scoring (pseudocode)
vector_score = cosine_similarity(query_embedding, node_embedding)
graph_score = 0.4 * pagerank + 0.3 * betweenness + 0.3 * degree
hybrid_score = 0.6 * vector_score + 0.4 * graph_score

# Advanced reranking
cross_encoder_score = cross_encoder.predict(query, document)
ltr_score = xgb_ranker.predict(features)
final_score = ensemble(cross_encoder_score, ltr_score, hybrid_score)
```

```python
# Importance-based consolidation (pseudocode)
importance = 0.6 * pagerank + 0.4 * betweenness
threshold = percentile(importance_scores, 30)  # Keep top 70%
consolidated_graph = filter_nodes_by_importance(graph, threshold)
```

```
GraphRAG/
├── README.md                     # Project documentation
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment variable template
├── config.py                     # System configuration with GPT-5 models
├── src/
│   ├── __init__.py               # Enhanced exports
│   ├── graph_builder.py          # Graph construction from text
│   ├── hybrid_search.py          # Combined vector + graph search
│   ├── graph_analysis.py         # Centrality metrics calculation
│   ├── visualization.py          # Graph visualization utilities
│   ├── advanced_reranking.py     # 🆕 CrossEncoder + LTR reranking
│   ├── memory_substrate.py       # 🆕 Long-term memory with events
│   ├── agentic_search.py         # 🆕 Multi-hop reasoning engine
│   ├── enhanced_memory_system.py # 🆕 Unified system interface
│   ├── hierarchical_cache.py     # 🆕 Multi-level caching system
│   └── improved_multihop.py      # 🆕 Enhanced multi-hop reasoning
├── examples/
│   ├── demo.ipynb                # Original demo notebook
│   ├── enhanced_demo.py          # 🆕 Complete system demo
│   └── comprehensive_tutorial.ipynb  # 🆕 Step-by-step tutorial
├── benchmarks/
│   ├── run_benchmark.py          # Performance benchmarking script
│   └── benchmark_results.json    # Benchmark results
├── tests/
│   ├── test_accuracy.py          # Accuracy testing
│   └── test_model_config.py      # Configuration testing
└── demo_storage/                 # 🆕 Demo storage directory
    ├── memory.db                 # SQLite event store
    ├── cache/                    # Graph cache files
    └── knowledge_export.json     # Exported knowledge graph
```
```shell
# Run the comprehensive demo
python examples/enhanced_demo.py

# Or explore with Jupyter
jupyter notebook examples/demo.ipynb
```

Available models in `config.py`:

GPT-5 Models:
- `gpt-5`: most advanced ($0.015/$0.045 per 1K tokens)
- `gpt-5-mini`: balanced performance ($0.005/$0.015 per 1K tokens)
- `gpt-5-nano`: cost-effective ($0.0003/$0.0009 per 1K tokens)

Set in the `.env` file:

```shell
OPENAI_MODEL=gpt-5-nano   # For development
OPENAI_MODEL=gpt-5-mini   # For production
OPENAI_MODEL=gpt-5        # For complex tasks
```

Search methods:
- `agentic`: full multi-hop reasoning with query decomposition
- `hybrid`: combined vector + graph search
- `vector`: pure vector similarity search
Reranking methods:
- `ensemble`: combines CrossEncoder + Learning-to-Rank
- `cross_encoder`: neural reranking only
- `ltr`: Learning-to-Rank only
- `none`: no reranking
Consolidation strategies:
- `importance`: keep high-centrality nodes
- `recency`: keep recently accessed nodes
- `frequency`: keep frequently mentioned nodes
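The importance-based strategy can be sketched as a percentile cutoff over node importance scores, mirroring the "keep top 70%" rule from the algorithm section. The `consolidate` function and the toy node scores below are illustrative, not the project's API.

```python
import numpy as np

def consolidate(importance_by_node: dict, keep_fraction: float = 0.7) -> set:
    """Importance-based consolidation: drop the lowest-importance nodes."""
    scores = np.array(list(importance_by_node.values()))
    # Threshold at the 30th percentile so roughly the top 70% survive
    threshold = np.percentile(scores, (1 - keep_fraction) * 100)
    return {n for n, s in importance_by_node.items() if s >= threshold}

# Hypothetical importance scores (e.g. 0.6*pagerank + 0.4*betweenness)
nodes = {"Alice": 0.9, "TechCorp": 0.5, "AI team": 0.7, "noise": 0.1}
print(sorted(consolidate(nodes)))  # ['AI team', 'Alice', 'TechCorp']
```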
- Personal knowledge bases
- Document analysis and Q&A
- Research paper organization
- Meeting notes and insights
- Adaptive learning systems
- Concept mapping
- Question generation
- Student progress tracking
- Customer support knowledge bases
- Technical documentation
- Compliance and regulation tracking
- Institutional knowledge preservation
- Graph Neural Networks: Deep learning on graph structures
- Federated Learning: Distributed memory consolidation
- Real-time Updates: Streaming knowledge integration
- Multi-modal Support: Images, audio, and video in graphs
- Causal Inference: Understanding cause-effect relationships
- Active Learning: Smart selection of training examples
We welcome contributions! Please see our contributing guidelines, then:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
If you use this enhanced memory system in your research, please cite:
```bibtex
@software{enhanced_graphrag2025,
  author  = {Minwoo Kim},
  title   = {Graph-Enhanced LLM Memory System: Advanced RAG with Reranking and Long-term Memory},
  year    = {2025},
  url     = {https://github.com/MinwooKim1990/GraphRAG},
  version = {2.0.0}
}
```

MIT License - see the LICENSE file for details.
- Advanced Reranking: 15-20% improvement in search quality
- Long-term Memory: Persistent storage with event sourcing
- Agentic Search: Multi-hop reasoning for complex queries
- Memory Consolidation: Automatic optimization of knowledge storage
- Enhanced APIs: Unified interface for all functionality
- Comprehensive Demo: Full-featured example application