A specialized chatbot system that emulates academic professors using a hybrid Retrieval-Augmented Generation (RAG) + Knowledge Graph architecture powered by Google's Gemini AI models.
This project builds a conversational AI system that:
- Collects research papers by a specific professor (currently using Risa Wechsler as an example)
- Processes papers into both vector embeddings and a knowledge graph
- Uses dual retrieval with intelligent fusion to provide comprehensive responses
- Hosts the chatbot through a web interface with conversation continuity
The system combines vector search and graph traversal for comprehensive knowledge retrieval:
Single Advanced Mode:
- KG-Enriched Sequential: knowledge graph context is used to enrich the vector search query

The system uses a sequential approach that combines the strengths of both retrieval methods:
- Pipeline Flow: User Query → KG Retrieval → LLM Filter → Query Enrichment → Vector Search → Results
- LLM Filtering: Uses Gemini 2.5 Flash (temperature=0.0) to filter KG results for relevance
- Smart Enrichment: Original query enhanced with filtered KG context while preserving intent
- Length Management: Intelligent truncation prevents token overflow while maintaining query quality
- Fail-Fast Design: Clear error reporting without fallbacks that mask issues
- Cost Optimized: Single LLM call per query for efficient operation
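A minimal sketch of this pipeline is shown below. The function names and signatures (`kg_enriched_search`, `kg_client.retrieve`, and the character budget) are illustrative placeholders, not the project's actual interfaces, which live in `chatbot.py` and `graph_rag/neo4j_client.py`.

```python
# Illustrative sketch of the KG-enriched sequential pipeline.
# Names and signatures are hypothetical, not the project's real API.

def kg_enriched_search(query: str, kg_client, llm, vector_store, k: int = 5):
    # 1. KG retrieval: pull candidate facts from the Neo4j knowledge graph
    kg_results = kg_client.retrieve(query)

    # 2. LLM filter: keep only KG facts relevant to the query (temperature=0.0)
    filter_prompt = (
        "Question: {q}\n\nFacts:\n{facts}\n\n"
        "Return only the facts that help answer the question."
    ).format(q=query, facts="\n".join(kg_results))
    filtered_context = llm.invoke(filter_prompt).content

    # 3. Query enrichment: append filtered KG context, truncating so the
    #    enriched query stays within the token budget while preserving intent
    max_context_chars = 2000  # illustrative budget
    enriched_query = f"{query}\n\nRelevant background:\n{filtered_context[:max_context_chars]}"

    # 4. Vector search over the enriched query
    return vector_store.similarity_search(enriched_query, k=k)
```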
All interactions use a ReAct agent with full conversation memory:
- Session Memory: Maintains context across conversation turns using LangGraph
- ReAct Pattern: Follows Thought → Action → Observation → Final Answer loop
- Tool Integration: Uses document search tool with all retrieval modes
- Session Isolation: Different conversations maintain separate contexts
- Reasoning Transparency: Provides visible reasoning steps in responses
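The following is a minimal sketch of a LangGraph ReAct agent with per-session memory, assuming the standard `create_react_agent` helper and an in-memory checkpointer; the model string and the tool body are placeholders, not the project's code.

```python
# Sketch: LangGraph ReAct agent with per-thread conversation memory.
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

@tool
def search_documents(query: str) -> str:
    """Search the professor's papers via the KG-enriched retrieval pipeline."""
    return "...retrieved passages..."  # placeholder for the real retrieval call

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)  # illustrative model name
agent = create_react_agent(llm, tools=[search_documents], checkpointer=MemorySaver())

# Each thread_id gets an isolated conversation history (session isolation).
config = {"configurable": {"thread_id": "session-42"}}
reply = agent.invoke({"messages": [("user", "What is dark matter?")]}, config)
print(reply["messages"][-1].content)
```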
```mermaid
flowchart LR
    %% Input & Condensation (LLM)
    subgraph Input[" "]
        U["User Query"]:::io --> QC{{"Query Condenser<br/>LLM"}}:::llm
    end

    %% KG-Enriched Sequential Pipeline
    subgraph KGPipeline["KG-Enriched Sequential Pipeline"]
        direction TB
        G[("Knowledge Graph<br/>Neo4j")]:::store
        F{{"LLM Filter"}}:::llm
        E["Query Enrichment<br/>Original + KG Context"]:::algo
    end

    %% Vector Search
    subgraph VectorSearch["Enhanced Vector Search"]
        V[("Vector DB<br/>FAISS")]:::store
    end

    %% Answer Generation (LLM + Agent)
    subgraph Generation["ReAct Agent Generation"]
        direction TB
        R{{"ReAct Agent<br/>with Memory"}}:::llm
        A["Final Answer<br/>+ Sources + Reasoning"]:::io
    end

    %% Flow connections
    QC -->|"Standalone Question"| G
    G -->|"Raw KG Results"| F
    F -->|"Filtered Context"| E
    E -->|"Enriched Query"| V
    V -->|"Enhanced Results"| R
    R --> A

    %% Styles
    classDef io fill:#ffffff,stroke:#222,stroke-width:2px,color:#111
    classDef llm fill:#fff4e6,stroke:#b85c00,stroke-width:2px,color:#000
    classDef store fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef algo fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000
    style Input fill:none,stroke:none
    style KGPipeline fill:none,stroke:#ddd,stroke-width:1px
    style VectorSearch fill:none,stroke:#ddd,stroke-width:1px
    style Generation fill:none,stroke:#ddd,stroke-width:1px
```
The system strongly prioritizes the primary-author corpus (`papers/`) over the non-primary set (`papers_np/`). This priority is applied as a post-vector reranker inside the KG-enriched pipeline and is fully configurable via environment variables.
- Default behavior: ensure a minimum share of primary sources in the final top‑k (e.g., ≥80% of top‑5 when available), while preserving the original order within each corpus.
- Configuration (env):

  ```bash
  PRIMARY_AUTHOR_BIAS_ENABLED=true
  PRIMARY_AUTHOR_PREFIX=papers/
  PRIMARY_AUTHOR_MIN_SHARE=0.8
  PRIMARY_AUTHOR_FINAL_K=5
  ```
This bias reflects that the target professor’s own papers should be favored when relevant, without excluding clearly superior non‑primary matches when primary supply is limited.
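A minimal sketch of such a reranker is shown below, driven by the environment variables listed above. The function name and metadata layout (`rerank`, `metadata["source"]`) are illustrative assumptions, not the project's exact implementation.

```python
import os

# Illustrative post-vector reranker enforcing a minimum share of primary-author
# sources in the final top-k; not the project's exact code.
ENABLED = os.getenv("PRIMARY_AUTHOR_BIAS_ENABLED", "true").lower() == "true"
PREFIX = os.getenv("PRIMARY_AUTHOR_PREFIX", "papers/")
MIN_SHARE = float(os.getenv("PRIMARY_AUTHOR_MIN_SHARE", "0.8"))
FINAL_K = int(os.getenv("PRIMARY_AUTHOR_FINAL_K", "5"))

def rerank(docs):
    """docs: vector-search results ordered by relevance; each has metadata['source'] (assumed)."""
    if not ENABLED:
        return docs[:FINAL_K]
    primary = [d for d in docs if d.metadata["source"].startswith(PREFIX)]
    other = [d for d in docs if not d.metadata["source"].startswith(PREFIX)]
    # Reserve the required share of slots for primary-author papers (when available),
    # then fill the rest with the best non-primary matches, preserving order
    # within each corpus.
    n_primary = min(len(primary), max(1, round(MIN_SHARE * FINAL_K)))
    selected = primary[:n_primary] + other[: FINAL_K - n_primary]
    return selected[:FINAL_K]
```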
The system maintains conversation history through an innovative dual-context approach:
- **Document Retrieval Context**
  - For follow-up questions, previous queries are included in the retrieval query
  - Example: if a user asks "What is dark matter?" followed by "Why is it important?", the retrieval query becomes "Context: What is dark matter? Question: Why is it important?"
  - This helps the system retrieve documents relevant to the entire conversation flow
- **Response Generation Context**
  - Stores the last 3 conversation exchanges (question-answer pairs)
  - Includes this conversational history in the prompt to the LLM
  - Explicitly instructs the model to maintain continuity with previous exchanges
  - Preserves the "thread" of conversation across multiple turns
- **Single-Stage RAG Implementation**
  - Uses a custom document QA chain with direct document processing
  - Manually retrieves documents using context-enhanced queries
  - Combines system instructions, conversation history, and retrieved documents in a carefully crafted prompt
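A minimal sketch of how these two contexts could be assembled is shown below; the helper names and prompt wording are illustrative, not lifted from `chatbot.py`.

```python
# Sketch of the dual-context approach: one context for retrieval, one for generation.

def build_retrieval_query(history, question):
    """Fold previous user questions into the retrieval query for follow-ups."""
    if not history:
        return question
    prev_questions = " ".join(q for q, _ in history)
    return f"Context: {prev_questions} Question: {question}"

def build_prompt(system_instructions, history, question, documents):
    """Combine system instructions, the last 3 exchanges, and retrieved documents."""
    recent = history[-3:]  # last 3 question-answer pairs
    convo = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
    docs = "\n\n".join(d.page_content for d in documents)
    return (
        f"{system_instructions}\n\n"
        f"Conversation so far:\n{convo}\n\n"
        f"Relevant excerpts from the professor's papers:\n{docs}\n\n"
        f"Maintain continuity with the conversation and answer: {question}"
    )
```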
This architecture ensures the chatbot can handle follow-up questions naturally, maintain professor-specific knowledge, and provide responses that feel like a cohesive conversation rather than isolated Q&A pairs.
- `chatbot.py`: Main chatbot with dual retrieval modes and query condensation
- `llm_provider.py`: Gemini AI integration for embeddings and text generation
- `app.py`: Web application with conversation interface

- `paper_collector.py`: Downloads research papers by the target professor
- `rag_processor.py`: Creates the FAISS vector database from papers
- `graph_rag/index.py`: Builds the Neo4j knowledge graph with entity extraction
- `graph_rag/neo4j_client.py`: Graph retrieval with semantic neighborhood expansion

- `retrieval/fusion.py`: Reciprocal Rank Fusion algorithm and token budget management

- `test_dual_mode_integration.py`: Integration tests for dual retrieval
- `test_phase3_dual_retrieval.py`: Fusion algorithm unit tests
- `test_real_e2e_dual_mode.py`: End-to-end tests with real components
- `test_graphrag_comprehensive.py`: Knowledge graph functionality tests
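`retrieval/fusion.py` implements Reciprocal Rank Fusion; the snippet below is the textbook RRF formula (with the conventional constant k=60) as a generic illustration, not the project's exact code, which also handles deduplication and token budgets.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k: int = 60):
    """Score each document by summing 1 / (k + rank) across all rankings.

    result_lists: list of ranked lists of document IDs (one per retriever).
    """
    scores = defaultdict(float)
    for ranking in result_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-search ranking with a graph-search ranking
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d4", "d1"]])
```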
1. Clone the repository:

   ```bash
   git clone https://github.com/SandyYuan/astro-rag.git
   cd astro-rag
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables in a `.env` file:

   ```bash
   # Google AI
   GOOGLE_API_KEY=your_google_api_key_here

   # Neo4j (for graph mode)
   NEO4J_URI=bolt://localhost:7687
   NEO4J_USER=neo4j
   NEO4J_PASSWORD=your_password

   # The system automatically uses KG-enriched sequential retrieval (the only mode available)
   # Agent mode with conversation memory is always enabled
   ```

4. Set up Neo4j (for graph functionality):

   ```bash
   # Install Neo4j Desktop or use Docker
   docker run -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j:5.15
   ```

5. Build the knowledge base:

   ```bash
   # Create the FAISS vector database
   python rag_processor.py

   # Build the Neo4j knowledge graph
   python -m graph_rag.index
   ```

6. Start the web application:

   ```bash
   python app.py
   ```

7. Access the chatbot at `http://localhost:8000`.
If you would like to run your own literature database or emulate a different professor:
1. Configure the target professor (defaults to Risa Wechsler as an example):

   ```python
   # In paper_collector.py, modify:
   collector = PaperCollector(author_name="Professor Name")
   ```

2. Run the paper collector to gather research content:

   ```bash
   python paper_collector.py
   ```

3. Process the papers to build the RAG system:

   ```bash
   python rag_processor.py
   ```

4. Start the web application:

   ```bash
   python app.py
   ```

5. Access the chatbot at `http://localhost:8000`.
Once the web application is running, you can interact with the chatbot through the web interface:
- Type your question in the input field
- Press "Send" or hit Enter
- The chatbot will respond based on the professor's research papers
- Sources used to generate the response will be displayed below each answer
You can modify the system prompt in `chatbot.py` to refine how the chatbot emulates the professor.

To expand the knowledge base, run the paper collector again with a higher `max_papers` value:

```python
collector = PaperCollector(author_name="Professor Name")
papers = collector.collect_papers(max_papers=50)
```

Then rerun `python rag_processor.py` to reprocess the papers and update the vector database.
- Google Generative AI: Gemini 2.5 Flash for text generation and embeddings (text-embedding-004)
- LangChain: RAG pipeline and document processing
- FAISS: High-performance vector similarity search
- Neo4j: Knowledge graph database with Cypher queries
- Reciprocal Rank Fusion: Multi-retriever result combination
- Maximum Marginal Relevance (MMR): Diverse document selection
- Semantic Entity Extraction: LLM-powered knowledge graph construction
- FastAPI: Modern web framework for the chat interface
- Scholarly: Academic paper collection from Google Scholar
- Python 3.11+: Core runtime environment
- pytest: Comprehensive test suite (unit, integration, E2E)
- unittest.mock: Component mocking for isolated testing
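As an illustration of the FAISS + MMR combination, the sketch below loads a local FAISS index through LangChain and retrieves with Maximum Marginal Relevance. The index path, embedding model identifier, and search parameters are example values, not the project's settings.

```python
# Illustrative: FAISS index loaded via LangChain, queried with MMR for diverse results.
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
store = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

# MMR balances relevance against diversity among the selected documents.
retriever = store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5},
)
docs = retriever.invoke("What constraints does weak lensing place on sigma_8?")
```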
- FAISS Mode: Excellent for document similarity and methodological details
- Neo4j Mode: Superior for entity relationships and scientific parameter queries
- Dual Mode: Best overall quality with ~2x more diverse sources
- Response Time: 8-18 seconds for complex queries
- Dual Mode Overhead: Only 4% slower than single modes
- Source Coverage: 5-10 sources per response with intelligent deduplication
- Conversation Continuity: Multi-turn context resolution with query condensation
- Query Condensation: Resolves conversational ambiguity ("What about S8?" → "What is the S8 tension in cosmology?")
- Intelligent Fusion: Combines complementary sources from vector and graph retrieval
- Scientific Accuracy: Entity filtering ensures focus on scientific parameters vs generic terms
- Comprehensive Testing: 100% test coverage with real component validation
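A minimal sketch of the query-condensation step referenced above is shown below; the prompt wording and model string are illustrative, not the exact ones used in `chatbot.py`.

```python
# Sketch: rewrite a conversational follow-up into a standalone question.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)  # illustrative model name

def condense(history: list[tuple[str, str]], follow_up: str) -> str:
    convo = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    prompt = (
        "Given the conversation below, rewrite the final user question as a "
        "standalone question that can be understood without the conversation.\n\n"
        f"{convo}\n\nFollow-up question: {follow_up}\n\nStandalone question:"
    )
    return llm.invoke(prompt).content

# e.g. condense([("What is the sigma_8 tension?", "...")], "What about S8?")
# could yield something like "What is the S8 tension in cosmology?"
```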
- Requires Google API key with Gemini access and Neo4j database
- Optimized for scientific/academic content with entity-relationship focus
- Production-ready with comprehensive test coverage
- For educational and research purposes