Chat with your PDFs using Local + Cloud AI Models
This RAG system enables users to upload PDFs, extract knowledge from them, and interact through intelligent chat sessions.
Each conversation is directly connected to the user's documents, providing fact-grounded answers, traceable sources, and a deeply interactive research experience.
The system supports:
- Document-aware AI responses
- Chat history with document linking
- Local model inference via Ollama
- Cloud model inference via Groq API
- PostgreSQL persistence
- Streaming responses
- Dark mode UI
- Top-K adjustable retrieval (1-10)
This makes the platform ideal for research, education, legal work, healthcare documentation, or any workflow requiring deep understanding of long, complex PDFs.
Users can upload one or multiple PDFs.
The system automatically:
- Extracts clean text
- Splits it into intelligent chunks
- Embeds it using nomic-embed-text
- Stores the vectors inside ChromaDB
These embeddings become the foundation for document-grounded chat responses.
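As a rough illustration of that ingestion flow, the sketch below chunks extracted text, embeds each chunk through Ollama's `/api/embeddings` endpoint, and writes the vectors to ChromaDB. It is a minimal sketch, not the project's actual service code: the collection name, chunk sizes, and metadata fields are assumptions.

```ts
// Minimal ingestion sketch (illustrative names, not the project's real code).
import { ChromaClient } from "chromadb";

// Client options vary slightly between chromadb client versions.
const chroma = new ChromaClient({ path: "http://localhost:8000" });
const OLLAMA_URL = "http://localhost:11434";

// Naive fixed-size chunking with overlap; the real splitter is configurable.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Embed one chunk with nomic-embed-text via the Ollama HTTP API (Node 18+ fetch).
async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const { embedding } = await res.json();
  return embedding;
}

export async function ingestPdfText(documentId: string, fullText: string) {
  const collection = await chroma.getOrCreateCollection({ name: "pdf_chunks" });
  const chunks = chunkText(fullText);
  const embeddings = await Promise.all(chunks.map(embed));
  await collection.add({
    ids: chunks.map((_, i) => `${documentId}-${i}`),
    embeddings,
    documents: chunks,
    // documentId metadata lets retrieval later be scoped to a chat's own PDFs
    metadatas: chunks.map(() => ({ documentId })),
  });
}
```

In the actual backend this logic would sit behind the PDF upload endpoint; the sketch only shows the shape of the flow.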
Each chat session:
- Has its own associated documents
- Stores messages and AI interactions in PostgreSQL
- Retrieves information only from the PDFs linked to that chat
This ensures contextual accuracy and user separation.
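To keep retrieval scoped to a chat's own PDFs, the vector search can filter ChromaDB by the document IDs linked to that session. The sketch below reuses the `documentId` metadata field and collection name assumed in the ingestion sketch above; it is not the project's exact schema.

```ts
// Per-chat retrieval sketch: only chunks from the chat's linked PDFs are searched.
import { ChromaClient } from "chromadb";

const chroma = new ChromaClient({ path: "http://localhost:8000" });

export async function retrieveForChat(
  queryEmbedding: number[],   // the user question embedded with nomic-embed-text
  chatDocumentIds: string[],  // IDs of the PDFs linked to this chat (kept in PostgreSQL)
  topK = 5,                   // the adjustable Top-K (1-10)
) {
  const collection = await chroma.getOrCreateCollection({ name: "pdf_chunks" });
  // The matched documents become the model's context; their metadata
  // (documentId) is what the UI can surface as source citations.
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: topK,
    where: { documentId: { $in: chatDocumentIds } }, // scope search to this chat only
  });
  return results;
}
```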
- Local model inference (Ollama):
  - Runs the `qwen2:1.5b` model locally
  - Zero external dependencies
  - Private and offline-ready
- Cloud model inference (Groq):
  - Lightning-fast inference
  - Ideal for complex reasoning
  - Automatic fallback capability
Users can choose which engine powers each conversation.
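A minimal sketch of that switch, assuming Ollama's `/api/chat` endpoint for the local path and Groq's OpenAI-compatible chat completions API for the cloud path (the exact Groq model ID and the `GROQ_API_KEY` variable name are assumptions):

```ts
// Engine switch sketch: the same prompt goes to Ollama locally or to Groq's API.
type Engine = "local" | "cloud";

export async function generateAnswer(engine: Engine, prompt: string): Promise<string> {
  if (engine === "local") {
    // Local inference through Ollama; nothing leaves the machine.
    const res = await fetch("http://localhost:11434/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "qwen2:1.5b",
        messages: [{ role: "user", content: prompt }],
        stream: false,
      }),
    });
    const data = await res.json();
    return data.message.content;
  }

  // Cloud inference through Groq's OpenAI-compatible chat endpoint.
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`, // assumed env var name
    },
    body: JSON.stringify({
      model: "llama3-8b-8192", // Groq's llama3-8b variant; configurable in practice
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```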
- PDF Upload
- Text Extraction
- Chunking (configurable)
- Embeddings with nomic-embed-text
- Vector search in ChromaDB
- Retrieve Top-K chunks (1-10)
- AI model generation (local or cloud)
- Streaming tokens to the frontend
- Source citations included
Every response is grounded directly in the user's documents.
Messages are streamed token-by-token for an instant, smooth chat experience.
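The sketch below shows one way this could look in a NestJS controller: read Ollama's streaming response line by line and forward each token as a Server-Sent Event. The route, query parameter, and helper are illustrative assumptions, not the project's actual endpoint.

```ts
// SSE streaming sketch (illustrative route and helper, not the real controller).
import { Controller, Query, Sse } from "@nestjs/common";
import { Observable } from "rxjs";

interface MessageEvent { data: string } // shape NestJS expects for SSE events

// Ollama emits newline-delimited JSON objects when stream: true is set.
async function* ollamaTokens(prompt: string): AsyncGenerator<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2:1.5b",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any incomplete trailing line for the next read
    for (const line of lines.filter(Boolean)) {
      const chunk = JSON.parse(line);
      if (chunk.message?.content) yield chunk.message.content;
    }
  }
}

@Controller("chat")
export class ChatStreamController {
  @Sse("stream")
  stream(@Query("q") question: string): Observable<MessageEvent> {
    return new Observable<MessageEvent>((subscriber) => {
      (async () => {
        for await (const token of ollamaTokens(question)) {
          subscriber.next({ data: token }); // one SSE event per token
        }
        subscriber.complete();
      })().catch((err) => subscriber.error(err));
    });
  }
}
```

On the frontend, an `EventSource` (or a fetch-based SSE reader) can append each event's data to the message bubble as it arrives.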
- Clean and responsive chat interface
- Full dark mode support
- Organized PDF list per chat
- Source citations shown for transparency
- Frontend: React (Vite)
- Backend: NestJS
- Vector DB: ChromaDB
- Embeddings: `nomic-embed-text`
- LLMs:
  - Local: Ollama (`qwen2:1.5b`)
  - Cloud: Groq (`llama3-8b` or configurable)
- Database: PostgreSQL
- Auth: JWT
- Response Method: Server-Sent Events (SSE)
Users can configure:
- Top-K Retrieval: From 1 to 10 vectors
- AI Model Selection: Local (Ollama) or Cloud (Groq)
- Sources Toggle: Show or hide PDF citations
- Streaming: Enabled by default for fast responses
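As a hypothetical illustration of how these options might travel with a message (field names are assumptions, not the backend's actual DTO):

```ts
// Hypothetical per-message request shape; the real NestJS DTO may differ.
export interface AskRequest {
  chatId: string;            // which chat session (and therefore which PDFs) to query
  question: string;
  topK: number;              // 1-10 chunks retrieved from ChromaDB
  engine: "local" | "cloud"; // Ollama (qwen2:1.5b) vs. Groq
  showSources: boolean;      // include PDF citations in the response
  stream: boolean;           // stream tokens over SSE (enabled by default)
}
```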
- Local LLM keeps sensitive content offline
- PostgreSQL securely stores chat history
- ChromaDB stores embeddings locally inside Docker volumes
- No external API calls unless the user selects Groq cloud inference
- Docker runs the entire RAG stack (frontend, backend, DB, vector store, LLM) in isolated containers.
- Ensures consistent environments across all machines with no manual setup required.
- Docker Compose manages networking, service orchestration, and persistent storage volumes.
- One command builds and starts everything, making development and deployment fast and reliable.
- **Chat With Your PDFs, Instantly:** No more scrolling through long documents or searching manually. Upload PDFs, ask questions, and get precise answers backed by real citations. It's like having an AI research assistant that understands your documents better than you do.
- **Local or Cloud AI, Your Choice:** Enjoy the privacy and speed of a local model through Ollama, or switch to Groq's lightning-fast cloud models for deeper reasoning. One system, two powerful engines, fully in your control.
- **Smart Retrieval, Better Accuracy:** Powered by ChromaDB and nomic-embed-text embeddings, the system retrieves only the most relevant chunks from your PDFs. And with adjustable Top-K (1 to 10), you decide how deep the AI digs for answers.
- **Document-Aware Chat Sessions:** Each chat keeps its own set of PDFs, letting you explore different topics independently. Every message includes source references so you always know exactly where the answer came from.
- **Beautiful, Modern Chat Experience:** Real-time streaming responses, a sleek dark mode, and a clean interface create an intuitive, distraction-free environment for research, study, or analysis.
- **Your Knowledge, Fully Yours:** Chats, PDFs, and message history are stored securely in PostgreSQL. Data stays organized, persistent, and always ready to pick up where you left off.
Screenshots: chat screen, sidebar, settings, citation view, login and signup screens, streaming chat responses, a Groq (cloud model) response, the chat and settings pages in dark mode, and the Swagger API documentation.
This project includes a fully Dockerized environment covering the frontend, backend, PostgreSQL, ChromaDB, and Ollama, allowing you to run the entire system with one command.
```bash
git clone https://github.com/Hasan-Mawassi/RAG-System.git
cd rag-system
```
Copy the example environment file in rag-server and update it with your configuration:
```bash
cp .env.example .env
```
Then edit the `.env` file and fill in the necessary values.
This file includes configuration settings for:
- Database connection (PostgreSQL)
- API keys for any external services (if applicable)
- Model configurations (local models or cloud providers)
- Port mappings and service URLs for backend, frontend, ChromaDB, and Ollama
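As a rough sketch of how the backend might consume these values (the variable names here are illustrative; the authoritative list lives in `rag-server`'s `.env.example`):

```ts
// Illustrative config loading; actual variable names are defined in .env.example.
const config = {
  databaseUrl: process.env.DATABASE_URL,   // PostgreSQL connection string (assumed name)
  groqApiKey: process.env.GROQ_API_KEY,    // only needed for Groq cloud inference (assumed name)
  ollamaUrl: process.env.OLLAMA_URL ?? "http://localhost:11434",
  chromaUrl: process.env.CHROMA_URL ?? "http://localhost:8000",
  port: Number(process.env.PORT ?? 5000),  // backend API port (see the port table below)
};

export default config;
```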
Start the full system using Docker Compose:
```bash
docker-compose up --build
```
This will:
- Build the NestJS backend
- Build the React frontend
- Start PostgreSQL with persistent volumes
- Start ChromaDB for vector embeddings
- Start Ollama and load the local models (`qwen2:1.5b`, `nomic-embed-text`)
| Service | URL |
|---|---|
| Frontend (UI) | http://localhost:3000 |
| Backend API | http://localhost:5000 |
| ChromaDB | http://localhost:8000 |
| Ollama API | http://localhost:11434 |
All services communicate internally through Docker networking.
| Volume | Purpose |
|---|---|
| `rag_pgdata` | PostgreSQL database files |
| `rag_chroma-data` | ChromaDB vector embeddings |
| `rag_ollama-data` | Local LLM models for Ollama |
| Container | Port | Description |
|---|---|---|
| Frontend | 3000 | Web UI |
| Backend | 5000 | REST API + SSE |
| PostgreSQL | 5432 | Database |
| ChromaDB | 8000 | Embedding & search API |
| Ollama | 11434 | Local LLM inference |