This project is a Multimodal Retrieval-Augmented Generation (RAG) system for advanced question answering. It supports text and image retrieval, chat memory, and web article ingestion for enhanced LLM-based responses. The stack includes a FastAPI backend, React frontend, and Qdrant vector database, all orchestrated via Docker Compose.
- Multimodal RAG: Combines chat, articles, and images for context-rich answers.
- Web Scraping: Scrape and chunk articles from deeplearning.ai/the-batch (see the sketch after this list).
- Image Embeddings: Store and retrieve relevant images using CLIP.
- Text Embeddings: CLIP-based unified vector space for text and image alignment.
- Chat Memory: Vector-based chat history for persistent context.
- Evaluation: RAGAS-based evaluation pipeline with metrics like faithfulness, answer relevancy, and context precision.
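As a rough illustration of the scraping step, the sketch below fetches a page and splits its paragraph text into fixed-size chunks. The function name, chunk size, and paragraph-level extraction are illustrative assumptions, not the backend's actual code:

```python
import requests
from bs4 import BeautifulSoup

def scrape_and_chunk(url: str, chunk_size: int = 1000) -> list[str]:
    """Fetch a page and split its paragraph text into fixed-size chunks."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Join all paragraph text; the real pipeline may target article bodies only.
    text = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = scrape_and_chunk("https://www.deeplearning.ai/the-batch/")
```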
- User Input: A question is submitted via the React frontend.
- Text Embedding: The question is embedded using CLIP (ViT-B-32).
- Vector Search: Contextual chunks and image embeddings are retrieved from Qdrant.
- LLM Generation: The query and context are passed to an OpenAI LLM (GPT-4o).
- Response Display: The answer is shown alongside relevant images and article sources.
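End to end, the flow looks roughly like the sketch below. The collection name, payload key, and prompt wording are assumptions for illustration; only the model names (ViT-B-32, GPT-4o) come from the stack described here:

```python
import torch
import open_clip
from openai import OpenAI
from qdrant_client import QdrantClient

# 1. Embed the question with CLIP (ViT-B-32).
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
question = "What is CLIP?"
with torch.no_grad():
    vec = model.encode_text(tokenizer([question]))
    vec = vec / vec.norm(dim=-1, keepdim=True)  # cosine-normalize

# 2. Retrieve nearest chunks from Qdrant ("articles" is an assumed collection name).
client = QdrantClient(host="localhost", port=6333)
hits = client.search(collection_name="articles", query_vector=vec[0].tolist(), limit=5)
context = "\n\n".join(h.payload["text"] for h in hits)  # "text" is an assumed payload key

# 3. Generate an answer with GPT-4o, grounded in the retrieved context.
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = oai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
```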
| Component | Tool/Library | Reason for Selection |
|---|---|---|
| LLM | OpenAI GPT-4o | High-quality natural language generation |
| Text Embedding | open-clip (ViT-B-32) | Shared space for text & images |
| Image Embedding | open-clip (ViT-B-32) | Unified vision-language embedding |
| Vector Storage | Qdrant | Fast vector DB with payload filtering |
| Backend API | FastAPI | Modern, async Python API framework |
| Frontend | React + Vite | Fast, modern web UI |
| Evaluation | RAGAS | Specialized tool for RAG quality metrics |
| Scraping | BeautifulSoup, requests | Flexible HTML content extraction |
| Orchestration | Docker Compose | Easy multi-service deployment |
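The backend exposes this functionality through FastAPI routes. Below is a minimal sketch of the query endpoint's shape, with a stubbed retrieval helper standing in for the real CLIP + Qdrant code; the request and response field names are assumptions:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str

def retrieve_contexts(question: str) -> list[str]:
    # Stub: the real backend embeds the question with CLIP and searches Qdrant.
    return ["example context chunk"]

@app.post("/api/query")
async def query(req: QueryRequest):
    contexts = retrieve_contexts(req.question)
    # The real handler would also call GPT-4o here to generate the answer.
    return {"answer": "...", "contexts": contexts}
```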
```bash
git clone <your-repo-url>
cd RAG
```

Create a `.env` file in `backend/`:

```env
OPENAI_API_KEY=your_openai_api_key_here
```
Make sure Docker is installed and running.
```bash
docker-compose up --build
```

- Qdrant will be available at `localhost:6333`
- Backend API at `localhost:8000`
- Frontend at `localhost:5173`
If you want to run the backend or evaluation scripts outside Docker:

```bash
python3.11 -m venv venv311
source venv311/bin/activate
pip install --upgrade pip
pip install -r backend/requirements.txt
```

Open http://localhost:5173 in your browser.
- Enter a query and click Search to receive context-grounded answers.
- Click Scrape Articles to ingest the latest content from The Batch.
- Click Clear Chat History to reset memory in the current session.
- `POST /api/query` — Ask a question.
- `POST /api/scrape` — Scrape and ingest new articles.
- `POST /api/clear` — Clear chat memory.
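For example, the query endpoint can be exercised with any HTTP client; the request and response field names below are assumptions:

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/query",
    json={"question": "What is CLIP?"},  # assumed request schema
    timeout=60,
)
print(resp.json())  # expected to contain the answer plus supporting context
```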
Run the following command (in the root or backend directory):
```bash
python evaluation/metrics.py
```

Make sure your `evaluation/rag_samples.jsonl` contains entries like:
```json
{
  "question": "What is CLIP?",
  "contexts": ["CLIP is a neural network trained on image-text pairs."],
  "answer": "CLIP is a model that connects images and texts.",
  "reference": "CLIP is a model trained on image-text pairs to learn a shared embedding space."
}
```

- Faithfulness: Is the answer actually supported by the retrieved context?
- Answer Relevancy: Is the answer on-topic with respect to the question?
- Context Precision: Do the retrieved contexts actually help answer the question? (requires a `reference` field)
RAGAS uses LLM-as-a-judge (e.g., GPT-4) to rate each sample.
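A minimal sketch of what such an evaluation script might do, assuming the JSONL schema shown above (exact column names and imports vary across RAGAS versions):

```python
import json
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# Load samples in the JSONL format shown above.
with open("evaluation/rag_samples.jsonl") as f:
    rows = [json.loads(line) for line in f]

dataset = Dataset.from_list(rows)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores averaged over the samples
```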
- CLIP for both text and image embeddings enables unified retrieval.
- Qdrant provides fast approximate nearest-neighbor search and supports metadata-rich payload filtering (see the sketch after this list).
- open-clip enables modern ViT-based CLIP variants with shared embedding spaces.
- FastAPI for a modern, async backend.
- React and Vite for a fast, modern frontend.
- RAGAS for standardized, explainable evaluation of retrieval+generation.
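To illustrate the payload-filtering point, a Qdrant search can be restricted to points whose metadata matches a condition; the collection name, payload field, and placeholder vector below are assumptions:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=6333)

hits = client.search(
    collection_name="articles",    # assumed collection name
    query_vector=[0.0] * 512,      # placeholder; ViT-B-32 embeddings are 512-d
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="the-batch"))]
    ),
    limit=5,
)
```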
- Add inline citations inside generated answers.
- Add support for PDF ingestion and parsing.
- Track chat history per session/user with timestamps.
- Visualize context contribution heatmaps in UI.
- Segmentation Faults: Ensure compatible versions of PyTorch and OpenCLIP are installed.
- Empty Qdrant results: Make sure articles are scraped and vectorized properly.
- Evaluation errors: Add a
referencefield if using context-related metrics likecontext_precision. - Docker issues: Make sure Docker Desktop is running and ports 5173, 8000, and 6333 are free.
Enjoy your Multimodal RAG Assistant!