This project is a Multimodal Retrieval-Augmented Generation (RAG) system for advanced question answering. It supports text and image retrieval, chat memory, and web article ingestion for enhanced LLM-based responses. The stack includes a FastAPI backend, React frontend, and Qdrant vector database, all orchestrated via Docker Compose.
- Multimodal RAG: Combines chat, articles, and images for context-rich answers.
- Web Scraping: Scrape and chunk articles from deeplearning.ai/the-batch (see the sketch after this list).
- Image Embeddings: Store and retrieve relevant images using CLIP.
- Text Embeddings: CLIP-based unified vector space for text and image alignment.
- Chat Memory: Vector-based chat history for persistent context.
- Evaluation: RAGAS-based evaluation pipeline with metrics like faithfulness, answer relevancy, and context precision.
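As a rough illustration of the scraping step, the sketch below fetches a page and splits its paragraph text into fixed-size chunks. The function name, chunk size, and paragraph-level extraction are illustrative assumptions, not the backend's actual code:

```python
import requests
from bs4 import BeautifulSoup

def scrape_and_chunk(url: str, chunk_size: int = 1000) -> list[str]:
    """Fetch a page and split its paragraph text into fixed-size chunks."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Join all paragraph text; the real pipeline may target article bodies only.
    text = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = scrape_and_chunk("https://www.deeplearning.ai/the-batch/")
```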
- User Input: A question is submitted via the React frontend.
- Text Embedding: The question is embedded using CLIP (ViT-B-32).
- Vector Search: Contextual chunks and image embeddings are retrieved from Qdrant.
- LLM Generation: The query and context are passed to an OpenAI LLM (GPT-4o).
- Response Display: The answer is shown alongside relevant images and article sources.
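End to end, the flow looks roughly like the sketch below. The collection name, payload key, and prompt wording are assumptions for illustration; only the model names (ViT-B-32, GPT-4o) come from the stack described here:

```python
import torch
import open_clip
from openai import OpenAI
from qdrant_client import QdrantClient

# 1. Embed the question with CLIP (ViT-B-32).
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
question = "What is CLIP?"
with torch.no_grad():
    vec = model.encode_text(tokenizer([question]))
    vec = vec / vec.norm(dim=-1, keepdim=True)  # cosine-normalize

# 2. Retrieve nearest chunks from Qdrant ("articles" is an assumed collection name).
client = QdrantClient(host="localhost", port=6333)
hits = client.search(collection_name="articles", query_vector=vec[0].tolist(), limit=5)
context = "\n\n".join(h.payload["text"] for h in hits)  # "text" is an assumed payload key

# 3. Generate an answer with GPT-4o, grounded in the retrieved context.
oai = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = oai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
```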
| Component | Tool/Library | Reason for Selection |
|---|---|---|
| LLM | OpenAI GPT-4o | High-quality natural language generation |
| Text Embedding | open-clip (ViT-B-32) | Shared space for text & images |
| Image Embedding | open-clip (ViT-B-32) | Unified vision-language embedding |
| Vector Storage | Qdrant | Fast vector DB with payload filtering |
| Backend API | FastAPI | Modern, async Python API framework |
| Frontend | React + Vite | Fast, modern web UI |
| Evaluation | RAGAS | Specialized tool for RAG quality metrics |
| Scraping | BeautifulSoup, requests | Flexible HTML content extraction |
| Orchestration | Docker Compose | Easy multi-service deployment |
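The backend exposes this functionality through FastAPI routes. Below is a minimal sketch of the query endpoint's shape, with a stubbed retrieval helper standing in for the real CLIP + Qdrant code; the request and response field names are assumptions:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    question: str

def retrieve_contexts(question: str) -> list[str]:
    # Stub: the real backend embeds the question with CLIP and searches Qdrant.
    return ["example context chunk"]

@app.post("/api/query")
async def query(req: QueryRequest):
    contexts = retrieve_contexts(req.question)
    # The real handler would also call GPT-4o here to generate the answer.
    return {"answer": "...", "contexts": contexts}
```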
```bash
git clone <your-repo-url>
cd RAG
```

Create a `.env` file in `backend/`:

```env
OPENAI_API_KEY=your_openai_api_key_here
```
Make sure Docker is installed and running.
```bash
docker-compose up --build
```

- Qdrant will be available at `localhost:6333`
- Backend API at `localhost:8000`
- Frontend at `localhost:5173`
If you want to run the backend or evaluation scripts outside Docker:

```bash
python3.11 -m venv venv311
source venv311/bin/activate
pip install --upgrade pip
pip install -r backend/requirements.txt
```

Open http://localhost:5173 in your browser.
- Enter a query and click Search to receive context-grounded answers.
- Click Scrape Articles to ingest the latest content from The Batch.
- Click Clear Chat History to reset memory in the current session.
- `POST /api/query` — Ask a question.
- `POST /api/scrape` — Scrape and ingest new articles.
- `POST /api/clear` — Clear chat memory.
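For example, the query endpoint can be exercised with any HTTP client; the request and response field names below are assumptions:

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/query",
    json={"question": "What is CLIP?"},  # assumed request schema
    timeout=60,
)
print(resp.json())  # expected to contain the answer plus supporting context
```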
Run the following command (in the root or backend directory):
```bash
python evaluation/metrics.py
```

Make sure your `evaluation/rag_samples.jsonl` contains entries like:
```json
{
  "question": "What is CLIP?",
  "contexts": ["CLIP is a neural network trained on image-text pairs."],
  "answer": "CLIP is a model that connects images and texts.",
  "reference": "CLIP is a model trained on image-text pairs to learn a shared embedding space."
}
```

- Faithfulness: Is the answer actually supported by the retrieved context?
- Answer Relevancy: Is the answer on-topic with respect to the question?
- Context Precision: Do the retrieved contexts actually help answer the question? (requires a `reference` field)
RAGAS uses LLM-as-a-judge (e.g., GPT-4) to rate each sample.
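A minimal sketch of what such an evaluation script might do, assuming the JSONL schema shown above (exact column names and imports vary across RAGAS versions):

```python
import json
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# Load samples in the JSONL format shown above.
with open("evaluation/rag_samples.jsonl") as f:
    rows = [json.loads(line) for line in f]

dataset = Dataset.from_list(rows)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores averaged over the samples
```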
- CLIP for both text and image embeddings enables unified retrieval.
- Qdrant provides fast approximate nearest-neighbor search and supports metadata-rich payload filtering (see the sketch after this list).
- open-clip enables modern ViT-based CLIP variants with shared embedding spaces.
- FastAPI for a modern, async backend.
- React and Vite for a fast, modern frontend.
- RAGAS for standardized, explainable evaluation of retrieval+generation.
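To illustrate the payload-filtering point, a Qdrant search can be restricted to points whose metadata matches a condition; the collection name, payload field, and placeholder vector below are assumptions:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=6333)

hits = client.search(
    collection_name="articles",    # assumed collection name
    query_vector=[0.0] * 512,      # placeholder; ViT-B-32 embeddings are 512-d
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="the-batch"))]
    ),
    limit=5,
)
```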
- Add inline citations inside generated answers.
- Add support for PDF ingestion and parsing.
- Track chat history per session/user with timestamps.
- Visualize context contribution heatmaps in UI.
- Segmentation Faults: Ensure compatible versions of PyTorch and OpenCLIP are installed.
- Empty Qdrant results: Make sure articles are scraped and vectorized properly.
- Evaluation errors: Add a
referencefield if using context-related metrics likecontext_precision. - Docker issues: Make sure Docker Desktop is running and ports 5173, 8000, and 6333 are free.
Enjoy your Multimodal RAG Assistant!