🔗 Live Demo: Try it on HuggingFace Spaces
A retrieval-augmented generation (RAG) chatbot that answers natural-language questions about diabetic retinopathy, grounded in open-access research papers. It retrieves relevant passages from a document collection and passes them to a large language model, which generates answers that cite those sources.
This project extends my Diabetic Retinopathy Grading work (a vision model that grades DR severity) into the LLM / NLP domain, adding RAG, vector databases, embeddings, and LLM API integration.
⚠️ Research demo only — not medical advice. Answers are generated from a limited document set and may be incomplete or wrong. Always consult a qualified ophthalmologist.
A plain LLM answers from memory and can hallucinate. RAG instead grounds the answer in real documents:
- Ingest — documents are split into chunks and converted to vectors (embeddings) stored in a vector database.
- Retrieve — at question time, the question is embedded and the most semantically similar chunks are pulled from the database.
- Generate — those chunks are given to the LLM, which is instructed to answer only from them and cite its sources.
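
The retrieve step can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the word-count `embed` function is a toy stand-in for the real sentence-embedding model, a plain Python list stands in for the vector database, and the three chunks are made-up sentences. Only the idea carries over: embed the question, score every stored chunk by cosine similarity, and keep the top k.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, index: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Ingest (run once): embed every chunk and keep the vector next to its text.
chunks = [
    "Diabetic retinopathy is graded from no DR to proliferative DR.",
    "Convolutional neural networks are used to detect retinopathy in fundus images.",
    "Gradio provides a web interface for Python demos.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve (per question): the grading sentence ranks first, the Gradio one is dropped.
top_chunks = retrieve("How is diabetic retinopathy graded?", index, top_k=2)
print(top_chunks)
```

A real embedding model matches on meaning rather than shared words, so it would also retrieve passages that paraphrase the question.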
PDFs ──▶ chunk ──▶ embed ──▶ ChromaDB (ingestion, run once)
│
Question ──▶ embed ──▶ retrieve top-k chunks ─┘
│
chunks + question ──▶ LLM ──▶ cited answer (at query time)
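
The final arrow in the diagram (chunks + question going to the LLM) amounts to building a prompt that contains the retrieved passages and an instruction to answer only from them. The sketch below is one possible way to do that, not the wording used in `rag_pipeline.py`; the function name `build_prompt`, the dictionary keys, and the file names are hypothetical.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt: numbered context passages, then the question."""
    context = "\n\n".join(
        f"[{i}] (source: {c['source']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered context passages below. "
        "Cite passages as [1], [2], ... and say you do not know if the context "
        "does not contain the answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks, as they might come back from the retriever.
retrieved = [
    {"source": "paper_a.pdf", "text": "Example passage about DR grading."},
    {"source": "paper_b.pdf", "text": "Example passage about screening."},
]
prompt = build_prompt("How is DR graded?", retrieved)
print(prompt)
```

Numbering the passages gives the model a short, unambiguous handle to cite, and lets the UI map each citation back to its source file.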
| Component | Choice |
|---|---|
| Orchestration | LangChain |
| Vector store | ChromaDB (persistent, on disk) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (free, runs on CPU) |
| LLM | Llama 3.3 70B via Groq API (free tier) |
| UI | Gradio |
| Deployment | HuggingFace Spaces |
Medical-RAG-Chatbot/
├── app.py # Gradio chat UI (HF Spaces entry point)
├── requirements.txt
├── .env.example # template for your GROQ_API_KEY
├── .gitignore
├── data/ # source PDFs (gitignored)
└── src/
├── config.py # all settings in one place
├── download_sources.py # fetch open-access DR papers from arXiv
├── ingest.py # load → chunk → embed → store in ChromaDB
└── rag_pipeline.py # retrieve → prompt Groq → cited answer
# 1. Install dependencies
pip install -r requirements.txt
# 2. Add your free Groq API key
cp .env.example .env
# then edit .env and paste your key from https://console.groq.com/keys
# 3. Download source documents (open-access arXiv papers on DR)
python -m src.download_sources
# (or drop your own PDFs into the data/ folder)
# 4. Build the vector store (run once, or after changing documents)
python -m src.ingest
# 5. Launch the chatbot
python app.py
# open the printed local URL in your browser

Example questions to try:

- "What is diabetic retinopathy?"
- "How is the severity of diabetic retinopathy graded?"
- "What deep learning methods are used to detect diabetic retinopathy?"
- "Why is early screening important?"
Screenshot: the chatbot answering questions with grounded, source-cited responses, each ending with a safety disclaimer.
All tunable settings live in src/config.py:
- `CHUNK_SIZE` / `CHUNK_OVERLAP`: how documents are split
- `EMBED_MODEL`: the embedding model
- `TOP_K`: how many chunks to retrieve per question
- `LLM_MODEL` / `LLM_TEMPERATURE`: the Groq model and its sampling temperature (higher values give more varied output)
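
To show what `CHUNK_SIZE` and `CHUNK_OVERLAP` control, here is a simplified character-based splitter. It is a sketch under assumptions: the values 200 and 50 are illustrative rather than the project's defaults, the helper `split_text` is hypothetical, and the actual splitter used in `ingest.py` may prefer paragraph or sentence boundaries instead of cutting at fixed offsets.

```python
CHUNK_SIZE = 200     # assumed value, in characters
CHUNK_OVERLAP = 50   # assumed value, in characters


def split_text(text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "x" * 500
chunks = split_text(text)
print([len(c) for c in chunks])  # → [200, 200, 200]
```

Each chunk repeats the last 50 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. Larger chunks give the LLM more context per hit but make retrieval less precise.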
- Add more diseases and clinical guideline sources
- Try a stronger or domain-specific embedding model (e.g. BioBERT-based)
- Add a re-ranking step after retrieval
- Eval

