MedTutor is a scalable, retrieval-augmented generation (RAG) pipeline for case‑based medical education. It combines hybrid retrieval (local knowledge base + live literature search), reranking, and high‑throughput LLM generation to synthesize evidence and produce educational outputs such as feedback and multiple‑choice questions (MCQs). The system is built for speed and reproducibility with vLLM, asyncio, and multi‑GPU support.
Figure 1. Overview of the MedTutor pipeline.
- High‑throughput inference with vLLM (continuous batching, multi‑GPU, tensor parallel).
- Hybrid retrieval: dense local index + live PubMed/Semantic Scholar APIs (optional).
- Reranking via cross‑encoder or local Qwen‑Reranker served through vLLM.
- Modular multi‑stage generation: textbook summaries → final feedback → MCQs.
- Configuration‑driven experiments via JSON; swap models/prompts without code changes.
- Gradio UI for exploring a single case and running the pipeline interactively.
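To make the batching model concrete, here is a minimal conceptual sketch of the staged flow with asyncio. The function names and timings are purely illustrative placeholders, not the repository's API; the real implementation lives in `pipeline/async_main.py` and `pipeline/utils/`.

```python
import asyncio

# Hypothetical stage functions standing in for the real retrieval and generation stages.
async def retrieve_evidence(case: dict) -> dict:
    await asyncio.sleep(0.1)  # placeholder for local-index lookup + live API calls
    return {**case, "evidence": ["reranked paper", "textbook page"]}

async def generate_outputs(case: dict) -> dict:
    await asyncio.sleep(0.1)  # placeholder for vLLM generation (summaries, feedback, MCQs)
    return {**case, "feedback": "...", "mcqs": ["..."]}

async def run_case(case: dict) -> dict:
    case = await retrieve_evidence(case)
    return await generate_outputs(case)

async def main() -> None:
    cases = [{"case_id": i} for i in range(4)]
    # Cases are processed concurrently so the LLM server can batch requests continuously.
    results = await asyncio.gather(*(run_case(c) for c in cases))
    print(f"processed {len(results)} cases")

if __name__ == "__main__":
    asyncio.run(main())
```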
- `pipeline/async_main.py`: Batch pipeline entrypoint (staged retrieval → generation → write results).
- `pipeline/app.py`: Gradio UI (single-case run, live logs, optional demo mode).
- `pipeline/utils/`: Retrieval (local + live), reranker, vLLM handler, embeddings, keyword generator.
- `pipeline/configs/`: Example configs (`configs_all.json`, `gradio_config.json`, `configs_template.json`).
- `pipeline/llm_annotation/`: LLM-as-a-Judge tools and configs.
- `pipeline/manual_annotation/`: Streamlit UI for human annotations.
- `pipeline/assets/`: Figures and screenshots.
- Python 3.11+
- Linux/macOS, CUDA GPU recommended (70B models require multiple GPUs or tensor parallelism)
- vLLM installed separately (not pinned in `requirements.txt` due to CUDA/driver variability)
- Create environment and install dependencies

```bash
python -m venv .venv && source .venv/bin/activate  # or conda
pip install -r requirements.txt
```

- Install vLLM (follow the official instructions for your platform/CUDA)
- (Optional) Set API keys for live retrieval (place a `.env` file in `pipeline/` or export them as environment variables)

```bash
SEMANTIC_SCHOLAR_API_KEY=...
PUBMED_API_KEY=...
```
If you do not set keys, restrict `retrievers` to `["local"]` in the config.
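As a minimal sketch of how these keys can be picked up at runtime (assuming the `python-dotenv` package; the pipeline's own loading code may differ):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Reads pipeline/.env if present; exported environment variables work as well.
load_dotenv()

s2_key = os.getenv("SEMANTIC_SCHOLAR_API_KEY")
pubmed_key = os.getenv("PUBMED_API_KEY")

if not (s2_key and pubmed_key):
    # Without keys, keep "retrievers" set to ["local"] in the config.
    print('No live-retrieval keys found; use retrievers = ["local"].')
```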
- Prepare a config

```bash
cd pipeline
cp configs/configs_template.json configs/configs.json
# Edit configs/configs.json → update absolute paths: "keyword_file", "local_index_path", "feedback_dir".
# Optionally set models/GPUs under "services" and the stage "service_map".
```

- (Optional) Build a local embedding index for textbook pages
Place your source pages under `pipeline/data/ocr_batches/` (or adjust the script), then run:

```bash
python utils/embed.py
```

This writes an embedded page file (e.g., `pipeline/data/embedded_pages_*.json`). Point `local_index_path` at this file.
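For intuition, the sketch below shows how a dense local index of this kind can be queried. The embedding model name, file path, and record fields (`index`, `text`, `embedding`) are assumptions for illustration only; the authoritative format is whatever `utils/embed.py` writes.

```python
import json

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed record layout: a JSON list of {"index": ..., "text": ..., "embedding": [...]}.
with open("data/embedded_pages_example.json") as f:  # hypothetical path
    pages = json.load(f)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed model
query_vec = model.encode("pneumothorax management", normalize_embeddings=True)

page_vecs = np.array([p["embedding"] for p in pages])
scores = page_vecs @ query_vec  # cosine similarity when embeddings are normalized
for i in np.argsort(-scores)[:5]:
    print(pages[i]["index"], round(float(scores[i]), 3), pages[i]["text"][:80])
```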
- Run the pipeline

```bash
# from pipeline/
python async_main.py --config configs/configs.json
```

Useful flags:

- `--debug` for verbose logs
- `--case_id <ID>` to process a single case from the keyword file
- Explore the UI

```bash
# from pipeline/
python app.py
```

Demo mode is available via `--demo_mode`, but it requires a local demo results file; see `app.py` for path details.
You need two inputs: (1) a keyword file per case, and (2) a local embedding index of textbook pages.
- Keyword generation (optional)

The high-throughput keyword generator uses a local vLLM server.

```bash
# from pipeline/
python utils/keywords_generator.py \
  --input data/keywords_sample/keyword_generator_input_sample.jsonl \
  --output data/keywords_sample/keyword_generator_output_sample.jsonl \
  --prompts configs/keywords_generator_prompt.json
```

Each output line includes `_raw_response` and a `keywords` list. Use this file as `keyword_file` in your config.
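A minimal sketch of consuming that output file (only the `_raw_response` and `keywords` fields mentioned above are assumed):

```python
import json

# Each line of the output JSONL is one case with its generated keywords.
with open("data/keywords_sample/keyword_generator_output_sample.jsonl") as f:
    for line in f:
        record = json.loads(line)
        keywords = record.get("keywords", [])
        print(len(keywords), keywords[:3])
```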
- Local vector index

Run `python utils/embed.py` (see Quickstart) to build an index from your pages and set `local_index_path` accordingly. If you only want to test the pipeline structure, you can temporarily set `retrievers` to `["local"]` and use a small index.
- Dataset: https://huggingface.co/datasets/yale-nlp/MedTutor
- Composition (per case):
  - Inputs: `case_id`, `reviewer_report`, `keywords`
  - Retrieval: `reranked_papers_per_keyword` (title, abstract, url, source, rerank_score), `local_pages_per_keyword` (index, text)
  - Generated: `generated_textbook_summaries` (per keyword), `generated_final_feedback`, `generated_mcqs`
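A minimal sketch of loading the dataset with the Hugging Face `datasets` library (default configuration assumed; inspect the printed splits for the per-case fields listed above):

```python
from datasets import load_dataset

# Downloads the released dataset from the Hugging Face Hub.
ds = load_dataset("yale-nlp/MedTutor")
print(ds)  # shows the available splits and column names
```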
- `hf_embedding_model` / `embedding_model_device`: Model/device for query embeddings (CPU works for small jobs).
- `retrievers`: Enable sources among `"local"`, `"pubmed"`, `"semantic"`.
- `keyword_file`, `local_index_path`: Absolute or project-relative paths to your data.
- `services`: Define model workers (e.g., reranker and generator) with `gpu_id`, `tensor_parallel_size`, and per-stage generation params.
- `service_map`: Map stages (`reranker`, `textbook_summary`, `final_report`, `mcq_generation`) to a service name.
- `system`: System personas and user templates per stage.
See `pipeline/configs/configs_all.json` for a larger 70B example and `pipeline/configs/gradio_config.json` for the UI defaults.
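As an illustrative sketch only, the snippet below assembles a config skeleton using the keys described above and writes it to JSON; the values, service names, and any additional fields (model names, generation params) are placeholders, and `configs/configs_template.json` remains the authoritative schema.

```python
import json

config = {
    "hf_embedding_model": "sentence-transformers/all-MiniLM-L6-v2",  # placeholder model
    "embedding_model_device": "cpu",
    "retrievers": ["local"],  # add "pubmed" / "semantic" once API keys are set
    "keyword_file": "/abs/path/to/keywords.jsonl",
    "local_index_path": "/abs/path/to/embedded_pages.json",
    "feedback_dir": "/abs/path/to/results",
    "services": {
        # Placeholder workers; real entries also carry model names and generation params.
        "generator": {"gpu_id": 0, "tensor_parallel_size": 1},
        "reranker": {"gpu_id": 1, "tensor_parallel_size": 1},
    },
    "service_map": {
        "reranker": "reranker",
        "textbook_summary": "generator",
        "final_report": "generator",
        "mcq_generation": "generator",
    },
    "system": {"final_report": "You are a medical educator ..."},  # persona/template placeholder
}

with open("configs/configs.json", "w") as f:
    json.dump(config, f, indent=2)
```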
The pipeline writes a timestamped JSON under `feedback_dir`, including the resolved config and a list of processed cases. Each case includes original inputs, retrieved/reranked evidence, intermediate textbook summaries, final feedback text, and MCQs.
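A minimal sketch of reading back such a results file; the `cases` key and exact field names are assumptions based on the description above, so check an actual output file for the real layout:

```python
import glob
import json

# Pick the most recent timestamped results file in feedback_dir (path is a placeholder).
paths = sorted(glob.glob("/abs/path/to/results/*.json"))
with open(paths[-1]) as f:
    run = json.load(f)

# Assumed layout: a list of processed cases carrying the fields described above.
for case in run.get("cases", []):
    print(case.get("case_id"), len(case.get("generated_mcqs", [])))
```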
- LLM‑as‑a‑Judge (automated, multi‑provider): see pipeline docs at pipeline/llm_annotation/README.md
- Human annotation UI (Streamlit): see pipeline/manual_annotation/README.md
- Determinism across large LLMs can vary by hardware/driver. We recommend pinning configs and logging seeds and model versions in your runs. The pipeline logs runtime config and parameters alongside outputs.
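One possible way to record this per run, as a hedged sketch (seed handling for the LLM server itself depends on your vLLM setup and is not shown):

```python
import json
import random

import numpy as np
import torch

SEED = 2025

# Fix seeds for the non-LLM parts of a run (embedding, any sampling of cases, etc.).
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Log the seed, model identifiers, and config path alongside the run outputs.
metadata = {"seed": SEED, "models": {"generator": "<model name>"}, "config": "configs/configs.json"}
with open("run_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```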
- Repository license: Open Data Commons Attribution (ODC-By) v1.0. See `LICENSE` and the official terms at https://opendatacommons.org/licenses/by/1-0/
- Third-party datasets and materials (e.g., MIMIC, CheXpert, PubMed/S2 content) remain under their original licenses and terms. You are responsible for complying with those licenses and any applicable data use agreements.
If you reference this research or use the released datasets, please cite:
```bibtex
@inproceedings{jang-etal-2025-medtutor,
    title = "{M}ed{T}utor: A Retrieval-Augmented {LLM} System for Case-Based Medical Education",
    author = "Jang, Dongsuk and
      Shangguan, Ziyao and
      Tegtmeyer, Kyle and
      Gupta, Anurag and
      Czerminski, Jan T and
      Chheang, Sophie and
      Cohan, Arman",
    editor = {Habernal, Ivan and
      Schulam, Peter and
      Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-demos.24/",
    pages = "319--353",
    ISBN = "979-8-89176-334-0"
}
```

For questions or issues, please open a GitHub issue or email [email protected].