Philip Liu, Sparsh Bansal, Jimmy Dinh, Aditya Pawar, Ramani Satishkumar, Shail Desai, Neeraj Gupta, Xin Wang, Shu Hu
A modular framework that couples modern vision backbones with role-specialised LLM agents to draft glaucoma diagnostic reports from retinal fundus images. The core idea is for each agent to focus on a narrow clinical role, while a Director agent synthesises their opinions into a concise, clinically grounded report.
The pipeline runs in six stages:

**1. Vision Pre-processing**
- Classifier (SwinV2) → glaucoma probability *p* (binned into “no glaucoma / possible glaucoma / likely glaucoma / glaucoma detected”).
- Segmentor (SegFormer) → optic-cup & optic-disc masks → cup-to-disc ratio (CDR).
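A minimal sketch of how the CDR and the probability bin could be derived from the CAD outputs. The vertical-extent definition of CDR and the bin thresholds are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def vertical_cdr(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from binary masks (H x W, nonzero = structure)."""
    def height(mask: np.ndarray) -> int:
        rows = np.where(mask.any(axis=1))[0]
        return int(rows.max() - rows.min() + 1) if rows.size else 0
    disc_h = height(disc_mask)
    return height(cup_mask) / disc_h if disc_h else 0.0

def bin_probability(p: float) -> str:
    """Map the classifier probability p onto the four report labels.
    NOTE: these thresholds are illustrative; the paper's bin edges may differ."""
    if p < 0.25:
        return "no glaucoma"
    if p < 0.50:
        return "possible glaucoma"
    if p < 0.75:
        return "likely glaucoma"
    return "glaucoma detected"
```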
**2. Core Prompt**
Natural-language sentences summarise *p* and the CDR, with clinician notes optionally appended.
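For illustration, the core prompt could be assembled with a helper like this (`build_core_prompt` is a hypothetical name, and the sentence wording is an assumption; it reuses `bin_probability` from the sketch above):

```python
def build_core_prompt(p: float, cdr: float, notes: str | None = None) -> str:
    """Render the CAD outputs as plain sentences for the LLM agents."""
    prompt = (
        f"The image classifier assigns a glaucoma probability of {p:.2f} "
        f"({bin_probability(p)}). Segmentation of the optic cup and disc "
        f"yields a cup-to-disc ratio of {cdr:.2f}."
    )
    if notes:
        prompt += f" Clinician notes: {notes}"
    return prompt
```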
**3. Role Generation**
A meta-prompt asks GPT-4.1 to list the clinical roles relevant to the case (e.g., Ophthalmologist, Optometrist, Pharmacist).
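A sketch of the role-generation call, assuming the OpenAI Python SDK; the exact meta-prompt wording is an assumption:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

META_PROMPT = (  # illustrative wording, not the paper's exact meta-prompt
    "You are assembling a clinical panel for a glaucoma case. "
    "List 3-5 relevant clinical roles, one per line, role names only."
)

def generate_roles(core_prompt: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": META_PROMPT},
            {"role": "user", "content": core_prompt},
        ],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()]
```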
**4. Role-Specialised Sub-Reports**
Each role receives the core prompt plus narrow, role-specific instructions and writes a focused sub-report.
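Each sub-report can then be produced by re-prompting the same model with a role-specific system message (a sketch reusing the `client` above; the instruction wording is an assumption):

```python
def write_subreports(core_prompt: str, roles: list[str]) -> dict[str, str]:
    """One focused sub-report per clinical role."""
    reports: dict[str, str] = {}
    for role in roles:
        resp = client.chat.completions.create(
            model="gpt-4.1",
            messages=[
                {"role": "system",
                 "content": (f"You are a {role}. Comment only on the aspects of "
                             f"this glaucoma case within your specialty. Be concise.")},
                {"role": "user", "content": core_prompt},
            ],
        )
        reports[role] = resp.choices[0].message.content
    return reports
```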
**5. Director Synthesis**
Another GPT-4.1 instance combines all sub-reports, resolves minor conflicts, and produces one clean diagnostic report.
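The Director step is one more call that sees the whole panel at once (again a sketch; the system prompt is an assumption):

```python
def synthesise_report(core_prompt: str, subreports: dict[str, str]) -> str:
    """Merge the panel's sub-reports into a single diagnostic report."""
    panel = "\n\n".join(f"## {role}\n{text}" for role, text in subreports.items())
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system",
             "content": ("You are the Director of a clinical panel. Merge the "
                         "sub-reports below into one concise, clinically grounded "
                         "diagnostic report, resolving minor conflicts.")},
            {"role": "user",
             "content": f"{core_prompt}\n\nPanel sub-reports:\n\n{panel}"},
        ],
    )
    return resp.choices[0].message.content
```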
**6. Output & Interface**
The final report, glaucoma probability, CDR, and individual sub-reports are returned to the client (e.g., the MedChat front-end) for interactive Q&A and PDF export.
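Tying the stages together, `backend/api.py` might expose the pipeline roughly like this (a FastAPI sketch building on the helpers above; the route, field names, and the `classify`/`segment` wrappers are assumptions):

```python
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/diagnose")
async def diagnose(image: UploadFile = File(...), notes: str = Form("")):
    fundus = await image.read()
    p = classify(fundus)           # hypothetical SwinV2 wrapper (backend/cad/)
    cup, disc = segment(fundus)    # hypothetical SegFormer wrapper (backend/cad/)
    cdr = vertical_cdr(cup, disc)
    core = build_core_prompt(p, cdr, notes or None)
    subreports = write_subreports(core, generate_roles(core))
    return {
        "probability": p,
        "cdr": cdr,
        "subreports": subreports,
        "report": synthesise_report(core, subreports),
    }
```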
| # | Contribution | Why it matters |
|---|---|---|
| 1 | Multi-agent reasoning: Ophthalmologist, Optometrist, Pharmacist, … plus a Director agent | Reduces hallucinations and reflects real-world collaboration |
| 2 | Tight CAD ⇄ LLM loop: SwinV2 classifier (glaucoma probability) + SegFormer segmentor (optic-cup/optic-disc masks) | Keeps language output anchored to verifiable image features (e.g., CDR) |
| 3 | MedChat interface (browser-based) | Enables interactive Q&A and PDF report download for clinicians and learners (see frontend/ for the code) |
├── backend/ # Python API, multi-agent pipeline, model wrappers
│ ├── cad/ # SwinV2 classifier & SegFormer segmentor
│ ├── agents/ # Role prompts, Director logic
│ └── api.py # FastAPI / Flask endpoints
├── frontend/ # Lightweight JS + HTML MedChat client
└── README.md
The repo ships with a minimal browser client that:
- uploads a fundus image + optional notes,
- streams sub-reports in real time,
- allows follow-up Q&A with full conversation memory, and
- exports the complete conversation as a styled PDF report.
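If you prefer to script against the backend directly, a call could look like this (the URL and endpoint follow the hypothetical sketch above, not the project's documented API):

```python
import requests

# Assumes the backend is running locally and exposes /diagnose as sketched above.
with open("fundus.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/diagnose",
        files={"image": f},
        data={"notes": "IOP 24 mmHg OD"},
    )
print(resp.json()["report"])
```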
If you use this work, please cite:
@inproceedings{liu2025multiagent,
title = {Multi-Agent Diagnosis using Multimodal Large Language Models},
author = {Liu, Philip and Bansal, Sparsh and Dinh, Jimmy and Pawar, Aditya and Satishkumar, Ramani and Gupta, Neeraj and Wang, Xin and Hu, Shu},
booktitle = {IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR)},
year = {2025}
}

This project is released under the MIT License (see LICENSE).
For questions or collaboration requests, please open an issue.