3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark) is an open-source benchmark for evaluating large vision-language models (LVLMs) through simulated doctor-patient dialogues. A Doctor Agent interacts with a temperament-driven Patient Agent over medical images and structured complaints; an Assessor Agent, aligned with human expert judgments, then evaluates diagnostic and communication quality.
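Conceptually, each case runs as a simple loop between the agents. The sketch below is illustrative only: the agent interfaces (`open_consultation`, `reply`, `evaluate`, etc.) are hypothetical names, not the actual API of the classes in the `agents/` package.

```python
# Minimal sketch of the 3MDBench evaluation loop.
# All agent methods shown here are hypothetical; the real
# implementations live in the agents/ package and may differ.

def run_case(doctor, patient, assessor, image_path, complaint,
             temperament, max_turns=10):
    """Simulate one doctor-patient dialogue over a case and score it."""
    dialogue = []
    # The Doctor Agent sees the medical image and the structured complaint.
    doctor_msg = doctor.open_consultation(image_path, complaint)
    for _ in range(max_turns):
        dialogue.append(("doctor", doctor_msg))
        if doctor.has_final_diagnosis(doctor_msg):
            break  # the Doctor Agent committed to a diagnosis
        # The Patient Agent's replies are shaped by its temperament.
        patient_msg = patient.reply(doctor_msg, temperament=temperament)
        dialogue.append(("patient", patient_msg))
        doctor_msg = doctor.continue_consultation(patient_msg)
    # The Assessor Agent rates diagnostic and communication quality.
    return assessor.evaluate(dialogue)
```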
[2025-08-20] 3MDBench has been accepted to the EMNLP 2025 Main Conference 🎉
- Install dependencies from `requirements.txt`;
- Check out the 3MDBench dataset on Hugging Face!
- Go to the `scripts` folder;
- Run `run_dialogue.sh`, choosing one of the models used in the paper or implementing a custom one in the `agents/doctor_agent.py` file;
- Run `run_assessment.sh` to assess the generated dialogues, which will be stored in the `results/assessments` folder;
- Run `run_diagnoses_obtaining.sh` to extract the Doctor Agent's final diagnosis for each case, which will be stored in the `results/assessments/diags` folder;
- Explore `benchmarking/count_metrics.ipynb` to analyze the model's metrics. A typical end-to-end run is sketched below.
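Putting the steps together, a run might look like the following. The script names come from this repository, but any arguments are omitted here, so check each script for its actual interface.

```bash
# Illustrative end-to-end run; see each .sh file for its actual arguments.
pip install -r requirements.txt
cd scripts
bash run_dialogue.sh              # simulate doctor-patient dialogues
bash run_assessment.sh            # assess dialogues -> results/assessments
bash run_diagnoses_obtaining.sh   # extract diagnoses -> results/assessments/diags
```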
```bibtex
@misc{sviridov20253mdbenchmedicalmultimodalmultiagent,
      title={3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark},
      author={Ivan Sviridov and Amina Miftakhova and Artemiy Tereshchenko and Galina Zubkova and Pavel Blinov and Andrey Savchenko},
      year={2025},
      eprint={2504.13861},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2504.13861},
}
```
We appreciate your interest in our work! If you have any questions, please open an issue or contact Ivan at [email protected] or Amina at [email protected].