univanxx/3mdbench

3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

3MDBench, or Medical Multimodal Multi-agent Dialogue Benchmark, is an open-source benchmark for evaluating large vision-language models (LVLMs) through simulated doctor-patient dialogues. It features a Doctor Agent interacting with a temperament-driven Patient Agent using images and structured complaints. After that, an Assessor Agent, aligned with human experts, evaluates diagnostic and communication quality.
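The interaction described above can be pictured with a toy loop. This is only an illustrative sketch, not the repository's implementation: the `Agent` interface, `simulate_dialogue`, `assess`, the stub agents, the turn order, and the `max_turns` parameter are all hypothetical, and the real agents are backed by vision-language models and also receive an image and a structured complaint.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)
Agent = Callable[[List[Turn]], str]  # hypothetical interface: history -> next utterance


def simulate_dialogue(doctor: Agent, patient: Agent, max_turns: int = 2) -> List[Turn]:
    """Alternate patient and doctor utterances for a fixed number of rounds.

    Letting the patient speak first is an assumption made for this sketch.
    """
    history: List[Turn] = []
    for _ in range(max_turns):
        history.append(("patient", patient(history)))
        history.append(("doctor", doctor(history)))
    return history


def assess(history: List[Turn]) -> Dict[str, int]:
    """Stand-in for the Assessor Agent: it only counts the doctor's turns."""
    return {"doctor_turns": sum(1 for speaker, _ in history if speaker == "doctor")}


# Stub agents; a real run would query models instead of returning fixed strings.
def stub_patient(history: List[Turn]) -> str:
    return "I have an itchy rash on my arm."


def stub_doctor(history: List[Turn]) -> str:
    return "How long have you had it?"


dialogue = simulate_dialogue(stub_doctor, stub_patient)
report = assess(dialogue)
```

The point of the sketch is the separation of roles: the dialogue is produced first, and the assessment runs afterwards on the finished transcript.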

Updates

[2025-08-20] 3MDBench has been accepted to the EMNLP 2025 Main Conference 🎉

How to run the evaluation

Preparing dependencies


Dialogue generation

  • Go to the scripts folder;
  • Run run_dialogue.sh, choosing one of the models used in the paper or implementing a custom one in the agents/doctor_agent.py file.

Dialogue assessment

  • Run run_assessment.sh to assess the generated dialogues; the results are saved in the results/assessments folder;
  • Run run_diagnoses_obtaining.sh to extract the Doctor Agent's final diagnosis for each case; the results are saved in the results/assessments/diags folder;
  • Explore benchmarking/count_metrics.ipynb to analyze the model's metrics.
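The generation and assessment steps can be chained from Python. The following is a minimal sketch under stated assumptions: the script names come from the steps above, but invoking them with `bash`, without arguments, and all from the scripts folder is assumed rather than confirmed, and `build_commands` and `run_pipeline` are hypothetical helper names. Model selection still happens inside run_dialogue.sh itself.

```python
import subprocess
from typing import List

# Script names are taken from the steps above; their order follows the README.
PIPELINE = ["run_dialogue.sh", "run_assessment.sh", "run_diagnoses_obtaining.sh"]


def build_commands() -> List[List[str]]:
    """Return one `bash <script>` command per pipeline step (assumed invocation)."""
    return [["bash", name] for name in PIPELINE]


def run_pipeline(scripts_dir: str = "scripts", dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them in order inside scripts_dir."""
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, cwd=scripts_dir, check=True)
    return commands
```

Calling `run_pipeline()` with the default `dry_run=True` only prints the three commands, which is a safe way to check the order before launching a full run with `run_pipeline(dry_run=False)`.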

BibTeX reference

@misc{sviridov20253mdbenchmedicalmultimodalmultiagent,
      title={3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark}, 
      author={Ivan Sviridov and Amina Miftakhova and Artemiy Tereshchenko and Galina Zubkova and Pavel Blinov and Andrey Savchenko},
      year={2025},
      eprint={2504.13861},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
      url={https://arxiv.org/abs/2504.13861}, 
}

We appreciate your interest in our work! If you have any questions, please open an issue or contact Ivan at [email protected] or Amina at [email protected].
