🏗️ Quickstart | 📊 Datasets | 🏆 Leaderboard | 📝 Report | 🖊️ Citation
This repository is the official implementation of MDK12-Bench.
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Pengfei Zhou*, Fanrui Zhang*, Xiaopeng Peng*, Zhaopan Xu, Jiaxin Ai, Yansheng Qiu, Chuanhao Li, Zhen Li, Ming Li, Yukang Feng, Jianwen Sun, Haoquan Zhang, Zizhen Li, Xiaofeng Mao, Zekai Li, Wangbo Zhao, Kai Wang, Xiaojun Chang, Wenqi Shao, Yang You†, Kaipeng Zhang†
* Equal Contribution
† Corresponding Author
- [2025-04-09] The technical report of MDK12-Bench is released!
MDK12-Bench is a comprehensive benchmark designed to evaluate the reasoning capabilities of Multimodal Large Language Models (MLLMs) across multiple disciplines. Our benchmark covers a diverse range of tasks that require high-level reasoning abilities, providing a robust platform for challenging and assessing state-of-the-art MLLMs. MDK12-Bench aims to push the boundaries of multimodal intelligence by offering standardized evaluation metrics and high-quality test cases that reflect real-world reasoning scenarios.
Please refer to MDK12EvalHub to get started quickly.
After setting up the environment following the instructions in the Handbook:
1. Run quick-inference-qwenvl.py for quick inference with the vLLM project (a minimal sketch is shown below).
2. Run judge.sh to apply our judge logic to the inference results and obtain the final performance score. Make sure the API key for the judge model is set in MDK12EvalHub/.env before running judge.sh.
3. Run count_all_acc_per_disc.py to summarize the performance on a single subset.
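The snippet below is a minimal sketch of what the inference step looks like with vLLM on a single multimodal question; the checkpoint name, prompt template, image path, and question text are illustrative placeholders, and quick-inference-qwenvl.py in MDK12EvalHub handles the full benchmark format and batching.

```python
# Minimal illustrative sketch of vLLM inference on one multimodal question.
# The model name, prompt template, and file paths are placeholders only;
# see quick-inference-qwenvl.py in MDK12EvalHub for the actual pipeline.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")   # hypothetical checkpoint
params = SamplingParams(temperature=0.0, max_tokens=512)

image = Image.open("question_image.png")        # placeholder image path
question = "Which option correctly explains the phenomenon shown in the figure?"
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    f"{question}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# vLLM accepts a dict with the text prompt plus multimodal inputs.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    params,
)
print(outputs[0].outputs[0].text)               # the model's answer / reasoning
```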
MDK12mini-easy.zip, MDK12mini-medium.zip, and MDK12mini-hard.zip contain the cleaned data of the MDK12mini set, split by difficulty level.
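A minimal sketch for unpacking the three archives locally, assuming they have been downloaded to the working directory (the target folder name is illustrative):

```python
# Extract the three MDK12mini difficulty splits into a local data/ folder.
import zipfile

for name in ["MDK12mini-easy.zip", "MDK12mini-medium.zip", "MDK12mini-hard.zip"]:
    with zipfile.ZipFile(name) as zf:
        zf.extractall("data/")   # illustrative target directory
```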
Accuracy scores from our official leaderboard can be previewed here!
If you find MDK12-Bench useful in your project or research, please cite our paper with the following BibTeX entry. Thanks!
@misc{zhou2025mdk12,
title={MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models},
author={Pengfei Zhou and Fanrui Zhang and Xiaopeng Peng and Zhaopan Xu and Jiaxin Ai and Yansheng Qiu and Chuanhao Li and Zhen Li and Ming Li and Yukang Feng and Jianwen Sun and Haoquan Zhang and Zizhen Li and Xiaofeng Mao and Wangbo Zhao and Kai Wang and Xiaojun Chang and Wenqi Shao and Yang You and Kaipeng Zhang},
year={2025},
eprint={2504.05782},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.05782},
}