👏 Welcome to the Awesome-Agentic-MLLMs repository! This curated collection features influential papers, codebases, datasets, benchmarks, and resources dedicated to exploring the emerging field of agentic capabilities in Multimodal Large Language Models.
⭐ Feel free to star and fork this repository to stay updated with the latest advancements and contribute to the growing community.
If we have missed any related work, please submit an issue, and we’ll review and address it in the next release!
- Oct 14, 2025. We’re excited to introduce our survey paper on agentic MLLMs. Check it out on arXiv!
- Oct 12, 2025. This repository curates and maintains an up-to-date list of papers on agentic MLLMs. Contributions and suggestions are warmly welcome!
If you find this survey helpful, please cite our work:
@article{yao2025survey,
title={A Survey on Agentic Multimodal Large Language Models},
author={Yao, Huanjin and Zhang, Ruifei and Huang, Jiaxing and Zhang, Jingyi and Wang, Yibo and Fang, Bo and Zhu, Ruolin and Jing, Yongcheng and Liu, Shunyu and Li, Guanbin and others},
journal={arXiv preprint arXiv:2510.10991},
year={2025}
}

We collect recent advances in Agentic MLLMs and categorize them into three core dimensions: (1) Agentic Internal Intelligence, which leverages reasoning, reflection, and memory to enable accurate long-horizon planning; (2) Agentic External Tool Invocation, whereby models proactively use various external tools to extend their problem-solving capabilities beyond their intrinsic knowledge; and (3) Agentic Environment Interaction, which situates models within virtual or physical environments, allowing them to perceive changes and incorporate feedback from the real world.
- 🔔 News
- 🔗 Citation
- 🌍 Overview
- 📒 Table of Contents
- 📄 Paper List
| Title | Code |
|---|---|
| LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models | |
| ms-swift: SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) | |
| Megatron-LM | |
| Unsloth | |