Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools
- 🧠 Oct 28, 2025 — MCP-Flow is released on arXiv.
- 🛠️ Nov 10, 2025 — We open-source all server configurations and tool information!
MCP-Flow is an automated web-agent-driven pipeline for large-scale server discovery, data synthesis, and model training in the Model Context Protocol (MCP) ecosystem.
- 🤖 Automated server collection from 6 major MCP marketplaces
- 📊 Extensive tool coverage: 1,166 real-world servers, 11,536 tools, and 68K+ instruction–function call pairs
- 🧩 Scale & diversity far beyond previous benchmarks
| Category | Path | Description |
|---|---|---|
| 🧠 Function calls & trajectories | ./data/function_call/ & ./data/trajectory/ | Example data; full datasets are released on HuggingFace |
| ⚙️ MCP configurations | ./data/mcp_config/ | Configuration files for discovered servers |
| 🧰 Tool information | ./data/tools/ | Tool descriptions and schema definitions |
| 💻 Source code | ./src/ | Core scripts for server deployment |
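
To get a quick overview of what a local checkout contains, the directories listed above can be walked with a few lines of Python. This is a minimal sketch, not part of the MCP-Flow code base: the function name `summarize_data_dirs` and the `DATA_DIRS` mapping are hypothetical, and the script assumes nothing about file formats beyond grouping files by extension.

```python
from pathlib import Path

# Directory paths taken from the table above; the dictionary keys and
# everything else in this script are illustrative assumptions.
DATA_DIRS = {
    "function_call": "data/function_call",
    "trajectory": "data/trajectory",
    "mcp_config": "data/mcp_config",
    "tools": "data/tools",
}


def summarize_data_dirs(repo_root="."):
    """Count regular files under each data directory, grouped by extension."""
    root = Path(repo_root)
    summary = {}
    for name, rel in DATA_DIRS.items():
        folder = root / rel
        counts = {}
        if folder.is_dir():
            # Recurse, since the layout inside each folder is not documented here.
            for path in folder.rglob("*"):
                if path.is_file():
                    ext = path.suffix or "(no extension)"
                    counts[ext] = counts.get(ext, 0) + 1
        # A missing or empty directory yields an empty dict.
        summary[name] = counts
    return summary


if __name__ == "__main__":
    for name, counts in summarize_data_dirs().items():
        print(name, counts or "missing or empty")
```

Running the script from the repository root prints one line per directory. A missing or empty directory is reported as such rather than raising an error.
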
git clone https://github.com/<your-org>/MCP-Flow.git
cd MCP-Flow
pip install -r requirements.txt

If you find MCP-Flow useful in your research, please consider citing:
@misc{wang2025mcpflowfacilitatingllmagents,
title={MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools},
author={Wenhao Wang and Peizhi Niu and Zhao Xu and Zhaoyu Chen and Jian Du and Yaxin Du and Xianghe Pang and Keduan Huang and Yanfeng Wang and Qiang Yan and Siheng Chen},
year={2025},
eprint={2510.24284},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2510.24284},
}

If you have any questions or encounter issues, feel free to open an issue or reach out to the authors directly:
📮 Email: [email protected]
💬 WeChat: