YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection.
"Exploring the frontiers of Dynamic Intelligence in YOLO."
This work represents our passionate exploration into the evolution of Real-Time Object Detection (RTOD). To the best of our knowledge, YOLO-Master is the first work to deeply integrate Mixture-of-Experts (MoE) with the YOLO architecture on general-purpose datasets.
Most existing YOLO models rely on static, dense computation—allocating the same computational budget to a simple sky background as they do to a complex, crowded intersection. We believe detection models should be more "adaptive", much like the human visual system. While this initial exploration may not be perfect, it demonstrates the significant potential of Efficient Sparse MoE (ES-MoE) in balancing high precision with ultra-low latency. We are committed to continuous iteration and optimization to refine this approach further.
Looking forward, we draw inspiration from the transformative advancements in LLMs and VLMs. We are committed to refining this approach and extending these insights to fundamental vision tasks, with the ultimate goal of tackling more ambitious frontiers like Open-Vocabulary Detection and Open-Set Segmentation.
Abstract
Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources by over-allocating on trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance. To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, thereby improving detection performance while minimizing computational overhead during inference.
Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP with 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP while running 17.8% faster. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code: isLinXu/YOLO-Master
For a deep dive into the design philosophy of MoE modules, detailed routing mechanisms, and optimization guides for deployment on various hardware (GPU/CPU/NPU), please refer to our Wiki: 👉 Wiki: MoE Modules Explained
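For a quick intuition of what instance-conditional sparse computation looks like, the snippet below is a minimal conceptual PyTorch sketch of a sparse MoE block: a lightweight router scores experts per image, only the top-k experts run, and an auxiliary term discourages the router from collapsing onto a single expert. The class name `SparseMoEBlock`, the conv-BN-SiLU expert design, the expert/top-k counts, and the auxiliary loss form are our own illustrative assumptions, not the released ES-MoE implementation; the actual design is described in the paper and the Wiki above.

```python
import torch
import torch.nn as nn


class SparseMoEBlock(nn.Module):
    """Conceptual sparse MoE block: a lightweight router picks the top-k
    experts per image and mixes their outputs by the routing weights."""

    def __init__(self, channels: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert here is a small conv branch; the real ES-MoE experts may differ.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for _ in range(num_experts)
        )
        # Lightweight router: global average pooling + linear layer -> expert logits.
        self.router = nn.Linear(channels, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (B, C, H, W). Route per image based on pooled features.
        gate_logits = self.router(x.mean(dim=(2, 3)))                 # (B, E)
        gate_probs = gate_logits.softmax(dim=-1)                      # (B, E)
        topk_probs, topk_idx = gate_probs.topk(self.top_k, dim=-1)    # (B, k)

        out = torch.zeros_like(x)
        for b in range(x.shape[0]):
            # Only the selected experts are evaluated for this image.
            for w, e in zip(topk_probs[b], topk_idx[b]):
                out[b] += w * self.experts[int(e)](x[b : b + 1])[0]

        # Load-balancing-style auxiliary term (a common MoE heuristic):
        # penalize routing distributions concentrated on few experts.
        importance = gate_probs.mean(dim=0)                           # (E,)
        aux_loss = (importance * importance).sum() * gate_probs.shape[-1]
        return out, aux_loss


# Tiny usage example on random features.
if __name__ == "__main__":
    moe = SparseMoEBlock(channels=64)
    y, aux = moe(torch.randn(2, 64, 40, 40))
    print(y.shape, aux.item())
```

In the full model, blocks of this kind would replace selected dense blocks so that easy inputs activate fewer or cheaper experts; the actual routing granularity and diversity-enhancing objective are documented in the paper and Wiki.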
- A Humble Beginning
- Architecture
- Updates
- Main Results
- Detection Examples
- Supported Tasks
- Quick Start
- Community & Contributing
- License
- Acknowledgements
- Citation
- 2025/12/31: Released the demo YOLO-Master-WebUI-Demo.
- 2025/12/31: Released YOLO-Master v0.1 with code, pre-trained weights, and documentation.
- 2025/12/30: arXiv paper published.
Table 1. Comparison with state-of-the-art Nano-scale detectors across five benchmarks.
| Method | COCO mAP (%) | COCO mAP50 (%) | PASCAL VOC mAP (%) | PASCAL VOC mAP50 (%) | VisDrone mAP (%) | VisDrone mAP50 (%) | KITTI mAP (%) | KITTI mAP50 (%) | SKU-110K mAP (%) | SKU-110K mAP50 (%) | Latency (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv10 | 38.5 | 53.8 | 60.6 | 80.3 | 18.7 | 32.4 | 66.0 | 88.3 | 57.4 | 90.0 | 1.84 |
| YOLOv11-N | 39.4 | 55.3 | 61.0 | 81.2 | 18.5 | 32.2 | 67.8 | 89.8 | 57.4 | 90.0 | 1.50 |
| YOLOv12-N | 40.6 | 56.7 | 60.7 | 80.8 | 18.3 | 31.7 | 67.6 | 89.3 | 57.4 | 90.0 | 1.64 |
| YOLOv13-N | 41.6 | 57.8 | 60.7 | 80.3 | 17.5 | 30.6 | 67.7 | 90.6 | 57.5 | 90.3 | 1.97 |
| YOLO-Master-N | 42.4 | 59.2 | 62.1 | 81.9 | 19.6 | 33.7 | 69.2 | 91.3 | 58.2 | 90.6 | 1.62 |
| Model | Size | mAP<sup>box</sup> (%) | mAP<sup>mask</sup> (%) | Gain (mAP<sup>mask</sup>) |
|---|---|---|---|---|
| YOLOv11-seg-N | 640 | 38.9 | 32.0 | - |
| YOLOv12-seg-N | 640 | 39.9 | 32.8 | Baseline |
| YOLO-Master-seg-N | 640 | 42.9 | 35.6 | +2.8% 🚀 |
| Model | Dataset | Input Size | Top-1 Acc (%) | Top-5 Acc (%) | Comparison |
|---|---|---|---|---|---|
| YOLOv11-cls-N | ImageNet | 224 | 70.0 | 89.4 | Baseline |
| YOLOv12-cls-N | ImageNet | 224 | 71.7 | 90.5 | +1.7% Top-1 |
| YOLO-Master-cls-N | ImageNet | 224 | 76.6 | 93.4 | +4.9% Top-1 🔥 |
Example visualizations for Detection and Segmentation are available in the repository.
YOLO-Master builds upon the robust Ultralytics framework, inheriting support for various computer vision tasks. While our research primarily focuses on Real-Time Object Detection, the codebase is capable of supporting:
| Task | Status | Description |
|---|---|---|
| Object Detection | ✅ | Real-time object detection with ES-MoE acceleration. |
| Instance Segmentation | ✅ | Experimental support (inherited from Ultralytics). |
| Pose Estimation | 🚧 | Experimental support (inherited from Ultralytics). |
| OBB Detection | 🚧 | Experimental support (inherited from Ultralytics). |
| Classification | ✅ | Image classification support. |
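Because all tasks share the Ultralytics API, switching between them only changes the weights you load. Below is a minimal sketch; the segmentation and classification weight filenames (`yolo_master_seg_n.pt`, `yolo_master_cls_n.pt`) are illustrative assumptions and may differ from the released names.

```python
from ultralytics import YOLO

# Detection
det_model = YOLO("yolo_master_n.pt")
det_results = det_model("path/to/image.jpg")

# Instance segmentation and classification follow the same API;
# the weight filenames below are placeholders for the released checkpoints.
seg_model = YOLO("yolo_master_seg_n.pt")
seg_results = seg_model("path/to/image.jpg")
print(seg_results[0].masks)       # per-instance masks (None for detection-only models)

cls_model = YOLO("yolo_master_cls_n.pt")
cls_results = cls_model("path/to/image.jpg")
print(cls_results[0].probs.top5)  # indices of the five most likely classes
```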
Install via pip (Recommended)
# 1. Create and activate a new environment
conda create -n yolo_master python=3.11 -y
conda activate yolo_master
# 2. Clone the repository
git clone https://github.com/isLinXu/YOLO-Master
cd YOLO-Master
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
# 4. Optional: Install FlashAttention for faster training (CUDA required)
pip install flash_attn

Validate the model accuracy on the COCO dataset.
from ultralytics import YOLO
# Load the pretrained model
model = YOLO("yolo_master_n.pt")
# Run validation
metrics = model.val(data="coco.yaml", save_json=True)
print(metrics.box.map)  # mAP50-95

Train a new model on your custom dataset or COCO.
from ultralytics import YOLO
# Load a model
model = YOLO('cfg/models/master/v0/det/yolo-master-n.yaml') # build a new model from YAML
# Train the model
results = model.train(
    data='coco.yaml',
    epochs=600,
    batch=256,
    imgsz=640,
    device="0,1,2,3",  # Use multiple GPUs
    scale=0.5,         # image scale augmentation gain
    mosaic=1.0,        # mosaic augmentation probability
    mixup=0.0,         # mixup augmentation probability
    copy_paste=0.1     # copy-paste augmentation probability
)

Run inference on images or videos.
Python:
from ultralytics import YOLO
model = YOLO("yolo_master_n.pt")
results = model("path/to/image.jpg")
results[0].show()
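To work with predictions programmatically rather than only displaying them, the standard Ultralytics Results API exposes boxes, scores, and class indices. A brief sketch (the image path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolo_master_n.pt")
results = model("path/to/image.jpg")

# Each Results object holds the detections for one image.
for box in results[0].boxes:
    cls_id = int(box.cls)                   # class index
    conf = float(box.conf)                  # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # corner coordinates in pixels
    print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```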
CLI:

yolo predict model=yolo_master_n.pt source='path/to/image.jpg' show=True

Export the model to other formats for deployment (TensorRT, ONNX, etc.).
from ultralytics import YOLO
model = YOLO("yolo_master_n.pt")
model.export(format="engine", half=True) # Export to TensorRT
# formats: onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs
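After export, the artifact can be loaded back through the same YOLO interface for inference, which is the usual Ultralytics workflow. The engine filename below assumes the default export naming next to the original weights.

```python
from ultralytics import YOLO

# Load the exported TensorRT engine (ONNX and other formats work the same way)
trt_model = YOLO("yolo_master_n.engine")
results = trt_model("path/to/image.jpg")
results[0].show()
```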
Launch a local web interface to test the model interactively. This application provides a user-friendly Gradio dashboard for model inference, supporting automatic model scanning, task switching (Detection, Segmentation, Classification), and real-time visualization.

python app.py
# Open http://127.0.0.1:7860 in your browser

We welcome contributions! Please check out our Contribution Guidelines for details on how to get involved.
- Issues: Report bugs or request features here.
- Pull Requests: Submit your improvements.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This work builds upon the excellent Ultralytics framework. Huge thanks to the community for contributions, deployments, and tutorials!
If you use YOLO-Master in your research, please cite our paper:
@article{lin2025yolomaster,
title={{YOLO-Master}: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection},
author={Lin, Xu and Peng, Jinlong and Gan, Zhenye and Zhu, Jiawen and Liu, Jun},
journal={arXiv preprint arXiv:},
year={2025}
}

⭐ If you find this work useful, please star the repository!