This work presents UI-Genie, a self-improving framework that enhances MLLM-based GUI Agents through iterative agent-reward model co-evolution, achieving state-of-the-art performance without manual annotation.
[📖 Paper] [🤗 Models & Datasets]
UI-Genie introduces a novel self-improving framework for GUI agents that:
- 🎯 Eliminates manual annotation through iterative synthetic trajectory generation
- 🔄 Co-evolves agent and reward models through self-improvement cycles
- 📊 Generates high-quality datasets without human effort
- 🏆 Achieves SOTA performance across multiple benchmarks
- UI-Genie-RM: First specialized reward model for GUI trajectory assessment with image-text interleaved architecture
- Self-Improvement Pipeline: Progressive expansion of solvable GUI tasks through reward-guided exploration
- Synthetic Data Generation: High-quality trajectory synthesis with outcome verification
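As a rough illustration of reward-guided exploration, the toy sketch below samples several rollouts per task, scores them with a reward function, and keeps only the best rollout when it clears a threshold. This is not the UI-Genie implementation: `explore`, `toy_rollout`, `toy_score`, the threshold, and the action strings are all hypothetical stand-ins, and the retraining of the agent and reward model between cycles is omitted.

```python
import random
from typing import Callable, List, Tuple

Trajectory = List[str]  # toy representation: a trajectory is a list of action strings


def explore(
    tasks: List[str],
    rollout: Callable[[str], Trajectory],
    score: Callable[[str, Trajectory], float],
    samples_per_task: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[str, Trajectory]]:
    """Keep the highest-scoring rollout of each task if it reaches the threshold."""
    accepted: List[Tuple[str, Trajectory]] = []
    for task in tasks:
        # Sample several candidate trajectories and score each one once.
        scored = [(score(task, traj), traj) for traj in (rollout(task) for _ in range(samples_per_task))]
        best_score, best_traj = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            accepted.append((task, best_traj))
    return accepted


# Hypothetical stand-ins for the agent and the reward model.
_rng = random.Random(0)


def toy_rollout(task: str) -> Trajectory:
    return [_rng.choice(["tap", "noop"]) for _ in range(3)]


def toy_score(task: str, trajectory: Trajectory) -> float:
    # Pretend that "tap" actions are the useful ones.
    return sum(action == "tap" for action in trajectory) / len(trajectory)


accepted = explore(["open settings", "set alarm"], toy_rollout, toy_score)
print(f"accepted {len(accepted)} trajectories for the next training round")
```

In a full cycle, the accepted trajectories would be added to the training data and both models retrained before the next round of exploration.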
| Model | Size | AndroidControl-Low (SR) | AndroidControl-High (SR) | AndroidLab (SR) | Android Arena (SR) | Download |
|---|---|---|---|---|---|---|
| UI-Genie-Agent | 3B | 93.8 | 72.9 | 28.8 | - | 🤗 HuggingFace |
| UI-Genie-Agent | 7B | 94.3 | 74.2 | 38.7 | 20.4 | 🤗 HuggingFace |
| UI-Genie-Agent | 72B | 94.8 | 77.0 | 41.2 | - | Coming soon |
| Model | Size | Step-Level F1 | Outcome-Level F1 |
|---|---|---|---|
| UI-Genie-RM | 7B | 79.6 | 82.1 |
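For readers unfamiliar with the metric, the sketch below computes the standard binary F1 score from ground-truth and predicted labels. It assumes labels of 1 (correct) and 0 (incorrect), given per step for a step-level score or per trajectory for an outcome-level score. The exact evaluation protocol behind the table is not specified here, so treat this only as the textbook definition.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five steps: 2 true positives, 1 false positive, 1 false negative.
labels = [1, 1, 0, 1, 0]
predictions = [1, 0, 0, 1, 1]
print(round(f1_score(labels, predictions), 3))  # prints 0.667
```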
We release two novel datasets that enable training GUI agents without manual annotation:
| Dataset | Size | Description | Link |
|---|---|---|---|
| UI-Genie-RM-517k | 517K | First reward dataset for GUI agents | 🤗 HuggingFace |
| UI-Genie-Agent-16k | 16K | High-quality synthetic trajectories | 🤗 HuggingFace |
- Clone this repository:

      git clone https://github.com/Euphoria16/UI-Genie.git
      cd UI-Genie

- Create conda environment:

      conda create -n ui-genie python=3.10.12 -y
      conda activate ui-genie

- Install dependencies:

      cd src/ms-swift
      pip install -e .

Before running evaluations, you need to download the source images from AndroidControl:

    # Download AndroidControl images and place them in the correct directory
    # Place images under: src/ms-swift/data/androidcontrol/imgs/

We provide evaluation scripts using the ms-swift library with pre-configured JSONL files located in src/ms-swift/data/.
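Before launching an evaluation, it can be useful to confirm that the images ended up in the expected directory. The helper below is a hypothetical convenience check and not part of the repository. It assumes it is run from the repository root and that the images use `.png`, `.jpg`, or `.jpeg` extensions.

```python
from pathlib import Path

# Assumption: AndroidControl screenshots use one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(root) -> int:
    """Count image files under root, searching subdirectories too; 0 if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


found = count_images("src/ms-swift/data/androidcontrol/imgs")
if found == 0:
    print("No images found; check that the AndroidControl images were downloaded.")
else:
    print(f"Found {found} images.")
```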
Evaluate agent performance on high-level tasks that require multi-step execution:

    cd src/ms-swift
    bash exps/eval_androidcontrol_swift_high_level.sh

Evaluate agent performance on low-level tasks with per-step instructions:

    cd src/ms-swift
    bash exps/eval_androidcontrol_swift_low_level.sh

Additional evaluation scripts for AndroidLab and Android Arena benchmarks will be released soon.
UI-Genie agents are built on the Qwen2.5-VL model family and trained with supervised fine-tuning using the ms-swift framework.
Our training pipeline combines multiple datasets:
- AndroidControl training set
- AMEX training set
- AndroidLab training set
- UI-Genie-Agent-16k
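One simple way to combine several JSONL datasets is to concatenate them into a single file while checking that every line parses as JSON. The sketch below illustrates that idea only: `merge_jsonl` is a hypothetical helper, the provided training scripts may mix the datasets differently, and any file names you pass in are your own.

```python
import json
from typing import Iterable


def merge_jsonl(sources: Iterable[str], destination: str) -> int:
    """Concatenate JSONL files into destination and return the number of records written."""
    count = 0
    with open(destination, "w", encoding="utf-8") as out:
        for source in sources:
            with open(source, encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    json.loads(line)  # raises ValueError on a malformed record
                    out.write(line + "\n")
                    count += 1
    return count


# Example call with hypothetical file names:
# merge_jsonl(["androidcontrol_train.jsonl", "ui_genie_agent_16k.jsonl"], "train_all.jsonl")
```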
Train the 3B model with full-parameter fine-tuning:

    cd src/ms-swift
    bash exps/train_agent_3B.sh

Train the 7B model with full-parameter fine-tuning:

    cd src/ms-swift
    bash exps/train_agent_7B.sh

Train the 72B model with rsLoRA, a parameter-efficient fine-tuning (PEFT) method:

    cd src/ms-swift
    bash exps/train_agent_72B.sh

We thank the teams behind Qwen2.5-VL, AndroidControl, and AndroidLab for their foundational work, and the ms-swift team for the efficient training and inference framework.
For questions and feedback, please open an issue or contact:
- Han Xiao: [email protected]
- Aojun Zhou: [email protected]
This project is released under the MIT License.