🧞 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

This work presents UI-Genie, a self-improving framework that enhances MLLM-based GUI Agents through iterative agent-reward model co-evolution, achieving state-of-the-art performance without manual annotation.

[📖 Paper] [🤗 Models & Datasets]

👀 Overview

UI-Genie introduces a novel self-improving framework for GUI agents that:

🎯 Eliminates manual annotation through iterative synthetic trajectory generation
🔄 Co-evolves agent and reward models through self-improvement cycles
📊 Generates high-quality datasets without human effort
🏆 Achieves SOTA performance across multiple benchmarks

🌟 Key Features

UI-Genie-RM: First specialized reward model for GUI trajectory assessment with image-text interleaved architecture
Self-Improvement Pipeline: Progressive expansion of solvable GUI tasks through reward-guided exploration
Synthetic Data Generation: High-quality trajectory synthesis with outcome verification

🤖 Model Zoo

Released Models

Model	Size	AndroidControl-Low (SR)	AndroidControl-High (SR)	AndroidLab (SR)	Android Arena (SR)	Download
UI-Genie-Agent	3B	93.8	72.9	28.8	-	🤗 HuggingFace
UI-Genie-Agent	7B	94.3	74.2	38.7	20.4	🤗 HuggingFace
UI-Genie-Agent	72B	94.8	77.0	41.2	-	Coming soon

Reward Model

Model	Size	Step-Level F1	Outcome-Level F1
UI-Genie-RM	7B	79.6	82.1
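
The README does not spell out how the F1 scores above are computed. As a reminder of the metric itself, here is a minimal sketch that assumes binary labels (1 = the step or trajectory should be accepted, 0 = rejected). The function name `binary_f1` and the toy inputs are illustrative and are not taken from the repository.

```python
def binary_f1(predictions: list[int], labels: list[int]) -> float:
    """Compute F1 for the positive class (1) from binary predictions and labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    true_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    false_pos = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    false_neg = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)

    # With no true positives, precision or recall is zero (or undefined), so F1 is 0.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy judgments, purely illustrative.
    predicted = [1, 1, 0, 1]
    reference = [1, 0, 0, 1]
    print(f"F1 = {binary_f1(predicted, reference):.3f}")
```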

📊 Datasets

We release two novel datasets that enable training GUI agents without manual annotation:

Dataset	Size	Description	Link
UI-Genie-RM-517k	517K	First reward dataset for GUI agents	🤗 HuggingFace
UI-Genie-Agent-16k	16K	High-quality synthetic trajectories	🤗 HuggingFace

🛠️ Installation

Clone this repository:

git clone https://github.com/Euphoria16/UI-Genie.git
cd UI-Genie

Create a conda environment:

conda create -n ui-genie python=3.10.12 -y
conda activate ui-genie

Install dependencies:

cd src/ms-swift
pip install -e .

📈 Evaluation

Prerequisites

Before running evaluations, you need to download the source images from AndroidControl:

# Download AndroidControl images and place them in the correct directory
# Place images under: src/ms-swift/data/androidcontrol/imgs/
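
Before launching an evaluation, it can help to confirm that the images are where the scripts expect them. The following is a minimal sketch and not part of the repository: the helper name `count_images` and the set of file extensions are assumptions, and only the directory path comes from the instructions above. Run it from the repository root.

```python
from pathlib import Path

# Directory named in the prerequisites above (relative to the repository root).
IMAGE_DIR = Path("src/ms-swift/data/androidcontrol/imgs")

# Assumed image extensions; adjust if the AndroidControl export uses others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(image_dir: Path) -> int:
    """Return the number of image files found under image_dir (recursively)."""
    if not image_dir.is_dir():
        return 0
    return sum(
        1
        for path in image_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    found = count_images(IMAGE_DIR)
    if found == 0:
        print(f"No images found under {IMAGE_DIR}; download AndroidControl first.")
    else:
        print(f"Found {found} image files under {IMAGE_DIR}.")
```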

AndroidControl Benchmark

We provide evaluation scripts using the ms-swift library with pre-configured JSONL files located in src/ms-swift/data/.
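
The record schema of these JSONL files is not described here, so the sketch below only lists the top-level keys of the first record in each file rather than assuming any field names. The helper `peek_jsonl` is hypothetical and not part of ms-swift or this repository. Run it from the repository root.

```python
import json
from pathlib import Path
from typing import Any


def peek_jsonl(path: Path, limit: int = 3) -> list[Any]:
    """Read up to `limit` records from a JSONL file, skipping blank lines."""
    records: list[Any] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    data_dir = Path("src/ms-swift/data")
    for jsonl_path in sorted(data_dir.glob("*.jsonl")):
        sample = peek_jsonl(jsonl_path, limit=1)
        # Only dict records have keys; anything else is reported as an empty list.
        keys = sorted(sample[0].keys()) if sample and isinstance(sample[0], dict) else []
        print(f"{jsonl_path.name}: top-level keys = {keys}")
```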

High-Level Task Evaluation

Evaluate agent performance on high-level tasks that require multi-step execution:

cd src/ms-swift
bash exps/eval_androidcontrol_swift_high_level.sh

Low-Level Task Evaluation

Evaluate agent performance on low-level tasks with step instructions:

cd src/ms-swift
bash exps/eval_androidcontrol_swift_low_level.sh

Other Benchmarks

Additional evaluation scripts for AndroidLab and Android Arena benchmarks will be released soon.

🔥 Training

We train UI-Genie agents based on the Qwen2.5-VL model family with the ms-swift framework for supervised fine-tuning.

Training Data

Our training pipeline combines multiple datasets:

AndroidControl training set
AMEX training set
AndroidLab training set
UI-Genie-Agent-16k

Training Scripts

UI-Genie-Agent-3B (Full Fine-tuning)

Train the 3B model with full parameter fine-tuning:

cd src/ms-swift
bash exps/train_agent_3B.sh

UI-Genie-Agent-7B (Full Fine-tuning)

Train the 7B model with full parameter fine-tuning:

cd src/ms-swift
bash exps/train_agent_7B.sh

UI-Genie-Agent-72B (Parameter-Efficient Fine-tuning)

Train the 72B model using RSLoRA (rank-stabilized LoRA) for parameter-efficient fine-tuning (PEFT):

cd src/ms-swift
bash exps/train_agent_72B.sh
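
For readers unfamiliar with RSLoRA: it differs from standard LoRA in how the low-rank update is scaled, using alpha / sqrt(r) instead of alpha / r. The sketch below only illustrates that difference. The `alpha` and rank values are hypothetical and are not read from `exps/train_agent_72B.sh`.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


if __name__ == "__main__":
    alpha = 16.0  # hypothetical value, not taken from the training script
    for rank in (8, 64, 256):
        print(
            f"rank={rank:>3}  LoRA scale={lora_scale(alpha, rank):.4f}  "
            f"rsLoRA scale={rslora_scale(alpha, rank):.4f}"
        )
```

As the rank grows, the standard factor shrinks as 1/r while the rank-stabilized factor shrinks only as 1/sqrt(r), which is intended to keep the adapter update from vanishing at higher ranks.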

🤝 Acknowledgements

We thank the teams behind Qwen2.5-VL, AndroidControl, and AndroidLab for their foundational work, and the ms-swift team for its efficient training and inference framework.

📧 Contact

For questions and feedback, please open an issue or contact:

Han Xiao: [email protected]
Aojun Zhou: [email protected]

📄 License

This project is released under the MIT License.
