[NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Euphoria16/UI-Genie

🧞 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

This work presents UI-Genie, a self-improving framework that enhances MLLM-based GUI Agents through iterative agent-reward model co-evolution, achieving state-of-the-art performance without manual annotation.

[📖 Paper] [🤗 Models & Datasets]

👀 Overview

UI-Genie introduces a novel self-improving framework for GUI agents that:

  • 🎯 Eliminates manual annotation through iterative synthetic trajectory generation
  • 🔄 Co-evolves agent and reward models through self-improvement cycles
  • 📊 Generates high-quality datasets without human effort
  • 🏆 Achieves SOTA performance across multiple benchmarks

🌟 Key Features

  • UI-Genie-RM: First specialized reward model for GUI trajectory assessment with image-text interleaved architecture
  • Self-Improvement Pipeline: Progressive expansion of solvable GUI tasks through reward-guided exploration
  • Synthetic Data Generation: High-quality trajectory synthesis with outcome verification

🤖 Model Zoo

Released Models

| Model | Size | AndroidControl-Low (SR) | AndroidControl-High (SR) | AndroidLab (SR) | Android Arena (SR) | Download |
|---|---|---|---|---|---|---|
| UI-Genie-Agent | 3B | 93.8 | 72.9 | 28.8 | - | 🤗 HuggingFace |
| UI-Genie-Agent | 7B | 94.3 | 74.2 | 38.7 | 20.4 | 🤗 HuggingFace |
| UI-Genie-Agent | 72B | 94.8 | 77.0 | 41.2 | - | Coming soon |

Reward Model

| Model | Size | Step-Level F1 | Outcome-Level F1 |
|---|---|---|---|
| UI-Genie-RM | 7B | 79.6 | 82.1 |
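
For readers unfamiliar with the metric, the F1 scores above are the harmonic mean of precision and recall. The sketch below is a generic binary F1 computation, assuming each step or outcome is labeled 1 (correct) or 0 (incorrect); the function name is illustrative, and the repository's actual evaluation code may differ.

```python
def f1_score(predictions, labels):
    """Binary F1 for equal-length lists of 0/1 predictions and labels."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```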

📊 Datasets

We release two novel datasets that enable training GUI agents without manual annotation:

| Dataset | Size | Description | Link |
|---|---|---|---|
| UI-Genie-RM-517k | 517K | First reward dataset for GUI agents | 🤗 HuggingFace |
| UI-Genie-Agent-16k | 16K | High-quality synthetic trajectories | 🤗 HuggingFace |

🛠️ Installation

  1. Clone this repository:
git clone https://github.com/Euphoria16/UI-Genie.git
cd UI-Genie
  2. Create a conda environment:
conda create -n ui-genie python=3.10.12 -y
conda activate ui-genie
  3. Install dependencies:
cd src/ms-swift
pip install -e .
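
After installation, a quick sanity check can confirm that the editable install is visible from the active environment. This is a minimal sketch, assuming ms-swift exposes a top-level module named `swift`; the helper `is_installed` is hypothetical and not part of the repository.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report the interpreter version (the README pins Python 3.10.12).
    print("Python:", sys.version.split()[0])
    # Assumption: the ms-swift package is importable as `swift`.
    print("swift importable:", is_installed("swift"))
```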

📈 Evaluation

Prerequisites

Before running evaluations, you need to download the source images from AndroidControl:

# Download AndroidControl images and place them in the correct directory
# Place images under: src/ms-swift/data/androidcontrol/imgs/
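
Before launching an evaluation, it can help to confirm that the images landed in the expected directory. The following is a sketch only: the helper `count_images` and the set of accepted file extensions are assumptions, since the README does not state the image format.

```python
from pathlib import Path

# Assumed extensions; adjust to match the downloaded AndroidControl images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(image_dir):
    """Count image files under image_dir, recursively; 0 if it does not exist."""
    root = Path(image_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    # Path taken from the prerequisite note above, relative to the repo root.
    found = count_images("src/ms-swift/data/androidcontrol/imgs")
    print(f"Found {found} image file(s)")
```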

AndroidControl Benchmark

We provide evaluation scripts using the ms-swift library with pre-configured JSONL files located in src/ms-swift/data/.
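
JSONL files store one JSON object per line, so they can be inspected with the standard library alone. The sketch below makes no assumption about the field names inside the provided files; `load_jsonl` is an illustrative helper, not part of ms-swift.

```python
import json


def load_jsonl(path):
    """Read a JSONL file into a list of Python objects, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `load_jsonl` on one of the files under src/ms-swift/data/ and printing the keys of the first record is a quick way to see how an evaluation sample is laid out.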

High-Level Task Evaluation

Evaluate agent performance on high-level tasks that require multi-step execution:

cd src/ms-swift
bash exps/eval_androidcontrol_swift_high_level.sh

Low-Level Task Evaluation

Evaluate agent performance on low-level tasks, where each step comes with its own instruction:

cd src/ms-swift
bash exps/eval_androidcontrol_swift_low_level.sh

Other Benchmarks

Additional evaluation scripts for AndroidLab and Android Arena benchmarks will be released soon.

🔥 Training

We train UI-Genie agents based on the Qwen2.5-VL model family with the ms-swift framework for supervised fine-tuning.

Training Data

Our training pipeline combines multiple datasets:

Training Scripts

UI-Genie-Agent-3B (Full Fine-tuning)

Train the 3B model with full parameter fine-tuning:

cd src/ms-swift
bash exps/train_agent_3B.sh

UI-Genie-Agent-7B (Full Fine-tuning)

Train the 7B model with full parameter fine-tuning:

cd src/ms-swift
bash exps/train_agent_7B.sh

UI-Genie-Agent-72B (Parameter-Efficient Fine-tuning)

Train the 72B model with rank-stabilized LoRA (RSLoRA) for parameter-efficient fine-tuning (PEFT):

cd src/ms-swift
bash exps/train_agent_72B.sh
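
RSLoRA differs from standard LoRA only in how the low-rank update is scaled: standard LoRA multiplies it by alpha / r, while the rank-stabilized variant uses alpha / sqrt(r). The sketch below illustrates that difference; the function name and the example values of alpha and rank are illustrative and are not taken from the training script.

```python
import math


def lora_scale(alpha, rank, rank_stabilized=False):
    """Scaling factor applied to the low-rank update B @ A."""
    if rank_stabilized:
        # RSLoRA: alpha / sqrt(r), so the scale shrinks more slowly with rank.
        return alpha / math.sqrt(rank)
    # Standard LoRA: alpha / r.
    return alpha / rank
```

With alpha = 16 and rank = 64, standard LoRA scales the update by 0.25, whereas RSLoRA scales it by 2.0.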

🤝 Acknowledgements

We thank the teams behind Qwen2.5-VL, AndroidControl, and AndroidLab for their foundational work, and the ms-swift team for its efficient training and inference framework.

📧 Contact

For questions and feedback, please open an issue or contact:

📄 License

This project is released under the MIT License.
