This repo provides the public preview of V-Droid (https://arxiv.org/abs/2503.15937), a verifier-driven mobile GUI agent. Unlike previous mobile agents that use Large Language Models (LLMs) as generators to directly produce actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: a discretized action space coupled with a prefilling-only workflow to accelerate verification, pair-wise progress preference training to significantly enhance the verifier's decision-making capability, and a scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid sets a new state-of-the-art task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 9.5%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves a low latency of 0.7 seconds per step, making it the first mobile agent capable of delivering near-real-time, effective decision making.
- ✅ Paper link: https://arxiv.org/abs/2503.15937
- ✅ Model weights: https://huggingface.co/V-Droid/V-Droid-8B-0323
V-Droid in the following demos is hosted on 2x 4090 GPUs; the videos are presented without acceleration.
In V-Droid, we propose the verifier-driven approach and the corresponding workflow for GUI agents as follows:
- Extracting candidate actions from the UI and supplementing them with default actions;
- Constructing a verification prompt from the template for each candidate action;
- Scoring the candidates with the verifier in a batch, with prefix caching;
- Completing and executing the selected action;
- Updating the working memory.

For more details, please refer to our code. A minimal sketch of this loop is shown below.
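The following sketch illustrates the loop under assumed interfaces: `score_batch` and `execute` stand in for the verifier and the device controller, and all helper names are hypothetical, not the repo's actual APIs.

```python
# Minimal, illustrative sketch of the verifier-driven loop.
# All helper names and data shapes are hypothetical, not the repo's real APIs.

DEFAULT_ACTIONS = ["navigate_back", "scroll down", "wait", "mark task complete"]

def extract_candidate_actions(ui_state):
    # Placeholder: derive clickable/typable actions from the UI tree.
    return [f"click {e}" for e in ui_state["clickables"]]

def build_verification_prompt(task, memory, ui_state, action):
    # One shared template per step: candidates differ only in the action
    # suffix, so the serving engine can cache the common prompt prefix.
    return (f"Task: {task}\nHistory: {memory}\nScreen: {ui_state['text']}\n"
            f"Candidate action: {action}\nIs this action correct?")

def step(task, ui_state, memory, score_batch, execute):
    # 1) Extract actions from the UI and supplement default actions.
    candidates = extract_candidate_actions(ui_state) + DEFAULT_ACTIONS
    # 2) Build one verification prompt per candidate action.
    prompts = [build_verification_prompt(task, memory, ui_state, a) for a in candidates]
    # 3) Score all candidates in one prefilling-only batch.
    scores = score_batch(prompts)
    # 4) Complete and execute the highest-scoring action.
    best = candidates[scores.index(max(scores))]
    execute(best)
    # 5) Update the working memory.
    memory.append(best)
    return best
```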
Setup AndroidWorld Environment
- Download Android Studio here.
- Create an Android Virtual Device (AVD) by following these instructions. For hardware, select Pixel 6; for the system image, select Tiramisu (API Level 33); and name the AVD AndroidWorldAvd. Watch the setup video.
- Launch the Android Emulator from the command line, not through the Android Studio UI, with the `-grpc 8554` flag, which is needed for communication with the accessibility forwarding app:

  ```bash
  # Typically the emulator binary is located in ~/Android/Sdk/emulator/emulator or
  # ~/Library/Android/sdk/emulator/emulator
  EMULATOR_NAME=AndroidWorldAvd  # From the previous step
  ~/Library/Android/sdk/emulator/emulator -avd $EMULATOR_NAME -no-snapshot -grpc 8554
  ```
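  To confirm the emulator's gRPC endpoint is reachable before starting the agent, a quick check like the following can help (a sanity-check snippet assuming the default localhost:8554 endpoint; it is not part of the repo):

  ```python
  import socket

  # Check whether the emulator's gRPC port (8554) is accepting connections.
  try:
      socket.create_connection(("localhost", 8554), timeout=5).close()
      print("Emulator gRPC endpoint is reachable.")
  except OSError as err:
      print(f"Cannot reach localhost:8554 -- is the emulator running with -grpc 8554? ({err})")
  ```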
- [Optional] It's recommended to use conda, which you can download here:

  ```bash
  conda create -n android_world python=3.11.8
  conda activate android_world
  conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  conda install -y numpy pandas
  ```
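  After creating the environment, a quick sanity check that PyTorch sees your GPUs can save debugging time later (an illustrative snippet, not part of the repo):

  ```python
  import torch

  # Verify the CUDA build of PyTorch is installed and the GPUs are visible.
  print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
  for i in range(torch.cuda.device_count()):
      print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
  ```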
- Install dependencies. Note: Python 3.11 or above is required.

  ```bash
  pip install -r requirements.txt
  ```
- Modify vLLM.
  Navigate to vllm/model_executor/layers/sampler.py and add the following at line 317:

  ```python
  for val, lst in zip(logits, sample_logprobs):
      for d in lst:
          for k in d.keys():
              d[k].logprob = val
  ```

  (See vllm-project/vllm#11397 for more explanation.)
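  With this patch in place, scoring a candidate action reduces to a single-token generation whose returned logprob entries carry the values taken from `logits`. The sketch below illustrates the idea; the model id, prompt, and the yes/no answer framing are assumptions for illustration, not the repo's exact setup:

  ```python
  from vllm import LLM, SamplingParams

  # Assumption: serving the verifier from this HF repo id directly; in practice
  # the LoRA adapter is applied to its base model (see the LoRA step below).
  llm = LLM(model="V-Droid/V-Droid-8B-0323")
  params = SamplingParams(max_tokens=1, logprobs=20)

  prompt = "<verification prompt for one candidate action>\nIs this action correct? Answer:"
  out = llm.generate([prompt], params)[0].outputs[0]

  # After the sampler.py patch, each entry's .logprob field holds the value
  # taken from `logits` rather than a normalized log-probability.
  for token_id, info in out.logprobs[0].items():
      print(token_id, info.decoded_token, info.logprob)
  ```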
- Add model provider APIs as environment variables.
  Three API providers are supported: the Gemini GCP service, OpenAI and its compatible APIs, and Azure OpenAI services. You may configure any of these based on your preference. These APIs are only used for building the working memory; V-Droid can also build the working memory without these third-party APIs.
  ```bash
  # Add to .bashrc.

  # Use the Gemini GCP service, which requires an API key.
  export GCP_API_KEY=

  # Use OpenAI-compatible APIs, including OpenAI, Qwen, and DeepSeek.
  export OPENAI_ENDPOINT=
  export OPENAI_MODEL_NAME=
  export OPENAI_API_VERSION=
  export OPENAI_API_KEY=

  # Use Azure OpenAI services.
  export AZURE_OPENAI_API_KEY=
  export AZURE_OPENAI_MODEL_NAME=
  export AZURE_OPENAI_API_VERSION=
  export AZURE_OPENAI_ENDPOINT=
  ```
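  Which provider gets used can then be inferred from whichever variables are set; the selection logic below is illustrative only, not the repo's actual code:

  ```python
  import os

  # Illustrative provider selection based on which env vars are configured.
  def pick_provider():
      if os.environ.get("AZURE_OPENAI_API_KEY"):
          return "azure-openai"
      if os.environ.get("OPENAI_API_KEY"):
          return "openai-compatible"
      if os.environ.get("GCP_API_KEY"):
          return "gemini"
      return None  # fall back: build the working memory without third-party APIs

  print(pick_provider())
  ```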
- Download the LoRA weights for the V-Droid model.
  The V-Droid model weights are available at https://huggingface.co/V-Droid/V-Droid-8B-0323.
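  One way to serve the adapter with vLLM's LoRA support is sketched below; the base-model id is an assumption (verify it against the adapter's config), and the prompt is a placeholder:

  ```python
  from huggingface_hub import snapshot_download
  from vllm import LLM, SamplingParams
  from vllm.lora.request import LoRARequest

  adapter_path = snapshot_download("V-Droid/V-Droid-8B-0323")
  # Assumption: Llama-3.1-8B-Instruct as the base model; check the adapter config.
  llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)

  out = llm.generate(["<verification prompt>"],
                     SamplingParams(max_tokens=1, logprobs=20),
                     lora_request=LoRARequest("v-droid", 1, adapter_path))
  print(out[0].outputs[0].text)
  ```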
- Launch the emulator and run the evaluation tasks:

  ```bash
  emulator -avd AndroidWorldAvd -no-window -no-snapshot -grpc 8554
  bash main.sh
  ```
Training

You may use the following to train the LoRA module in V-Droid. We provide several training pairs to use.
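For intuition, a pair-wise progress preference record might look like the sketch below; the field names and format are illustrative assumptions, not the repo's actual training data schema.

```python
# Hypothetical pair-wise progress preference record (illustrative only;
# the repo's actual schema may differ).
pair = {
    "task": "Turn on Wi-Fi in Settings",
    "state": "<flattened UI representation of the current screen>",
    "chosen": {"action": "click 'Wi-Fi' toggle", "verdict": "Yes"},  # makes progress
    "rejected": {"action": "click 'Bluetooth'", "verdict": "No"},    # no progress
}
```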
To start training:

```bash
bash train.sh
```

If you use this repo, please cite our paper:
```bibtex
@article{dai2025advancingmobileguiagents,
title={Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment},
author={Gaole Dai and Shiqi Jiang and Ting Cao and Yuanchun Li and Yuqing Yang and Rui Tan and Mo Li and Lili Qiu},
year={2025},
eprint={2503.15937},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2503.15937},
}
```