This repo shows an end‑to‑end pipeline that
- grabs a screenshot from an Android emulator running on your laptop,
- sends it (along with a natural‑language task) to a remote FastAPI server that hosts a Vision‑Language Model (VLM),
- gets back a proposed UI action (tap, swipe, type, …),
- and executes that action on the emulator via `adb`.
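In code, one iteration of that loop looks roughly like the sketch below. This is a minimal illustration, not the actual `pav_client.py`: the `/predict` route matches the client invocation shown at the end of this README, but the request/response field names are assumptions.

```python
import base64
import subprocess

import requests

SERVER = "http://<SERVER_IP>:8000/predict"   # your VLM server
DEVICE = "emulator-5554"                     # from `adb devices`

def run_one_step(task: str) -> dict:
    # 1. Grab a screenshot from the emulator (PNG bytes over stdout).
    png = subprocess.run(
        ["adb", "-s", DEVICE, "exec-out", "screencap", "-p"],
        capture_output=True, check=True,
    ).stdout

    # 2. Send the screenshot + task to the server.
    #    The JSON field names ("task", "image") are hypothetical.
    resp = requests.post(SERVER, json={
        "task": task,
        "image": base64.b64encode(png).decode(),
    })
    resp.raise_for_status()

    # 3. The server answers with a proposed UI action.
    return resp.json()   # e.g. {"action": "tap", "x": 540, "y": 1200}
```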
The two main entry‑point files are:
| file | role |
|---|---|
| `<<model>>_server.py` | FastAPI server that loads Qwen-2.5-VL (or any other VLM) and returns an action JSON |
| `pav_client.py` | Laptop-side script: captures screenshots, calls the server, and translates the JSON into adb commands |
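For orientation, the server side boils down to a single POST route. The sketch below only shows the shape of `<<model>>_server.py` (the real file loads Qwen-2.5-VL and runs inference; the request fields and the canned response here are assumptions):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    task: str     # natural-language instruction
    image: str    # base64-encoded screenshot (hypothetical field name)

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # In the real server, the VLM consumes (image, task) and emits
    # an action JSON. A hard-coded response stands in for it here.
    return {"action": "tap", "x": 540, "y": 1200}
```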
| Machine | Requirements |
|---|---|
| Server | • Linux with a CUDA-capable GPU (24 GB VRAM recommended) • Python ≥ 3.10 • torch + transformers • Hugging Face access token (Qwen2.5-VL is gated) |
| Laptop / local PC | • Android Studio with a running AVD • `adb` available in `$PATH` • Python ≥ 3.10 |
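Before loading the model, you can sanity-check the server's GPU from Python (a quick check, assuming torch is already installed):

```python
import torch

# Confirm a CUDA device is visible and report its VRAM;
# a Qwen-2.5-VL-class model wants roughly 24 GB.
assert torch.cuda.is_available(), "No CUDA GPU visible to torch"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
```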
```bash
# 1-A. Create & activate venv / conda env
conda create -n agent python=3.10 -y
conda activate agent

# 1-B. Install deps
pip install -r requirements.txt

# 1-C. (one-time) login to Hugging Face – needed for Qwen-VL
huggingface-cli login
# paste your HF access token (READ scope)
```
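If you prefer not to use the interactive CLI (e.g., on a headless server), the same login can be done from Python; `HF_TOKEN` is just an environment variable name chosen for this sketch:

```python
import os

from huggingface_hub import login

# Reads the token from the environment instead of an interactive prompt.
login(token=os.environ["HF_TOKEN"])
```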
```bash
# 1-D. Few-shot retrieval (run inside /PAV/server)
python shot_composer.py --app_name google_maps --json_file goolge_map_pool.json
python shot_composer.py --app_name ali --json_file aliexpress_pool.json
```
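`shot_composer.py` prepares the few-shot examples from a per-app pool file. The sketch below only illustrates the idea of filtering a pool by app; the record fields are purely hypothetical (the actual schema lives in the repo's pool JSON files):

```python
import json

def load_shots(json_file: str, app_name: str, k: int = 3) -> list[dict]:
    # Hypothetical schema: a list of {"app", "task", "action"} records.
    with open(json_file) as f:
        pool = json.load(f)
    shots = [ex for ex in pool if ex.get("app") == app_name]
    return shots[:k]   # keep the first k matching examples

# e.g. load_shots("goolge_map_pool.json", "google_maps")
```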
```bash
# 1-E. Launch the server
uvicorn <server file name>:app --host 0.0.0.0 --port 8000
```

(Replace `<server file name>` with the server module name, i.e. `<<model>>_server` without the `.py` extension.) If everything loads correctly you should see:

```
INFO:     Uvicorn running on http://0.0.0.0:8000
Model loaded successfully.
```
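FastAPI also auto-generates interactive API docs at `/docs`, which makes for a quick reachability check from the laptop before wiring up the client (replace `<SERVER_IP>` with your server's address):

```python
import requests

# A 200 from the auto-generated Swagger page means the server
# is up and reachable from this machine.
r = requests.get("http://<SERVER_IP>:8000/docs", timeout=5)
print(r.status_code)
```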
```bash
# 2-A. Create venv
conda create -n agent_cli python=3.10 -y
conda activate agent_cli
```
```bash
# 2-B. Install deps
pip install requests pillow

# 2-C. Make sure `adb` is in PATH
adb devices   # should list your AVD, e.g. emulator-5554

# Set the ADB path
export ANDROID_HOME=$HOME/Library/Android/sdk
export PATH=$PATH:$ANDROID_HOME/platform-tools
source ~/.zshrc   # or source ~/.bashrc
```

- Turn on Android Studio and start an emulator (e.g., Pixel 9, API 36).
- Set the location to the designated place (Seoul AI Hub).
- Log in with the Google account (ID: [email protected] / PW: samsung2025!).
- For the Google Maps task, the starting point should match the image below.
- For the AliExpress task, the starting point should match the image below.

You can set the starting point easily with Snapshots in Android Studio!
- Don't forget that the server (`uvicorn <server file name>:app --host 0.0.0.0 --port 8000`) must already be running.
```bash
python pav_client.py \
  --server http://<SERVER_IP>:8000/predict \
  --device_id emulator-5554 \
  --task "Please display the route to Gwanghwamun Square." \
  --image_path "qwen_3b_pav_google_screenshots/0" \
  --max_steps 10
```
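For reference, translating the returned JSON into emulator input uses plain `adb shell input` commands. A minimal dispatch sketch follows; the action names and fields are assumptions matching the earlier examples, not necessarily the client's actual schema:

```python
import subprocess

def execute(action: dict, device_id: str = "emulator-5554") -> None:
    adb = ["adb", "-s", device_id, "shell", "input"]
    kind = action["action"]
    if kind == "tap":
        cmd = adb + ["tap", str(action["x"]), str(action["y"])]
    elif kind == "swipe":
        cmd = adb + ["swipe", str(action["x1"]), str(action["y1"]),
                     str(action["x2"]), str(action["y2"])]
    elif kind == "type":
        # `input text` cannot take raw spaces; %s encodes a space.
        cmd = adb + ["text", action["text"].replace(" ", "%s")]
    else:
        raise ValueError(f"Unknown action: {kind}")
    subprocess.run(cmd, check=True)
```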