This repo shows an end‑to‑end pipeline that
- grabs a screenshot from an Android emulator running on your laptop,
- sends it (along with a natural‑language task) to a remote FastAPI server that hosts a Vision‑Language Model (VLM),
- gets back a proposed UI action (tap, swipe, type, …),
- and executes that action on the emulator via `adb`.
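In code, one iteration of that loop looks roughly like the sketch below. This is a minimal illustration, not the actual `pav_client.py`: the `/predict` route matches the client invocation shown at the end of this README, but the request/response field names are assumptions.

```python
import base64
import subprocess

import requests

SERVER = "http://<SERVER_IP>:8000/predict"   # your VLM server
DEVICE = "emulator-5554"                     # from `adb devices`

def run_one_step(task: str) -> dict:
    # 1. Grab a screenshot from the emulator (PNG bytes over stdout).
    png = subprocess.run(
        ["adb", "-s", DEVICE, "exec-out", "screencap", "-p"],
        capture_output=True, check=True,
    ).stdout

    # 2. Send the screenshot + task to the server.
    #    The JSON field names ("task", "image") are hypothetical.
    resp = requests.post(SERVER, json={
        "task": task,
        "image": base64.b64encode(png).decode(),
    })
    resp.raise_for_status()

    # 3. The server answers with a proposed UI action.
    return resp.json()   # e.g. {"action": "tap", "x": 540, "y": 1200}
```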
The two main entry‑point files are:
| file | role |
|---|---|
| `<<model>>_server.py` | FastAPI server that loads Qwen-2.5-VL (or any other VLM) and returns an action JSON |
| `pav_client.py` | Laptop-side script: captures screenshots, calls the server, and translates the JSON into adb commands |
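For orientation, the server side boils down to a single POST route. The sketch below only shows the shape of `<<model>>_server.py` (the real file loads Qwen-2.5-VL and runs inference; the request fields and the canned response here are assumptions):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    task: str     # natural-language instruction
    image: str    # base64-encoded screenshot (hypothetical field name)

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # In the real server, the VLM consumes (image, task) and emits
    # an action JSON. A hard-coded response stands in for it here.
    return {"action": "tap", "x": 540, "y": 1200}
```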
| Machine | Requirements |
|---|---|
| Server | • Linux with a CUDA-capable GPU (24 GB VRAM recommended) • Python ≥ 3.10 • torch + transformers • Hugging Face access token (Qwen2.5-VL is gated) |
| Laptop / local PC | • Android Studio with a running AVD • `adb` available in `$PATH` • Python ≥ 3.10 |
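Before loading the model, you can sanity-check the server's GPU from Python (a quick check, assuming torch is already installed):

```python
import torch

# Confirm a CUDA device is visible and report its VRAM;
# a Qwen-2.5-VL-class model wants roughly 24 GB.
assert torch.cuda.is_available(), "No CUDA GPU visible to torch"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB VRAM")
```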
```bash
# 1-A. Create & activate venv / conda env
conda create -n agent python=3.10 -y
conda activate agent

# 1-B. Install deps
pip install -r requirements.txt

# 1-C. (one-time) login to Hugging Face – needed for Qwen-VL
huggingface-cli login
# paste your HF access token (READ scope)
```
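If you prefer not to use the interactive CLI (e.g., on a headless server), the same login can be done from Python; `HF_TOKEN` is just an environment variable name chosen for this sketch:

```python
import os

from huggingface_hub import login

# Reads the token from the environment instead of an interactive prompt.
login(token=os.environ["HF_TOKEN"])
```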
```bash
# 1-D. Few-shot retrieval (run inside /PAV/server)
python shot_composer.py --app_name google_maps --json_file goolge_map_pool.json
python shot_composer.py --app_name ali --json_file aliexpress_pool.json
```
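`shot_composer.py` prepares the few-shot examples from a per-app pool file. The sketch below only illustrates the idea of filtering a pool by app; the record fields are purely hypothetical (the actual schema lives in the repo's pool JSON files):

```python
import json

def load_shots(json_file: str, app_name: str, k: int = 3) -> list[dict]:
    # Hypothetical schema: a list of {"app", "task", "action"} records.
    with open(json_file) as f:
        pool = json.load(f)
    shots = [ex for ex in pool if ex.get("app") == app_name]
    return shots[:k]   # keep the first k matching examples

# e.g. load_shots("goolge_map_pool.json", "google_maps")
```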
```bash
# 1-E. Launch the server
uvicorn <server file name>:app --host 0.0.0.0 --port 8000
```

(Replace `<server file name>` with the server module name, i.e. `<<model>>_server` without the `.py` extension.) If everything loads correctly you should see:

```
INFO:     Uvicorn running on http://0.0.0.0:8000
Model loaded successfully.
```
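FastAPI also auto-generates interactive API docs at `/docs`, which makes for a quick reachability check from the laptop before wiring up the client (replace `<SERVER_IP>` with your server's address):

```python
import requests

# A 200 from the auto-generated Swagger page means the server
# is up and reachable from this machine.
r = requests.get("http://<SERVER_IP>:8000/docs", timeout=5)
print(r.status_code)
```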
```bash
# 2-A. Create venv
conda create -n agent_cli python=3.10 -y
conda activate agent_cli
```
```bash
# 2-B. Install deps
pip install requests pillow

# 2-C. Make sure `adb` is in PATH
adb devices   # should list your AVD, e.g. emulator-5554

# Set the ADB path
export ANDROID_HOME=$HOME/Library/Android/sdk
export PATH=$PATH:$ANDROID_HOME/platform-tools
source ~/.zshrc   # or source ~/.bashrc
```

- Turn on Android Studio and start an emulator (e.g., Pixel 9, API 36).
- Set the location to the designated place (Seoul AI Hub).
- Log in with the Google account (ID: [email protected] / PW: samsung2025!).
- For the Google Maps task, the starting point should match the image below.
- For the AliExpress task, the starting point should match the image below.

You can set the starting point easily with Snapshots in Android Studio!
- Don't forget that the server (`uvicorn <server file name>:app --host 0.0.0.0 --port 8000`) must already be running.
```bash
python pav_client.py \
  --server http://<SERVER_IP>:8000/predict \
  --device_id emulator-5554 \
  --task "Please display the route to Gwanghwamun Square." \
  --image_path "qwen_3b_pav_google_screenshots/0" \
  --max_steps 10
```
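For reference, translating the returned JSON into emulator input uses plain `adb shell input` commands. A minimal dispatch sketch follows; the action names and fields are assumptions matching the earlier examples, not necessarily the client's actual schema:

```python
import subprocess

def execute(action: dict, device_id: str = "emulator-5554") -> None:
    adb = ["adb", "-s", device_id, "shell", "input"]
    kind = action["action"]
    if kind == "tap":
        cmd = adb + ["tap", str(action["x"]), str(action["y"])]
    elif kind == "swipe":
        cmd = adb + ["swipe", str(action["x1"]), str(action["y1"]),
                     str(action["x2"]), str(action["y2"])]
    elif kind == "type":
        # `input text` cannot take raw spaces; %s encodes a space.
        cmd = adb + ["text", action["text"].replace(" ", "%s")]
    else:
        raise ValueError(f"Unknown action: {kind}")
    subprocess.run(cmd, check=True)
```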