👋 Hi, everyone! We are proud to present the first fully open-source GUI Agent, releasing both the model and the infrastructure. Our solution features plug-and-play engineering with no cloud dependencies, giving you complete control over your privacy.
- 🎁 [2025-12-18] We release the Step-GUI Technical Report on arXiv!
- 🎁 [2025-12-18] We release a more powerful API for GUI automation tasks. Apply for API access here!
- 🎁 [2025-12-12] We release MCP-Server support for multi-device management and task distribution. See Installation & Quick Start and MCP-Server Setup for setup instructions.
- 🎁 [2025-12-01] We thank the following projects and authors for providing quantization tools & tutorials: GGUF_v1, GGUF_v2, EXL3, Tutorials_CN, Tutorials_EN
- 🎁 [2025-11-31] We release a lightweight 4B model, GELab-Zero-4B-preview, on Hugging Face and ModelScope.
- 🎁 [2025-11-31] We release the tasks from the AndroidDaily benchmark.
- 🎁 [2025-11-30] We release the current GELab-Zero engineering infrastructure.
- 🎁 [2025-10] Our research paper on GELab-Engine is accepted by NeurIPS 2025.
Join our WeChat group to get in touch and communicate with us:
| WeChat Group |
|---|
As AI experiences increasingly penetrate consumer-grade devices, Mobile Agent research is at a critical juncture: transitioning from "feasibility verification" to "large-scale application." While GUI-based solutions offer universal compatibility, the fragmentation of mobile ecosystems imposes heavy engineering burdens that hinder innovation. GELab-Zero is designed to dismantle these barriers.
- ⚡️ Out-of-the-Box Full-Stack Infrastructure: Resolves the fragmentation of the mobile ecosystem with a unified, one-click inference pipeline. It automatically handles multi-device ADB connections, dependencies, and permissions, allowing developers to focus on strategic innovation rather than engineering infrastructure.
- 🖥️ Consumer-Grade Local Deployment: Features a built-in 4B GUI Agent model optimized for Mac (M-series) and NVIDIA RTX 4060. It supports fully local execution, ensuring data privacy and low latency on standard consumer hardware.
- 📱 Flexible Task Distribution & Orchestration: Supports distributing tasks across multiple devices with interaction trajectory recording. It offers three versatile modes (ReAct loops, multi-agent collaboration, and scheduled tasks) to handle complex, real-world business scenarios.
- 🚀 Accelerate from Prototype to Production: Empowers developers to rapidly validate interaction strategies while allowing enterprises to directly reuse the underlying infrastructure for zero-cost MCP integration, bridging the critical gap between "feasibility verification" and "large-scale application."
Task: Help me find any good recent sci-fi movies
Task: Help me find a place where I can take my kids on the weekend
Task: Claim meal vouchers on the enterprise welfare platform
Task: Check if Metro Line 1 is operating normally, then navigate to the nearest entrance of Line 1 metro station
Task: Go to the nearest Hema Fresh Store on Ele.me and purchase: Red strawberries 300g, Peruvian Bianca blueberries 125g (18mm diameter), seasonal fresh yellow potatoes 500g, sweet baby pumpkin 750g, Hema large grain shrimp sliders, 2 bottles of Hema pure black soy milk 300ml, Little Prince macadamia nut cocoa crisp 120g, Hema spinach noodles, Hema five-spice beef, 5 bags of Haohuan snail Liuzhou river snail rice noodles (extra spicy extra smelly) 400g, m&m's milk chocolate beans 100g
Task: Search for 'how to learn financial management' on Zhihu and view the first answer with over 10k likes
Task: Find a pair of white canvas shoes in size 37 on Taobao, priced under 100 yuan, then add the first item that meets the criteria to favorites
Task: Go to Baicizhan and help me complete the vocabulary learning task
We conducted comprehensive evaluations of the GELab-Zero-4B-preview model on multiple open-source benchmarks, covering GUI understanding, localization, and interaction. The comparison with other open-source models is shown below:
The results show that GELab-Zero-4B-preview performs strongly across these benchmarks, with particularly good results in real mobile scenarios (AndroidWorld), indicating strong capabilities in practical applications.
End-to-end inference takes just a few steps:
- Set up the LLM inference environment (Ollama or vLLM)
- Set up the Android device execution environment (ADB configuration) and enable developer mode
- Set up the agent runtime environment (GELab-Zero one-click deployment script)
- Set up the trajectory visualization environment (optional)

The third-party dependencies above are all mature and widely used, so there is nothing to be afraid of.
We assume you have a Python 3.12+ environment installed and are familiar with basic command-line operations. If you have not installed Python yet, refer to Step 0.
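Before continuing, you can confirm that the active interpreter meets the 3.12+ requirement. The snippet below is a minimal sketch using only the standard library; the `python_ok` helper is our own illustration, not part of GELab-Zero.

```python
import sys

# Minimum interpreter version stated in this guide.
REQUIRED = (3, 12)


def python_ok(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True if (major, minor) of the given version meets the requirement."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_ok():
        print(f"Python {current} OK")
    else:
        print(f"Python {current} is too old; please install Python 3.12 or newer")
```

Run it with `python check_python.py` (any file name works) inside the environment you plan to use.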
Step 0: If you have not installed a Python 3.12+ environment yet, follow the steps below. Because Miniforge is commercially friendly and cross-platform, we recommend it for installing and managing Python environments. Official website: https://github.com/conda-forge/miniforge
- Windows users (you must use PowerShell):
  - Download and install Miniforge manually; see the Install section at https://github.com/conda-forge/miniforge. During installation, make sure to check the option to add Conda to the PATH environment variable so that Conda can be activated properly.
  - After installation, initialize Conda. Open PowerShell and enter the following commands:

    ```powershell
    # Initialize Conda for PowerShell
    conda init powershell
    # Allow Conda scripts to run on PowerShell startup
    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
    ```

    After reopening PowerShell, successful activation is indicated by "(base)" at the beginning of the latest line in the terminal.
  - We recommend VS Code for running and debugging code. Download and install it from the official website: https://code.visualstudio.com/
- Mac and Linux users:
  - Download and install Miniforge from the command line:

    ```bash
    curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh
    ```

After installation, create and activate a new Python environment:

```bash
conda create -n gelab-zero python=3.12 -y
conda activate gelab-zero
```

We have verified two mainstream local LLM inference deployment methods: Ollama and vLLM. Individual users are recommended to use Ollama, while enterprise users and those with more technical background can choose vLLM for a more stable inference service.
For individual users running local inference, we strongly recommend deploying with Ollama, as it is simple to install and easy to use.
- Windows and Mac users: download and install the graphical version directly from the official website: https://ollama.com/.
- Linux users: refer to the official documentation at https://ollama.com/download/linux. The one-click installation command is:

  ```bash
  # Download and run the official Ollama install script for Linux
  curl -fsSL https://ollama.com/install.sh | sh
  ```

After installing Ollama, download the gelab-zero-4b-preview model and import it into Ollama with the following commands:
```bash
# If the Hugging Face CLI is not installed yet, run this first (the hf command needs a recent huggingface_hub)
pip install -U huggingface_hub
# If downloads are slow in mainland China, you can try the mirror "https://hf-mirror.com"
# Windows (PowerShell) users:
# $env:HF_ENDPOINT = "https://hf-mirror.com"
# Linux and Mac users:
# export HF_ENDPOINT="https://hf-mirror.com"
# Download the gelab-zero-4b-preview model weights from Hugging Face
hf download --no-force-download stepfun-ai/GELab-Zero-4B-preview --local-dir gelab-zero-4b-preview
# Import the model into Ollama
cd gelab-zero-4b-preview
ollama create gelab-zero-4b-preview -f Modelfile
# If Windows users encounter an error, specify the full path to the Ollama executable, for example:
# C:\Users\admin\AppData\Local\Programs\Ollama\ollama.exe create gelab-zero-4b-preview -f Modelfile
# On lower-end machines, you may quantize the model to improve inference speed. Note that quantization may reduce model performance.
# For details, see: https://docs.ollama.com/import#quantizing-a-model
# Quantize the model to int8 (small precision loss, model size becomes 4.4 GB):
ollama create -q q8_0 gelab-zero-4b-preview
# Quantize the model to int4 (larger precision loss, model size becomes 2.2 GB):
ollama create -q Q4_K_M gelab-zero-4b-preview
# Revert to the original precision:
ollama create -q f16 gelab-zero-4b-preview
```
- Windows users: open the Ollama app, select the model gelab-zero-4b-preview, and send a message to check that the model replies correctly.
- Mac and Linux users: test whether the model is installed successfully with the following command:

  ```bash
  curl -X POST http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gelab-zero-4b-preview",
      "messages": [{"role": "user", "content": "Hello, GELab-Zero!"}]
    }'
  ```

  The output should include the model's reply, indicating that the model has been installed successfully and is running. For example:

  ```json
  {"id":"chatcmpl-174","object":"chat.completion","created":1764405566,"model":"gelab-zero-4b-preview","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I'm here to help with any questions or information you might need. How can I assist you today?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":16,"completion_tokens":24,"total_tokens":40}}
  ```

If you get a reply like this, your Ollama environment and the gelab-zero-4b-preview model are installed successfully, and you can proceed to configuring the mobile execution environment.
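The curl check can also be scripted. The sketch below assumes Ollama is serving its OpenAI-compatible API at the default http://localhost:11434/v1 (the endpoint used in the curl command) and uses only the Python standard library. The helper names (`build_chat_request`, `extract_reply`, `check_model`) are hypothetical and not part of GELab-Zero.

```python
import json
import urllib.request

# Default Ollama endpoint used by the curl test (OpenAI-compatible API).
DEFAULT_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(base_url, model, prompt, api_key=None):
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # OpenAI-compatible servers that require a key typically expect a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant's text from a chat.completion JSON response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def check_model(base_url=DEFAULT_BASE_URL, model="gelab-zero-4b-preview",
                api_key=None, timeout=120):
    """Send one test message and return the reply (raises on HTTP or network errors)."""
    request = build_chat_request(base_url, model, "Hello, GELab-Zero!", api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read().decode("utf-8"))
```

With the Ollama server running, `print(check_model())` should print a greeting similar to the one in the sample response. The base URL and the optional `api_key` are parameters, so the same helper can be pointed at another OpenAI-compatible server.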
To enable GELab-Zero to control the phone for task execution, you need to complete the following steps to configure the mobile execution environment:
- Enable developer mode and USB debugging on the phone.
- Install the ADB tool and ensure that the computer can connect to the phone via ADB. (If you have already installed the adb tool, you can skip this step)
- Connect the phone to the computer via a USB cable and use the adb devices command to confirm a successful connection.
Generally, you can enable developer mode and USB debugging on Android phones by following these steps:
- Go to the "Settings" app on your phone.
- Find the "About Phone" or "System" option, and tap on the "Build Number" 10+ times until you see a message saying "You are now a developer."
- Go back to the main "Settings" menu and find "Developer Options." (Important: must be enabled)
- In "Developer Options," find and enable the "USB Debugging" feature, following the on-screen instructions. (Important: must be enabled)
The steps may vary slightly between phone brands, so adjust them to your device. Searching for "[your phone brand] how to enable developer mode" usually turns up relevant tutorials. After completing the setup, it should look like the image below:
ADB (Android Debug Bridge) is a bridge tool for communication between Android devices and computers. You can install the ADB tool by following these steps:
- Windows Users:
- Download the ADB tool package: https://dl.google.com/android/repository/platform-tools-latest-windows.zip and extract it to a suitable location.
- Add the extracted folder path to the system environment variables so that you can use the adb command directly from the command line. For detailed steps, see https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-2010/ee537574(v=office.14). The specific steps are:
1. Right-click "Computer" in the "Start" menu and select "Properties."
2. Click "Advanced system settings."
3. In the "System Properties" dialog box, click the "Environment Variables" button.
4. In the "System variables" section, find and select the "Path" variable, then click the "Edit" button.
5. In the "Edit Environment Variables" dialog box, click "New," and then enter the extracted path of the ADB tool package.
6. Click "OK" to save the changes and close all dialog boxes.
- Mac and Linux users:
  - Install the ADB tool with Homebrew (Mac) or your distribution's package manager (Linux). If Homebrew is not installed, install it first:

    ```bash
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    ```

  - Then install the ADB tool:

    ```bash
    brew install --cask android-platform-tools
    # On Debian/Ubuntu, the package-manager equivalent is: sudo apt install adb
    ```

After connecting your phone to the computer with a USB cable, open a terminal or command prompt and run:

```bash
adb devices
```

If the connection is successful, you will see output similar to the following, listing the connected devices:

```
List of devices attached
AN2CVB4C28000731 device
```

If you do not see any devices, check the USB cable and make sure USB debugging is enabled on your phone. When connecting the phone for the first time, an authorization prompt may pop up on the phone; simply select "Allow," as shown in the image below:
If the installation is unsuccessful, you can refer to third-party documentation: quickappcn/issues#120 for further troubleshooting.
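To check the connection from a script (for example, before launching a task), the sketch below runs `adb devices` and parses the output format shown above. It is an illustrative helper of our own, not part of GELab-Zero; it assumes `adb` is on your PATH and treats only devices in the `device` state as ready. A device listed as `unauthorized` usually means the authorization prompt on the phone has not been accepted yet.

```python
import shutil
import subprocess


def parse_adb_devices(output):
    """Parse `adb devices` output into a list of (serial, state) tuples."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status messages ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def list_ready_devices():
    """Run `adb devices` and return serials of devices in the ready ("device") state."""
    adb = shutil.which("adb")
    if adb is None:
        raise RuntimeError("adb not found on PATH")
    result = subprocess.run([adb, "devices"], capture_output=True, text=True, check=True)
    return [serial for serial, state in parse_adb_devices(result.stdout) if state == "device"]
```

For example, `print(list_ready_devices())` would print `['AN2CVB4C28000731']` for the sample output above.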
After completing the above steps, you can deploy the GELab-Zero runtime environment with the following command:
```bash
# Clone the repository
git clone https://github.com/stepfun-ai/gelab-zero
cd gelab-zero
# Install dependencies
pip install -r requirements.txt
# Run inference on a single task
python examples/run_single_task.py
```

By default, the trajectory is saved in the running_log/server_log/os-copilot-local-eval-logs/ directory. You can visualize it with Streamlit:
```bash
# To allow other devices on the local area network (LAN) to access it, use --server.address 0.0.0.0
streamlit run --server.address 0.0.0.0 visualization/main_page.py --server.port 33503
# To access it only from the local machine, use:
streamlit run --server.address 127.0.0.1 visualization/main_page.py --server.port 33503
```

Then open http://localhost:33503 in your browser to access the visualization interface.
Each task execution will generate a unique session ID, which can be used to query and visualize the corresponding trajectory in the visualization interface.
Actions with coordinates, such as clicks and slides, are marked on the screenshots to make the agent's behavior easier to follow.
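To locate your most recent runs from a script, the hypothetical helper below simply lists entries in the default log directory mentioned above, newest first. It makes no assumption about how sessions are named or laid out inside that directory, and it is not part of GELab-Zero; run it from the repository root.

```python
from pathlib import Path

# Default trajectory directory from this guide, relative to the repository root.
LOG_DIR = Path("running_log/server_log/os-copilot-local-eval-logs")


def latest_entries(log_dir=LOG_DIR, limit=5):
    """Return up to `limit` entries of log_dir, newest (by modification time) first."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    entries = sorted(log_dir.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
    return entries[:limit]


if __name__ == "__main__":
    for entry in latest_entries():
        print(entry.name)
```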
Make sure you have already downloaded the GELab-Zero-4B-preview model locally.
Clone the official llama.cpp repository:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
# If there are dependency conflicts, create a separate Conda virtual environment first.
```

Convert the model to GGUF format. Command-line arguments:
- The first path points to your local copy of GELab-Zero-4B-preview downloaded from Hugging Face.
- --outtype specifies the quantization precision; --outfile is the output filename, and you can customize the path.
```bash
# No quantization, keep full model quality
python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype f16 --verbose --outfile gelab-zero-4b-preview_f16.gguf
# Quantized (faster but lossy; known issue: <THINK> may become <THIN>)
python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype q8_0 --verbose --outfile gelab-zero-4b-preview_q8_0.gguf
```

For reference, the INT8-quantized GGUF file is about 4.28 GB.
GELab-Zero-4B-preview is a vision model, so you also need to export an mmproj file:
```bash
# INT8 quantization for mmproj
python convert_hf_to_gguf.py /PATH/TO/gelab-zero-4b-preview --outtype q8_0 --verbose --outfile gelab-zero-4b-preview_q8_0_mmproj.gguf --mmproj
```

For reference, the INT8-quantized mmproj GGUF file is about 454 MB.
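Before importing the files into a client, you can sanity-check that each conversion produced a valid GGUF file. The sketch below reads only the fixed-size header, assuming the GGUF v2/v3 layout from the GGUF specification: the 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata key/value counts. It is not part of llama.cpp or GELab-Zero; the file names are the ones used in the commands above.

```python
import struct
from pathlib import Path


def read_gguf_header(path):
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 bytes
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}


if __name__ == "__main__":
    names = ["gelab-zero-4b-preview_q8_0.gguf", "gelab-zero-4b-preview_q8_0_mmproj.gguf"]
    for name in names:
        path = Path(name)
        if path.exists():
            size_gb = path.stat().st_size / 1e9  # decimal gigabytes
            print(f"{name}: {size_gb:.2f} GB, {read_gguf_header(path)}")
```

A missing magic or a truncated header usually means the conversion was interrupted.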
You can use any llama.cpp-compatible client to spin up a local API service; here we use Jan as an example:
Download the Jan client and install it.
Go to Settings → Model Provider → choose llama.cpp, then import the models:
Select the two GGUF files you just converted:
Back in the model UI, click Start.
Create a chat to verify the model runs correctly:
Once tokens are streaming normally, start the local API server.
Go to Settings → Local API Server, create an API key under server configuration, then launch the service:
The llama.cpp service differs slightly from Ollama, so you need to adjust the GELab-Zero agent's model configuration in two places:
- In model_config.yaml, update the port and API key (use the key you just created):

  ```yaml
  local:
    api_base: "http://localhost:1337/v1"
    api_key: "YOUR_KEY"
  ```

- In examples/run_single_task.py, remove any parameter suffix from the model name (line 21):

  ```python
  local_model_config = {
      "task_type": "parser_0922_summary",
      "model_config": {
          "model_name": "gelab-zero",  # model name without the parameter suffix
          "model_provider": "local",
          "args": {
              "temperature": 0.1,
              "top_p": 0.95,
              "frequency_penalty": 0.0,
              "max_tokens": 4096,
          },
          # ... any remaining fields in the file stay unchanged
      },
  }
  ```

To enable the MCP server, run:

```bash
# Enable the MCP server
python mcp_server/detailed_gelab_mcp_server.py
```

If you find GELab-Zero useful for your research, please consider citing our work :)
```bibtex
@misc{yan2025stepguitechnicalreport,
  title={Step-GUI Technical Report},
  author={Haolong Yan and Jia Wang and Xin Huang and Yeqing Shen and Ziyang Meng and Zhimin Fan and Kaijun Tan and Jin Gao and Lieyu Shi and Mi Yang and Shiliang Yang and Zhirui Wang and Brian Li and Kang An and Chenyang Li and Lei Lei and Mengmeng Duan and Danxun Liang and Guodong Liu and Hang Cheng and Hao Wu and Jie Dong and Junhao Huang and Mei Chen and Renjie Yu and Shunshan Li and Xu Zhou and Yiting Dai and Yineng Deng and Yingdan Liang and Zelin Chen and Wen Sun and Chengxu Yan and Chunqin Xu and Dong Li and Fengqiong Xiao and Guanghao Fan and Guopeng Li and Guozhen Peng and Hongbing Li and Hang Li and Hongming Chen and Jingjing Xie and Jianyong Li and Jingyang Zhang and Jiaju Ren and Jiayu Yuan and Jianpeng Yin and Kai Cao and Liang Zhao and Liguo Tan and Liying Shi and Mengqiang Ren and Min Xu and Manjiao Liu and Mao Luo and Mingxin Wan and Na Wang and Nan Wu and Ning Wang and Peiyao Ma and Qingzhou Zhang and Qiao Wang and Qinlin Zeng and Qiong Gao and Qiongyao Li and Shangwu Zhong and Shuli Gao and Shaofan Liu and Shisi Gao and Shuang Luo and Xingbin Liu and Xiaojia Liu and Xiaojie Hou and Xin Liu and Xuanti Feng and Xuedan Cai and Xuan Wen and Xianwei Zhu and Xin Liang and Xin Liu and Xin Zhou and Yingxiu Zhao and Yukang Shi and Yunfang Xu and Yuqing Zeng and Yixun Zhang and Zejia Weng and Zhonghao Yan and Zhiguo Huang and Zhuoyu Wang and Zheng Ge and Jing Li and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Daxin Jiang},
  year={2025},
  eprint={2512.15431},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.15431},
}

@software{gelab_zero_2025,
  title={GELab-Zero: An Advanced Mobile Agent Inference System},
  author={GELab Team},
  year={2025},
  url={https://github.com/stepfun-ai/gelab-zero}
}

@misc{gelab_engine,
  title={GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning},
  author={Haolong Yan and Yeqing Shen and Xin Huang and Jia Wang and Kaijun Tan and Zhixuan Liang and Hongxin Li and Zheng Ge and Osamu Yoshie and Si Li and Xiangyu Zhang and Daxin Jiang},
  year={2025},
  eprint={2512.02423},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.02423},
}
```