OS-Genesis

This repository contains the code and data for the ACL 2025 paper OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis.

Overview

We introduce OS-Genesis, an interaction-driven pipeline for synthesizing high-quality and diverse GUI agent trajectory data without human supervision or predefined tasks. By leveraging reverse task synthesis and a trajectory reward model, OS-Genesis enables effective end-to-end training of GUI agents.

Build Trajectories

We provide scripts and instructions to help you build trajectories in collection.

Training

For training details and procedures, please refer to the InternVL2 documentation and Qwen2-VL.

Evaluation

AndroidControl

To evaluate a model on the AndroidControl benchmark, follow the steps below:

  1. Clone the GitHub Repository:

    git clone https://github.com/OS-Copilot/OS-Genesis.git
    
  2. Inference:

    cd OS-Genesis/evaluation/android_control
    bash run_ac_inference.sh $dataset $checkpoint
    
  3. Evaluation:

    python ac_eval.py
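
If you prefer to drive steps 2 and 3 from one script, the following is a minimal sketch of how that might look. It assumes the repository has already been cloned into `OS-Genesis` and that `ac_eval.py` is run from the same `evaluation/android_control` directory as the inference script; the function names `build_commands` and `run_evaluation` and the `dry_run` flag are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(dataset: str, checkpoint: str) -> List[List[str]]:
    """Return the inference and evaluation commands as argument lists."""
    return [
        ["bash", "run_ac_inference.sh", dataset, checkpoint],  # step 2: inference
        ["python", "ac_eval.py"],                              # step 3: evaluation
    ]


def run_evaluation(dataset: str, checkpoint: str,
                   repo_root: str = "OS-Genesis", dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it in the eval directory."""
    # Assumption: both scripts are launched from evaluation/android_control.
    workdir = Path(repo_root) / "evaluation" / "android_control"
    for cmd in build_commands(dataset, checkpoint):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step exits with an error.
            subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_evaluation("path/to/dataset", "path/to/checkpoint")` only prints the commands; pass `dry_run=False` to execute them. Both paths are placeholders for your own dataset and checkpoint.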
    

Mobile

AndroidControl

| Model Name | Base Model | Training Data | HF Link |
|---|---|---|---|
| OS-Genesis-4B-AC | InternVL2-4B | OS-Genesis-ac-training-data | 🤗 link |
| OS-Genesis-7B-AC | Qwen2-VL-7B-Instruct | OS-Genesis-ac-training-data | 🤗 link |
| OS-Genesis-8B-AC | InternVL2-8B | OS-Genesis-ac-training-data | 🤗 link |

AndroidWorld

| Model Name | Base Model | Training Data | HF Link |
|---|---|---|---|
| OS-Genesis-4B-AW | InternVL2-4B | OS-Genesis-aw-training-data | 🤗 link |
| OS-Genesis-7B-AW | Qwen2-VL-7B-Instruct | OS-Genesis-aw-training-data | 🤗 link |
| OS-Genesis-8B-AW | InternVL2-8B | OS-Genesis-aw-training-data | 🤗 link |

Web

| Model Name | Base Model | Training Data | HF Link |
|---|---|---|---|
| OS-Genesis-4B-WA | InternVL2-4B | OS-Genesis-web-training-data | 🤗 link |
| OS-Genesis-7B-WA | Qwen2-VL-7B-Instruct | OS-Genesis-web-training-data | 🤗 link |
| OS-Genesis-8B-WA | InternVL2-8B | OS-Genesis-web-training-data | 🤗 link |

More Resources 📚

Raw collected triples

In addition to our complete trajectory data on HuggingFace, we also provide collected raw <s_pre, a, s_post> triples. You can use them to reproduce the process of reverse task synthesis directly, without re-collecting them from emulators yourself 😄. The screenshots and corresponding texts (with SoM info contained) are provided below:

| Data Type | Screenshots | Data JSON |
|---|---|---|
| Mobile | Screenshots | Data JSON |
| Web | Screenshots | Data JSON |
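
To make the <s_pre, a, s_post> structure concrete, here is a small sketch of how one might represent a triple in Python and turn it into a text prompt for reverse task synthesis. The JSON keys (`s_pre`, `action`, `s_post`), the `Triple` class, and the prompt wording are all assumptions made for illustration; check the released JSON files for the actual field names, which may differ and may carry additional information such as SoM annotations.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    """One interaction step: state before, action, state after."""
    s_pre: str   # e.g. path to the screenshot taken before the action
    action: str  # the action performed on the screen
    s_post: str  # e.g. path to the screenshot taken after the action


def load_triples(json_text: str) -> List[Triple]:
    """Parse a JSON array of records into Triple objects.

    The keys used here are hypothetical, not the released schema.
    """
    records = json.loads(json_text)
    return [Triple(r["s_pre"], r["action"], r["s_post"]) for r in records]


def synthesis_prompt(triple: Triple) -> str:
    """Build an illustrative prompt asking which task the observed step accomplishes."""
    return (
        f"Before: {triple.s_pre}\n"
        f"Action: {triple.action}\n"
        f"After: {triple.s_post}\n"
        "What task could this interaction be a step toward?"
    )
```

For example, `load_triples('[{"s_pre": "a.png", "action": "click [3]", "s_post": "b.png"}]')` returns a one-element list, and passing that element to `synthesis_prompt` yields a four-line prompt that could be sent to a model of your choice.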

Feel free to email me if you require additional data of this kind.

FAQ ❓

We have collected some questions from emails, Hugging Face, and WeChat communications. Please check the FAQ 🤖

Explore More Works 🔍

  1. OS-Atlas 🤖
  2. ScienceBoard 🧪
  3. GUIMid 📊

Citation 📖

🫢 If you are interested in our work or find this repository / our data helpful, please consider using the following citation format when referencing our paper:

@article{sun2024genesis,
  title={OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis},
  author={Sun, Qiushi and Cheng, Kanzhi and Ding, Zichen and Jin, Chuanyang and Wang, Yian and Xu, Fangzhi and Wu, Zhenyu and Jia, Chengyou and Chen, Liheng and Liu, Zhoumianze and others},
  journal={arXiv preprint arXiv:2412.19723},
  year={2024}
}
