- [15 May 2025] UI-Vision grounding dataset (Element and Layout Grounding) and evaluation code released
- [1 May 2025] UI-Vision was accepted to ICML 2025 🔥
- [19 March 2025] Project website is live at uivision.github.io
- [19 March 2025] UI-Vision paper is available on arXiv 🔥 🔥
UI-Vision is a comprehensive, license-permissive benchmark for offline, fine-grained evaluation of computer use agents in real-world desktop environments across 83 software applications spanning 6 categories. The benchmark includes three tasks:
- Element Grounding
- Layout Grounding
- Action Prediction
The benchmark aims to advance the development of more capable agents for real-world desktop tasks.
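For the grounding tasks, a common way to score a model is to check whether its predicted click point falls inside the ground-truth element's bounding box. The snippet below is an illustrative sketch of that metric, not the official UI-Vision evaluation code (see `eval/grounding/` for that); the point-in-box criterion and the data layout are assumptions.

```python
# Illustrative element-grounding metric (not the official UI-Vision eval code).
# Assumption: each prediction is an (x, y) click point and each ground truth
# is an (x1, y1, x2, y2) bounding box in the same pixel coordinate space.

def point_in_bbox(point, bbox):
    """Return True if the (x, y) point lies inside the (x1, y1, x2, y2) box."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2

def grounding_accuracy(predictions, ground_truths):
    """Fraction of predicted points that land inside their target box."""
    if not predictions:
        return 0.0
    hits = sum(point_in_bbox(p, b) for p, b in zip(predictions, ground_truths))
    return hits / len(predictions)

# Example: the first prediction hits its box, the second misses.
preds = [(50, 30), (200, 400)]
boxes = [(40, 20, 60, 40), (0, 0, 100, 100)]
print(grounding_accuracy(preds, boxes))  # → 0.5
```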
├── eval/
│   ├── grounding/          # Scripts for element and layout grounding evaluation
│   └── action_prediction/  # Scripts for action prediction evaluation
├── .gitignore              # Git ignore file
└── README.md               # Project documentation
If you find UI-Vision useful in your research, please consider citing our paper:
@misc{nayak2025uivisiondesktopcentricguibenchmark,
      title={UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction},
      author={Shravan Nayak and Xiangru Jian and Kevin Qinghong Lin and Juan A. Rodriguez and
              Montek Kalsi and Rabiul Awal and Nicolas Chapados and M. Tamer Özsu and
              Aishwarya Agrawal and David Vazquez and Christopher Pal and Perouz Taslakian and
              Spandana Gella and Sai Rajeswar},
      year={2025},
      eprint={2503.15661},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.15661},
}