PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Junxian Li, Kai Liu, Leyang Chen, Weida Wang, Zhixin Wang, Jiaqi Xu, Fan Li, Renjing Pei, Linghe Kong, and Yulun Zhang

"PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks", arXiv 2026

[project] [supplementary material] [dataset]

🔥🔥🔥 News

2026-02-03: This repo is released.

Abstract: Unified multimodal models (UMMs) have shown impressive capabilities in generating natural images and supporting multimodal reasoning. However, their potential for supporting computer-use planning tasks, which are closely related to our daily lives, remains underexplored. Image generation and editing in computer-use tasks require capabilities such as spatial reasoning and procedural understanding, and it is still unknown whether UMMs have the capabilities needed to complete these tasks. Therefore, we propose PlanViz, a new benchmark designed to evaluate image generation and editing for computer-use tasks. To this end, we focus on sub-tasks that are frequently involved in daily life and require planning steps. Specifically, three new sub-tasks are designed: route planning, work diagramming, and web & UI displaying. To ensure data quality, we curate human-annotated questions and reference images and apply a quality control process. For comprehensive and accurate evaluation, we propose a task-adaptive score, PlanScore. The score helps assess the correctness, visual quality, and efficiency of generated images. Through experiments, we highlight key limitations and opportunities for future research on this topic.
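To make the idea of a task-adaptive score concrete, here is a minimal illustrative sketch of combining the three dimensions named in the abstract (correctness, visual quality, efficiency) with per-task weights. This is not the PlanScore definition from the paper: the weight values, the `[0, 1]` score range, the task keys, and the names `TASK_WEIGHTS`, `DimensionScores`, and `composite_score` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-task weights (assumption, NOT taken from the paper).
# Each task weights the three dimensions differently, which is one simple
# way a score could be made "task-adaptive".
TASK_WEIGHTS = {
    "route_planning": {"correctness": 0.5, "visual_quality": 0.2, "efficiency": 0.3},
    "work_diagramming": {"correctness": 0.5, "visual_quality": 0.3, "efficiency": 0.2},
    "web_ui_displaying": {"correctness": 0.4, "visual_quality": 0.4, "efficiency": 0.2},
}


@dataclass(frozen=True)
class DimensionScores:
    """Per-image ratings, each assumed to lie in [0, 1]."""
    correctness: float
    visual_quality: float
    efficiency: float


def composite_score(task: str, scores: DimensionScores) -> float:
    """Return the weighted mean of the three dimensions for the given task."""
    if task not in TASK_WEIGHTS:
        raise KeyError(f"unknown task: {task!r}")
    weights = TASK_WEIGHTS[task]
    for name in weights:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total_weight = sum(weights.values())
    weighted_sum = sum(w * getattr(scores, name) for name, w in weights.items())
    # Normalize so the result stays in [0, 1] even if weights do not sum to 1.
    return weighted_sum / total_weight


if __name__ == "__main__":
    example = DimensionScores(correctness=1.0, visual_quality=0.5, efficiency=0.0)
    print(composite_score("route_planning", example))
```

A weighted mean is used here only because it is easy to inspect. The actual scoring procedure is the one described in the main paper and implemented in the repository's scoring code.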

Pipeline

🔖 TODO

Release test data.
Provide HuggingFace demo.

🔗 Contents

📦 Datasets

We show the data distribution and hot topics of our benchmark.

📄 Testing

TBD

🔎 Results

We present the performance of various models on PlanViz.

Quantitative Results

Results in Tab. 3 of the main paper

Results in Tab. 4 of the main paper

Qualitative Results

Results in Fig. 6 of the main paper

📎 Citation

If you find our dataset and code helpful in your research or work, please cite the following paper.

@article{li2026planviz,
  title={PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks},
  author={Li, Junxian and Liu, Kai and Chen, Leyang and Wang, Weida and Wang, Zhixin and Xu, Jiaqi and Li, Fan and Pei, Renjing and Kong, Linghe and Zhang, Yulun},
  journal={arXiv preprint arXiv:2602.06663},
  year={2026}
}

💡 Acknowledgements

This project builds on numerous model repositories. Thanks to UmniBench for research ideas.
