
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes

ZhiYuan Feng¹*, Zhaolu Kang²*, Qijie Wang¹*, Zhiying Du³*, Jiongrui Yan⁴, Shi Shubin⁴, Chengbo Yuan¹, Huizhi Liang¹, Yu Deng⁵, Qixiu Li¹, Rushuai Yang⁶, Ruichuan An², Leqi Zheng¹, Weijie Wang⁷, Shawn Chen⁷, Sicheng Xu⁵, Yaobo Liang⁵, Jiaolong Yang⁵†, Baining Guo⁵


¹Tsinghua University, ²Peking University, ³Fudan University, ⁴Jilin University, ⁵Microsoft Research Asia, ⁶Hong Kong University of Science and Technology, ⁷Zhejiang University

(*Equal Contribution, †Corresponding Author)


🎉 News

  • [2025.10] 📢📢 Paper and initial project release.

📝 To-Do List

  • Release the evaluation code
  • Release the benchmark dataset on HuggingFace

MV-RoboBench

[Figure: Data pipeline]

Benchmark Overview: We introduce MV-RoboBench, a benchmark designed to evaluate the multi-view spatial reasoning capabilities of VLMs in robotic scenes. It contains [Number] question-answer pairs spanning [Number] diverse robotic scenes and [Number] challenging tasks, such as [Task 1 Name], [Task 2 Name], and [Task 3 Name]. These tasks probe different aspects of 3D scene understanding, from establishing cross-view object correspondences to reasoning about relative spatial poses.
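Once the dataset is published on HuggingFace (see the to-do list above), loading it should follow the standard `datasets` workflow. Below is a minimal sketch only: the dataset path `microsoft/MV-RoboBench` and the field names (`question`, `options`, `answer`) are assumptions, not the official schema.

```python
# A minimal sketch, NOT the official loader: the dataset path and the
# field names below are assumptions until the dataset is released.
from datasets import load_dataset

ds = load_dataset("microsoft/MV-RoboBench", split="test")  # hypothetical path

sample = ds[0]
print(sample["question"])  # natural-language question about the scene (assumed field)
print(sample["options"])   # multiple-choice candidates (assumed field)
print(sample["answer"])    # ground-truth option letter (assumed field)
```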

[Figure: Benchmark examples]

📌 A Benchmark for Robotic Scenes: We introduce MV-RoboBench, a comprehensive benchmark designed to evaluate the spatial reasoning of Vision-Language Models in robotic scenes.

📊 Comprehensive Evaluation: We evaluate [Number] state-of-the-art VLMs, including GPT-4o and Claude 3, and reveal a significant gap between their performance and human-level reasoning (see the evaluation sketch after this list).

🔍 Revealing Core Challenges: Our analysis pinpoints key failure modes for current models in robotic scene understanding, particularly in cross-view correspondence, relative pose estimation, and action planning.
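Since the official evaluation code has not been released yet, the following is only a sketch of how a multiple-choice evaluation loop might look. It assumes each sample carries multi-view `images`, a `question`, lettered `options`, a ground-truth `answer`, and a `task` label, and that `query_vlm` is a hypothetical wrapper around the model under test; none of these names come from the repository.

```python
from collections import defaultdict

def evaluate(samples, query_vlm):
    """Compute per-task accuracy on multiple-choice QA pairs.

    `samples` and `query_vlm` follow an assumed schema; this is a sketch,
    not the official MV-RoboBench evaluation code.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for s in samples:
        # Format the question and lettered options into a single prompt.
        prompt = s["question"] + "\n" + "\n".join(
            f"{letter}. {opt}" for letter, opt in zip("ABCD", s["options"])
        )
        # query_vlm is a hypothetical callable; we take the first character
        # of its reply as the chosen option letter.
        pred = query_vlm(images=s["images"], prompt=prompt).strip()[:1].upper()
        total[s["task"]] += 1
        correct[s["task"]] += pred == s["answer"]
    return {task: correct[task] / total[task] for task in total}
```

Per-task accuracies make the failure-mode analysis above concrete: cross-view correspondence, relative pose estimation, and action planning can each be reported separately.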

Contact

For any questions or suggestions, please feel free to contact Zhiyuan Feng or any of the other authors.
