LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Abstract
State-of-the-art Vision-Language-Action models post high benchmark scores but are brittle to various perturbations, particularly to camera viewpoints and robot initial states, and often ignore language instructions.
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models, revealing consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors such as camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that they tend to ignore language instructions entirely. Our findings challenge the assumption that high benchmark scores equate to genuine competence and highlight the need for evaluation practices that assess reliability under realistic variation.
Community
Introducing LIBERO-Plus: A Comprehensive Benchmark for Vision-Language-Action Models
We are excited to unveil LIBERO-Plus, an advanced robustness evaluation tool for Vision-Language-Action (VLA) models. LIBERO-Plus allows researchers to understand how these models perform under various environmental perturbations, shedding light on their vulnerabilities in real-world settings.
Novel Findings: Uncovering Hidden Vulnerabilities
Models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations.
Models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely.
Models exhibit a reliance on superficial visual cues, such as positional bias, rather than a genuine semantic understanding of task-relevant objects.
Compositional generalization is intrinsically non-decomposable.
Training data diversity significantly improves robustness.
...
For more detailed information, please check out our paper.
Easy to Use: Seamless Transition to LIBERO-Plus
LIBERO-Plus makes it easy to evaluate the robustness of existing models. With a few simple steps, you can switch from LIBERO to LIBERO-Plus and gain access to automatic, fine-grained evaluation.
Comprehensive, Automatic, and Fine-Grained Benchmark
LIBERO-Plus provides a benchmarking framework with 7 perturbation dimensions and 21 sub-dimensions, together with a fine-grained difficulty scale from L1 to L5 for systematically assessing model performance across challenges. Benchmark construction, covering both training and testing datasets, is automated, making comprehensive assessments easier than ever.
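The evaluation structure described above can be sketched as a simple sweep over perturbation dimensions and difficulty levels. The dimension names below follow the paper; the evaluation hook (`run_episode`) and function names are illustrative stand-ins, not the actual LIBERO-Plus API.

```python
# Hypothetical sketch of a LIBERO-Plus-style robustness sweep.
# `run_episode(dimension, level)` stands in for whatever hook your
# simulator exposes; it should return True when the task succeeds.

PERTURBATION_DIMENSIONS = [
    "object_layout", "camera_viewpoint", "robot_initial_state",
    "language_instruction", "lighting", "background_texture", "sensor_noise",
]
DIFFICULTY_LEVELS = ["L1", "L2", "L3", "L4", "L5"]

def robustness_report(run_episode, n_trials=50):
    """Aggregate per-dimension, per-level success rates."""
    report = {}
    for dim in PERTURBATION_DIMENSIONS:
        report[dim] = {}
        for level in DIFFICULTY_LEVELS:
            successes = sum(run_episode(dim, level) for _ in range(n_trials))
            report[dim][level] = successes / n_trials
    return report
```

A report structured this way makes the paper's headline comparison direct: a model's success rate on `camera_viewpoint` or `robot_initial_state` perturbations can be read off per difficulty level and contrasted with its unperturbed score.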
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization (2025)
- On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations (2025)
- RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation (2025)
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions (2025)
- Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model (2025)
- Contrastive Representation Regularization for Vision-Language-Action Models (2025)
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators (2025)