LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Abstract
State-of-the-art Vision-Language-Action models post high benchmark scores but are brittle to various perturbations, particularly to camera viewpoints and robot initial states, and often ignore language instructions.
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models, revealing consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors such as camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that they tend to ignore language instructions entirely. Our findings challenge the assumption that high benchmark scores equate to genuine competence and highlight the need for evaluation practices that assess reliability under realistic variation.
Community
Introducing LIBERO-Plus: A Comprehensive Benchmark for Vision-Language-Action Models
We are excited to unveil LIBERO-Plus, an advanced robustness evaluation tool for Vision-Language-Action (VLA) models. LIBERO-Plus allows researchers to understand how these models perform under various environmental perturbations, shedding light on their vulnerabilities in real-world settings.
Novel Findings: Uncovering Hidden Vulnerabilities
Models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations.
Models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely.
Models exhibit a reliance on superficial visual cues, such as positional bias, rather than a genuine semantic understanding of task-relevant objects.
Compositional generalization is intrinsically non-decomposable.
Training data diversity significantly improves robustness.
...
For more detailed information, please check out our paper.
Easy to Use: Seamless Transition to LIBERO-Plus
LIBERO-Plus makes it easy to evaluate the robustness of existing models. With a few simple steps, you can switch from LIBERO to LIBERO-Plus and gain access to automatic, fine-grained evaluation.
Comprehensive, Automatic, and Fine-Grained Benchmark
LIBERO-Plus provides a benchmarking framework with 7 perturbation dimensions and 21 sub-dimensions, together with a fine-grained difficulty scale from L1 to L5 for systematically assessing model performance across challenges. Benchmark construction, covering both training and testing datasets, is automated, making comprehensive assessments easier than ever.
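The evaluation structure described above can be sketched as a simple sweep over perturbation dimensions and difficulty levels. The dimension names below follow the paper; the evaluation hook (`run_episode`) and function names are illustrative stand-ins, not the actual LIBERO-Plus API.

```python
# Hypothetical sketch of a LIBERO-Plus-style robustness sweep.
# `run_episode(dimension, level)` stands in for whatever hook your
# simulator exposes; it should return True when the task succeeds.

PERTURBATION_DIMENSIONS = [
    "object_layout", "camera_viewpoint", "robot_initial_state",
    "language_instruction", "lighting", "background_texture", "sensor_noise",
]
DIFFICULTY_LEVELS = ["L1", "L2", "L3", "L4", "L5"]

def robustness_report(run_episode, n_trials=50):
    """Aggregate per-dimension, per-level success rates."""
    report = {}
    for dim in PERTURBATION_DIMENSIONS:
        report[dim] = {}
        for level in DIFFICULTY_LEVELS:
            successes = sum(run_episode(dim, level) for _ in range(n_trials))
            report[dim][level] = successes / n_trials
    return report
```

A report structured this way makes the paper's headline comparison direct: a model's success rate on `camera_viewpoint` or `robot_initial_state` perturbations can be read off per difficulty level and contrasted with its unperturbed score.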
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization (2025)
- On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations (2025)
- RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation (2025)
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions (2025)
- Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model (2025)
- Contrastive Representation Regularization for Vision-Language-Action Models (2025)
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators (2025)