CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
NeurIPS 2025
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
*Corresponding author
- [09/2025] 🔥 Code released. Enjoy it!
- [09/2025] 🔥 CogVLA is accepted to NeurIPS 2025!
- [08/2025] 🔥 Project page released.
- [08/2025] 🔥 arXiv paper released.
This is the GitHub repository for CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification. CogVLA draws inspiration from human multimodal coordination and introduces a three-stage progressive architecture.
Extensive experiments on the LIBERO benchmark and real-world robotic tasks demonstrate that CogVLA achieves state-of-the-art performance with success rates of 97.4% and 70.0%, respectively, while reducing training costs by 2.5× and decreasing inference latency by 2.8× compared to OpenVLA.
The overall framework of CogVLA is illustrated below.
# Create and activate conda environment
conda create -n cogvla python=3.10 -y
conda activate cogvla
# Clone CogVLA repo and pip install to download dependencies
git clone [email protected]:JiuTian-VL/CogVLA.git
cd CogVLA
pip install -e .
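# (Optional; not part of the original instructions) Sanity-check that PyTorch sees a CUDA GPU,
# since training and Flash Attention 2 require a CUDA-capable device
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"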
# Install Flash Attention 2 for training
pip install packaging ninja
ninja --version; echo $? # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolationSee LIBERO.md for fine-tuning/evaluating on LIBERO simulation benchmark task suites.
See LIBERO.md for fine-tuning/evaluating on the LIBERO simulation benchmark task suites.
See ALOHA.md for fine-tuning/evaluating on real-world ALOHA robot tasks.
After training, fill in your checkpoint path in demo.py, then run the following command:
CUDA_VISIBLE_DEVICES=0 python demo.py
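For reference, filling in the checkpoint path usually means pointing the model-loading call in demo.py at your trained weights. The snippet below is only a minimal, hypothetical sketch that assumes an OpenVLA-style Hugging Face loading interface (AutoProcessor / AutoModelForVision2Seq with trust_remote_code=True); the actual code in demo.py may differ.

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

# Hypothetical path: replace with your own trained CogVLA checkpoint.
checkpoint_path = "/path/to/your/cogvla-checkpoint"

# Assumes an OpenVLA-style Hugging Face interface; demo.py may load the model differently.
processor = AutoProcessor.from_pretrained(checkpoint_path, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    checkpoint_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")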
Performance. CogVLA achieves state-of-the-art performance with success rates of 97.4% and 70.0% on simulation and real-world tasks, respectively.
Efficiency. CogVLA also reduces training costs by 2.5× and decreases inference latency by 2.8× compared to OpenVLA.
The attention maps of CogVLA highlight task-relevant regions in the input image, aligning well with human cognition during task execution.
Real-world demo videos:
- GaLaXea.R1.Lite.Robot.Folding.Clothes.Demo.mp4
- GaLaXea.R1.Lite.Robot.Open.Drawer.and.Place.Toy.Demo.mp4
If you find this work useful for your research, please cite our paper:
@article{li2025cogvla,
title={CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing \& Sparsification},
author={Li, Wei and Zhang, Renshan and Shao, Rui and He, Jie and Nie, Liqiang},
journal={Advances in Neural Information Processing Systems},
year={2025}
}