# VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation
- 🔥 [2025/09/18] Our VLA-Cache is accepted by NeurIPS 2025!
- 🔥 [2025/06/12] Code for OpenVLA is available (see the OpenVLA README).
- 🔥 [2025/05/29] Code for OpenVLA-OFT is released (see the OpenVLA-OFT README).
Vision-Language-Action (VLA) models can map multi-modal inputs (vision + language) to actions for robotic tasks in an end-to-end manner. However, due to the high frame rate and spatial complexity in robotic control, VLA inference can be computationally expensive.
VLA-Cache introduces a lightweight and effective caching mechanism by detecting unchanged visual tokens between frames and reusing their key-value computations. This leads to substantial speed-up with minimal accuracy loss.
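The core idea can be sketched in a few lines: compare each visual token against its counterpart in the previous frame and reuse cached key-value projections for tokens that are essentially unchanged. The sketch below is illustrative only; the function name, the cosine-similarity criterion, and the threshold are assumptions, not the repository's actual implementation.

```python
import numpy as np

def select_reusable_tokens(prev_tokens, curr_tokens, threshold=0.99):
    """Boolean mask of visual tokens nearly unchanged since the previous frame.

    Tokens whose cosine similarity to the previous frame exceeds the
    threshold can serve their key-value projections from the cache;
    the remaining tokens are recomputed. (Illustrative sketch only.)
    """
    prev_n = prev_tokens / np.linalg.norm(prev_tokens, axis=-1, keepdims=True)
    curr_n = curr_tokens / np.linalg.norm(curr_tokens, axis=-1, keepdims=True)
    similarity = (prev_n * curr_n).sum(axis=-1)  # per-token cosine similarity
    return similarity > threshold  # True = safe to reuse cached KV

# Toy frames: 196 patch tokens (14x14 grid), 768-dim features.
rng = np.random.default_rng(0)
prev = rng.random((196, 768))
curr = prev.copy()
curr[:10] += 1.0  # only the first 10 patches change (e.g. the moving arm)

reuse_mask = select_reusable_tokens(prev, curr)  # static background tokens are reused
```

In a static manipulation scene, most background tokens pass the check, so only the small set of changed tokens (robot arm, manipulated object) is recomputed each frame.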
## Installation

Clone the repository:

```shell
git clone https://github.com/siyuhsu/vla-cache.git
cd vla-cache
```

Follow the OpenVLA and OpenVLA-OFT setup instructions, then install into each environment:
```shell
conda activate openvla
cd src/openvla
pip install -e .
```

```shell
conda activate openvla-oft
cd src/openvla-oft
pip install -e .
```

## LIBERO Evaluation with OpenVLA

Download the fine-tuned checkpoint:

```shell
conda activate openvla
cd src/openvla
python vla_cache_scripts/download_model_local.py \
    --model_id openvla/openvla-7b-finetuned-libero-spatial
```

Run the evaluation with VLA-Cache enabled:

```shell
python experiments/robot/libero/run_libero_eval.py \
    --pretrained_checkpoint checkpoints/openvla-7b-finetuned-libero-spatial \
    --task_suite_name libero_spatial \
    --use_vla_cache True
```

Run the baseline without VLA-Cache:

```shell
python experiments/robot/libero/run_libero_eval.py \
    --pretrained_checkpoint checkpoints/openvla-7b-finetuned-libero-spatial \
    --task_suite_name libero_spatial \
    --use_vla_cache False
```

## LIBERO Evaluation with OpenVLA-OFT

Download the fine-tuned checkpoint:

```shell
conda activate openvla-oft
cd src/openvla-oft
python vla_cache_scripts/download_model_local.py \
    --model_id moojink/openvla-7b-oft-finetuned-libero-spatial
```

Run the evaluation with VLA-Cache enabled:

```shell
python experiments/robot/libero/run_libero_eval.py \
    --pretrained_checkpoint checkpoints/openvla-7b-oft-finetuned-libero-spatial \
    --task_suite_name libero_spatial \
    --use_vla_cache True
```

Run the baseline without VLA-Cache:

```shell
python experiments/robot/libero/run_libero_eval.py \
    --pretrained_checkpoint checkpoints/openvla-7b-oft-finetuned-libero-spatial \
    --task_suite_name libero_spatial \
    --use_vla_cache False
```

## Citation

If you find this work useful, please cite:
```bibtex
@article{xu2025vla,
  title={VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation},
  author={Xu, Siyu and Wang, Yunke and Xia, Chenghao and Zhu, Dihao and Huang, Tao and Xu, Chang},
  journal={arXiv preprint arXiv:2502.02175},
  year={2025}
}
```

## Acknowledgments

We build on the amazing work of OpenVLA, OpenVLA-OFT, and Hugging Face Transformers.
## License

This project is licensed under the Apache 2.0 License.