This repository contains the source code for the paper SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching Vision-Language-Action Models.
SA-VLA is developed on top of the RLinf reinforcement learning framework.
SA-VLA introduces a spatially-aware reinforcement learning approach for training flow-matching vision-language-action models. By incorporating spatial reasoning and RL fine-tuning, our method significantly improves generalization and task performance in embodied AI scenarios.
Overview of SA-VLA. Visual and spatial tokens are fused into geometry-aware embeddings, which are optimized via step-level dense rewards and spatially-conditioned exploration (SCAN) for robust RL adaptation.
Please follow the instructions in README.RLinf.md to configure the RLinf framework.
Important: When pulling the RLinf source code inside the container, replace it with this repository instead:
```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```

After the container is set up, run the additional setup script:
```bash
bash scripts/setup_container.sh
```

Copy the custom model code to the OpenPi site-packages:
```bash
cp -r srcs/openpi/models_pytorch /opt/venv/openpi/lib/python3.11/site-packages/openpi/models_pytorch
```

Deploy the LIBERO-PLUS dataset from LIBERO-plus to `/opt/venv/openpi/` and rename the directory to `libero_plus`:
```bash
mv /opt/venv/openpi/LIBERO-plus /opt/venv/openpi/libero_plus
```

Copy the provided benchmark files to the LIBERO library:
```bash
cp -r srcs/libero_plus/benchmark /opt/venv/openpi/libero/libero/libero/benchmark
```

Place all model weights in the `weights/` directory with the following structure:
```
weights
|-- Pi05-LIBERO
|   |-- Sylvest
|   |   `-- ...
|   |-- config.json
|   |-- model.safetensors
|   `-- physical-intelligence
|       `-- ...
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
|   |-- Sylvest
|   |   `-- ...
|   |-- metadata.pt
|   |-- model.safetensors
|   |-- optimizer.pt
|   `-- physical-intelligence
|       `-- ...
`-- RLinf-Pi05-SFT
    |-- README.md
    |-- configuration.json
    |-- model.safetensors
    `-- physical-intelligence
        `-- ...
```
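Before launching training, it can save time to confirm the checkpoint layout matches the tree above. The sketch below is a small helper we wrote for this README (not part of SA-VLA); the file list mirrors the tree, but adjust it if your checkpoints differ:

```python
# Sketch: verify the weights/ directory matches the layout in this README.
# The EXPECTED list and helper name are ours, not part of the SA-VLA code.
from pathlib import Path

EXPECTED = [
    "Pi05-LIBERO/config.json",
    "Pi05-LIBERO/model.safetensors",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/metadata.pt",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/model.safetensors",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/optimizer.pt",
    "RLinf-Pi05-SFT/configuration.json",
    "RLinf-Pi05-SFT/model.safetensors",
]

def missing_weights(root="weights"):
    """Return the expected weight files that are absent under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", *missing, sep="\n  ")
    else:
        print("weights/ layout looks complete.")
```

Running this from the repository root prints any files still missing, which is easier to act on than a mid-training load error.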
First, we perform FUSER adaptation training on LIBERO to obtain the base model Pi05-VGGT-LIBERO-FUSER-SFT_BF16 for subsequent RL training. This step can be completed using the official OpenPi framework.
Requirements:
- Place `srcs/openpi/models_pytorch` in the OpenPi site-packages (as done in Deployment Step 1).
- Download the official VGGT weights from https://github.com/facebookresearch/vggt.
We provide the necessary configuration and scripts at srcs/openpi:
- `config.py`
- `train_fuser.py`
- `train_fuser.sh`
To start training:
```bash
bash train_fuser.sh
```

After obtaining the base model from Step 1, proceed with RL training.
Use scripts/switch_libero.sh to switch between LIBERO and LIBERO-PLUS environments:
```bash
# Switch to libero_plus
bash scripts/switch_libero.sh libero_plus

# Switch back to libero
bash scripts/switch_libero.sh libero
```

Important: After switching environments, update the `is_libero_plus` field in `examples/embodiment/config/env/libero_spatial.yaml` accordingly.
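To avoid the config drifting out of sync with the active environment, the flag update can be scripted. This is a sketch of our own, not part of SA-VLA, and it assumes `is_libero_plus` appears as a plain `key: value` line in the YAML file:

```python
# Sketch: rewrite the is_libero_plus line after switching environments.
# Assumes the field is a simple top-level `is_libero_plus: <bool>` line;
# the helper name and approach are ours, not part of the SA-VLA scripts.
import re

def set_libero_plus(path, enabled):
    """Flip the is_libero_plus flag in the given YAML file in place."""
    with open(path) as f:
        text = f.read()
    new = re.sub(
        r"(?m)^(\s*is_libero_plus\s*:\s*).*$",
        lambda m: m.group(1) + ("true" if enabled else "false"),
        text,
    )
    with open(path, "w") as f:
        f.write(new)
    return new
```

For example, `set_libero_plus("examples/embodiment/config/env/libero_spatial.yaml", True)` would pair with switching to `libero_plus`.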
By default, we use a sparse spatial perturbation subset of LIBERO-PLUS for few-shot experiments.
To switch between subsets, full sets, or the complete LIBERO-PLUS dataset, modify the `libero_task_map` and `task_num` in:

`/opt/venv/openpi/libero/libero/libero/benchmark/__init__.py`
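As a rough illustration of the edit, the two fields relate as follows. The structure and task names below are hypothetical placeholders, not the real contents of the benchmark `__init__.py`:

```python
# Hypothetical illustration of the fields this README asks you to edit.
# The real libero_task_map in the LIBERO benchmark __init__.py maps suite
# names to task lists; the entries below are placeholders only.
libero_task_map = {
    "libero_spatial": [
        "example_spatial_task_a",
        "example_spatial_task_b",
    ],
}

# task_num should stay consistent with the number of tasks kept in the suite.
task_num = len(libero_task_map["libero_spatial"])
```

The key point is to keep `task_num` consistent with the tasks you leave in `libero_task_map` after trimming to a subset.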
We provide scripts for subset selection and sparsification:
- `scripts/prepare_lp_sparse.py`
- `scripts/prepare_lp_spatial.py`
Once the environment and dataset are configured:
```bash
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05
```

The dataset setup and environment switching for evaluation are the same as for training.
Steps:
- Specify the model weights to use in `examples/embodiment/config/libero_spatial_ppo_openpi_pi05_eval.yaml`.
- Run evaluation:
```bash
# Single evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval

# Batch evaluation across multiple checkpoints
python scripts/evaluate_across_steps.py
```

This project is licensed under the MIT License - see the LICENSE file for details.
This project is built upon the RLinf framework. We thank the authors for their excellent work.
We also acknowledge the following projects that were instrumental to our research:
- LIBERO and LIBERO-PLUS for providing comprehensive robotic manipulation benchmarks.
- OpenPi for the foundational VLA training framework.
- VGGT for the pre-trained visual-geometric features.