This repository contains the source code for the paper SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching Vision-Language-Action Models.
SA-VLA is developed on top of the RLinf reinforcement learning framework.
SA-VLA introduces a spatially-aware reinforcement learning approach for training flow-matching vision-language-action models. By incorporating spatial reasoning and RL fine-tuning, our method significantly improves generalization and task performance in embodied AI scenarios.
Overview of SA-VLA. Visual and spatial tokens are fused into geometry-aware embeddings, which are optimized via step-level dense rewards and spatially-conditioned exploration (SCAN) for robust RL adaptation.
Please follow the instructions in README.RLinf.md to configure the RLinf framework.
Important: When pulling the RLinf source code inside the container, replace it with this repository instead:
```bash
git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA
```

After the container is set up, run the additional setup script:
```bash
bash scripts/setup_container.sh
```

Copy the custom model code to the OpenPi site-packages:
```bash
cp -r srcs/openpi/models_pytorch /opt/venv/openpi/lib/python3.11/site-packages/openpi/models_pytorch
```

Deploy the LIBERO-PLUS dataset from LIBERO-plus to `/opt/venv/openpi/` and rename the directory to `libero_plus`:
```bash
mv /opt/venv/openpi/LIBERO-plus /opt/venv/openpi/libero_plus
```

Copy the provided benchmark files to the LIBERO library:
```bash
cp -r srcs/libero_plus/benchmark /opt/venv/openpi/libero/libero/libero/benchmark
```

Place all model weights in the `weights/` directory with the following structure:
```
weights
|-- Pi05-LIBERO
|   |-- Sylvest
|   |   `-- ...
|   |-- config.json
|   |-- model.safetensors
|   `-- physical-intelligence
|       `-- ...
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
|   |-- Sylvest
|   |   `-- ...
|   |-- metadata.pt
|   |-- model.safetensors
|   |-- optimizer.pt
|   `-- physical-intelligence
|       `-- ...
`-- RLinf-Pi05-SFT
    |-- README.md
    |-- configuration.json
    |-- model.safetensors
    `-- physical-intelligence
        `-- ...
```
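Before launching training, it can save time to confirm the checkpoint layout matches the tree above. The sketch below is a small helper we wrote for this README (not part of SA-VLA); the file list mirrors the tree, but adjust it if your checkpoints differ:

```python
# Sketch: verify the weights/ directory matches the layout in this README.
# The EXPECTED list and helper name are ours, not part of the SA-VLA code.
from pathlib import Path

EXPECTED = [
    "Pi05-LIBERO/config.json",
    "Pi05-LIBERO/model.safetensors",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/metadata.pt",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/model.safetensors",
    "Pi05-VGGT-LIBERO-FUSER-SFT_BF16/optimizer.pt",
    "RLinf-Pi05-SFT/configuration.json",
    "RLinf-Pi05-SFT/model.safetensors",
]

def missing_weights(root="weights"):
    """Return the expected weight files that are absent under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", *missing, sep="\n  ")
    else:
        print("weights/ layout looks complete.")
```

Running this from the repository root prints any files still missing, which is easier to act on than a mid-training load error.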
First, we perform FUSER adaptation training on LIBERO to obtain the base model Pi05-VGGT-LIBERO-FUSER-SFT_BF16 for subsequent RL training. This step can be completed using the official OpenPi framework.
Requirements:
- Place `srcs/openpi/models_pytorch` in the OpenPi site-packages (as done in Deployment Step 1).
- Download the official VGGT weights from https://github.com/facebookresearch/vggt.
We provide the necessary configuration and scripts at srcs/openpi:
- `config.py`
- `train_fuser.py`
- `train_fuser.sh`
To start training:
```bash
bash train_fuser.sh
```

After obtaining the base model from Step 1, proceed with RL training.
Use scripts/switch_libero.sh to switch between LIBERO and LIBERO-PLUS environments:
```bash
# Switch to libero_plus
bash scripts/switch_libero.sh libero_plus

# Switch back to libero
bash scripts/switch_libero.sh libero
```

Important: After switching environments, update the `is_libero_plus` field in `examples/embodiment/config/env/libero_spatial.yaml` accordingly.
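To avoid the config drifting out of sync with the active environment, the flag update can be scripted. This is a sketch of our own, not part of SA-VLA, and it assumes `is_libero_plus` appears as a plain `key: value` line in the YAML file:

```python
# Sketch: rewrite the is_libero_plus line after switching environments.
# Assumes the field is a simple top-level `is_libero_plus: <bool>` line;
# the helper name and approach are ours, not part of the SA-VLA scripts.
import re

def set_libero_plus(path, enabled):
    """Flip the is_libero_plus flag in the given YAML file in place."""
    with open(path) as f:
        text = f.read()
    new = re.sub(
        r"(?m)^(\s*is_libero_plus\s*:\s*).*$",
        lambda m: m.group(1) + ("true" if enabled else "false"),
        text,
    )
    with open(path, "w") as f:
        f.write(new)
    return new
```

For example, `set_libero_plus("examples/embodiment/config/env/libero_spatial.yaml", True)` would pair with switching to `libero_plus`.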
By default, we use a sparse spatial perturbation subset of LIBERO-PLUS for few-shot experiments.
To switch between subsets, full sets, or the complete LIBERO-PLUS dataset, modify the `libero_task_map` and `task_num` in:

`/opt/venv/openpi/libero/libero/libero/benchmark/__init__.py`
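As a rough illustration of the edit, the two fields relate as follows. The structure and task names below are hypothetical placeholders, not the real contents of the benchmark `__init__.py`:

```python
# Hypothetical illustration of the fields this README asks you to edit.
# The real libero_task_map in the LIBERO benchmark __init__.py maps suite
# names to task lists; the entries below are placeholders only.
libero_task_map = {
    "libero_spatial": [
        "example_spatial_task_a",
        "example_spatial_task_b",
    ],
}

# task_num should stay consistent with the number of tasks kept in the suite.
task_num = len(libero_task_map["libero_spatial"])
```

The key point is to keep `task_num` consistent with the tasks you leave in `libero_task_map` after trimming to a subset.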
We provide scripts for subset selection and sparsification:
- `scripts/prepare_lp_sparse.py`
- `scripts/prepare_lp_spatial.py`
Once the environment and dataset are configured:
```bash
bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05
```

The dataset setup and environment switching for evaluation are the same as for training.
Steps:
- Specify the model weights to use in `examples/embodiment/config/libero_spatial_ppo_openpi_pi05_eval.yaml`.
- Run evaluation:
```bash
# Single evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval

# Batch evaluation across multiple checkpoints
python scripts/evaluate_across_steps.py
```

This project is licensed under the MIT License - see the LICENSE file for details.
This project is built upon the RLinf framework. We thank the authors for their excellent work.
We also acknowledge the following projects that were instrumental to our research:
- LIBERO and LIBERO-PLUS for providing comprehensive robotic manipulation benchmarks.
- OpenPi for the foundational VLA training framework.
- VGGT for the pre-trained visual-geometric features.