
SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching Vision-Language-Action Models

This repository contains the source code for the paper SA-VLA: Spatially-Aware Reinforcement Learning for Flow-Matching Vision-Language-Action Models.

SA-VLA is developed on top of the RLinf reinforcement learning framework.


SA-VLA Introduction

SA-VLA introduces a spatially-aware reinforcement learning approach for training flow-matching vision-language-action models. By incorporating spatial reasoning and RL fine-tuning, our method significantly improves generalization and task performance in embodied AI scenarios.

Method Overview

Figure: Overview of SA-VLA. Visual and spatial tokens are fused into geometry-aware embeddings, which are optimized via step-level dense rewards and spatially-conditioned exploration (SCAN) for robust RL adaptation.
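The token-fusion idea in the figure can be sketched as follows. This is a minimal illustration only, not the actual SA-VLA implementation: the token counts, feature dimensions, and the random linear projection are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical token streams: 16 visual and 16 spatial tokens, each with a
# 64-dimensional feature vector (counts and dimensions are illustrative).
visual_tokens = rng.normal(size=(16, 64))
spatial_tokens = rng.normal(size=(16, 64))

# Fuse by concatenating along the feature axis, then apply a (randomly
# initialised) linear projection to obtain geometry-aware embeddings.
fused = np.concatenate([visual_tokens, spatial_tokens], axis=-1)  # (16, 128)
projection = rng.normal(size=(128, 64)) / np.sqrt(128)
geometry_aware = fused @ projection                               # (16, 64)

print(geometry_aware.shape)
```

In the paper's pipeline these embeddings are then optimized with step-level dense rewards; the sketch only covers the fusion step.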


Prerequisites

RLinf Framework Setup

Please follow the instructions in README.RLinf.md to configure the RLinf framework.

Important: when the RLinf setup instructions tell you to pull the RLinf source code inside the container, clone this repository instead:

git clone https://github.com/TwSphinx54/SA-VLA.git
cd SA-VLA

Additional Container Setup

After the container is set up, run the additional setup script:

bash scripts/setup_container.sh

Deployment

1. Model Code Deployment

Copy the custom model code to the OpenPi site-packages:

cp -r srcs/openpi/models_pytorch /opt/venv/openpi/lib/python3.11/site-packages/openpi/models_pytorch
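The destination above assumes the container's default environment at /opt/venv/openpi. If your environment lives elsewhere, the correct site-packages directory can be found programmatically:

```python
import sysconfig

# Print this interpreter's site-packages directory; copy models_pytorch
# there (under openpi/) if your environment is not at /opt/venv/openpi.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```

Run this with the interpreter inside the container's OpenPi environment so the path reflects that environment, not your host Python.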

2. LIBERO-PLUS Dataset Deployment

Deploy the LIBERO-PLUS dataset from the LIBERO-plus repository to /opt/venv/openpi/ and rename the directory to libero_plus:

mv /opt/venv/openpi/LIBERO-plus /opt/venv/openpi/libero_plus

3. Benchmark Files Deployment

Copy the provided benchmark files to the LIBERO library:

cp -r srcs/libero_plus/benchmark /opt/venv/openpi/libero/libero/libero/benchmark

4. Weights Placement

Place all model weights in the weights/ directory with the following structure:

weights
|-- Pi05-LIBERO
|   |-- Sylvest
|   |   `-- ...
|   |-- config.json
|   |-- model.safetensors
|   `-- physical-intelligence
|       `-- ...
|-- Pi05-VGGT-LIBERO-FUSER-SFT_BF16
|   |-- Sylvest
|   |   `-- ...
|   |-- metadata.pt
|   |-- model.safetensors
|   |-- optimizer.pt
|   `-- physical-intelligence
|       `-- ...
`-- RLinf-Pi05-SFT
    |-- README.md
    |-- configuration.json
    |-- model.safetensors
    `-- physical-intelligence
        `-- ...

Training

Step 1: FUSER Adaptation Training on LIBERO

First, we perform FUSER adaptation training on LIBERO to obtain the base model Pi05-VGGT-LIBERO-FUSER-SFT_BF16 for subsequent RL training. This step can be completed using the official OpenPi framework.

We provide the necessary configuration and scripts in srcs/openpi:

  • config.py
  • train_fuser.py
  • train_fuser.sh

To start training:

bash train_fuser.sh

Step 2: RL Training

After obtaining the base model from Step 1, proceed with RL training.

Environment Switching

Use scripts/switch_libero.sh to switch between LIBERO and LIBERO-PLUS environments:

# Switch to libero_plus
bash scripts/switch_libero.sh libero_plus

# Switch back to libero
bash scripts/switch_libero.sh libero

Important: After switching environments, update the is_libero_plus field in examples/embodiment/config/env/libero_spatial.yaml accordingly.
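For reference, the edit is a single boolean flag. The fragment below is hedged: the surrounding keys in libero_spatial.yaml are omitted, and the field's exact nesting may differ in your checkout.

```yaml
# examples/embodiment/config/env/libero_spatial.yaml (fragment)
is_libero_plus: true   # set to false after switching back to libero
```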

Dataset Configuration

By default, we use a sparse spatial perturbation subset of LIBERO-PLUS for few-shot experiments.

To switch between subsets, full sets, or the complete LIBERO-PLUS dataset, modify the libero_task_map and task_num in:

/opt/venv/openpi/libero/libero/libero/benchmark/__init__.py

We provide scripts for subset selection and sparsification:

  • scripts/prepare_lp_sparse.py
  • scripts/prepare_lp_spatial.py
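The edit described above amounts to shrinking the task map and keeping task_num consistent with it. A purely illustrative sketch follows; the real variable layout in benchmark/__init__.py and the suite/task names are assumptions, not the actual contents.

```python
# Illustrative only: the actual structure of libero_task_map / task_num in
# benchmark/__init__.py may differ, and these suite/task names are made up.
libero_task_map = {
    "libero_spatial_sparse": [
        "example_task_pick_bowl",
        "example_task_place_plate",
    ],
}

# Keep task_num consistent with the (possibly shrunken) task map.
task_num = {suite: len(tasks) for suite, tasks in libero_task_map.items()}
print(task_num)
```

Whichever subset you select, the key invariant is that task_num reflects the number of entries actually left in libero_task_map.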

Start RL Training

Once the environment and dataset are configured:

bash examples/embodiment/run_embodiment.sh libero_spatial_ppo_openpi_pi05

Evaluation

The dataset setup and environment switching for evaluation are the same as for training.

Steps:

  1. Specify the model weights to use in examples/embodiment/config/libero_spatial_ppo_openpi_pi05_eval.yaml.
  2. Run evaluation:
# Single evaluation
bash examples/embodiment/eval_embodiment.sh libero_spatial_ppo_openpi_pi05_eval

# Batch evaluation across multiple checkpoints
python scripts/evaluate_across_steps.py
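scripts/evaluate_across_steps.py is not reproduced here; the sketch below shows only the checkpoint-discovery part of such a loop, under the assumption that checkpoints are saved in directories named global_step_<N> (the naming pattern and the results/checkpoints path are assumptions, not the script's actual behavior).

```python
import re
from pathlib import Path

def checkpoint_steps(ckpt_root: str) -> list[int]:
    """Collect step numbers from checkpoint dirs named 'global_step_<N>'."""
    steps = []
    root = Path(ckpt_root)
    if root.is_dir():
        for entry in root.iterdir():
            m = re.fullmatch(r"global_step_(\d+)", entry.name)
            if m and entry.is_dir():
                steps.append(int(m.group(1)))
    return sorted(steps)

if __name__ == "__main__":
    # Each discovered step would then be evaluated in turn, e.g. by pointing
    # the eval config at that checkpoint and invoking eval_embodiment.sh.
    print(checkpoint_steps("results/checkpoints"))
```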

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

This project is built upon the RLinf framework. We thank the authors for their excellent work.

We also acknowledge the following projects that were instrumental to our research:

  • LIBERO and LIBERO-PLUS for providing comprehensive robotic manipulation benchmarks.
  • OpenPi for the foundational VLA training framework.
  • VGGT for the pre-trained visual-geometric features.
