
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models

[💻 Project page] [📄 Paper]

Puhao Li1,2, Yingying Wu1, Ziheng Xi1, Wanlin Li2, Yuzhe Huang2, Zhiyuan Zhang1, Yinghan Chen3, Jianan Wang4, Song-Chun Zhu1,2,3, Tengyu Liu2, ✉️, Siyuan Huang2, ✉️

1Tsinghua University, 2Beijing Institute for General Artificial Intelligence (BIGAI), 3Peking University, 4AstriBot.

ControlVLA is a general framework for few-shot object-centric adaptation of pre-trained VLA models. It adapts a pre-trained VLA model to task- and environment-specific skills with only 10-20 expert demonstrations.

🛠️ Installation

  1. Create a virtual environment with conda or another Python package manager.

    conda create -n controlvla python==3.9.18
    conda activate controlvla
  2. Install torch and other dependent libraries.

    pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
    
    ## install sam2 from the source code
    cd thirdparty
    git clone https://github.com/facebookresearch/sam2.git
    cd sam2
    git checkout 7e1596c0b6462eb1d1ba7e1492430fed95023598
    ## remove the python and pytorch version restrictions in the sam2 setup config
    pip install -e .
    • The code is tested with PyTorch 2.1.0 and CUDA 12.1; other versions may have compatibility issues. A quick import check follows this list.
  3. Download the pre-trained models:

    • Pre-trained ControlVLA model: download it here, then unzip it and place it in the ./data folder.
    • SAM2 model: follow the instructions here and place it in the ./data/checkpoints folder. Note that the default config uses the sam2_hiera_tiny.pt checkpoint, which you can download directly with wget: cd data/checkpoints && wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt.
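
Before moving on to data collection, a quick import check (a minimal sketch, not one of the repository's scripts) can confirm that the CUDA build of PyTorch and the editable SAM2 install are both visible to Python:

    # quick_check.py -- minimal environment sanity check (a sketch, not an official script)
    import torch
    import sam2  # installed in editable mode from thirdparty/sam2

    print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    print("sam2 package located at:", sam2.__file__)

If CUDA is not available or the sam2 import fails, revisit step 2 before continuing.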

🗃️ Data Collection

Note that our data is collected and pre-processed with the UMI system, which provides robot-arm-agnostic training data. You may also collect data with your own robot setup for faster validation and deployment.

  1. Collect data with the UMI data pipeline and obtain the replay_buffer.zarr.zip data file. An example of this zarr data file is provided here; a small inspection sketch follows this list.

  2. Annotate the interactive parts of each object as prompts for SAM2.

    python scripts_objcen_pipeline/prompts_annotation.py -i ./example_finetune_demo/picknplace_toy.d10
    python scripts_objcen_pipeline/prompts_extraction.py -i picknplace_toy.d10

    You can also use GroundingDINO to automatically annotate the interactive parts with task language instructions.

  3. Process and integrate the object-centric masks into the data file.

    python scripts_objcen_pipeline/08_propagate_interactive_parts.py -i ./example_finetune_demo/picknplace_toy.d10
    python scripts_objcen_pipeline/09_integrate_into_dataset.py -i ./example_finetune_demo/picknplace_toy.d10
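
To verify what ended up in the replay buffer, the sketch below opens the zipped zarr store and lists its arrays. The file path and the group layout are assumptions based on UMI-style recordings (zarr 2.x, as used by the UMI pipeline); adjust them to match your own dataset.

    # inspect_buffer.py -- a sketch for browsing the collected replay buffer;
    # path and group names are assumptions based on the UMI-style zarr layout.
    import zarr

    store = zarr.ZipStore(
        "./example_finetune_demo/picknplace_toy.d10/replay_buffer.zarr.zip", mode="r"
    )
    root = zarr.open(store, mode="r")

    def walk(group, prefix=""):
        # recursively print every array with its shape and dtype
        for name, item in group.items():
            if isinstance(item, zarr.Group):
                walk(item, prefix + name + "/")
            else:
                print(prefix + name, item.shape, item.dtype)

    walk(root)
    store.close()

UMI-style buffers typically keep per-step arrays (camera frames, end-effector poses, gripper widths) under data and episode boundaries under meta/episode_ends.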

🦾 Fine-tuning and Deployment

Fine-tune the pre-trained ControlVLA model on the example dataset with the following command:

bash runs/controlvla_pnptoy.sh

For real-world deployment, customize the robot and camera interfaces in the inference script eval_controlvla.py (a rough interface sketch follows the command below). Then run:

python eval_controlvla.py -i ./data/checkpoints/latest.ckpt -p ./example_finetune_demo/picknplace_toy.d10/picknplace_toy.d10.objectcentric.anno.pkl
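
The hardware hookup depends on your setup; as a rough sketch (class and method names here are placeholders, not the actual API of eval_controlvla.py), the adaptation usually reduces to two calls: one returning the latest camera frame plus proprioception, and one sending an end-effector action to the arm at a fixed control rate.

    # A hypothetical hardware shim; names and signatures are placeholders, not the
    # actual interface of eval_controlvla.py. Wrap your own camera and arm drivers
    # behind two simple calls so the inference loop stays hardware-agnostic.
    import time
    import numpy as np

    class RobotEnv:
        def __init__(self, camera, arm, control_hz=10.0):
            self.camera = camera      # e.g. a RealSense/USB camera driver
            self.arm = arm            # e.g. your arm's Cartesian controller
            self.dt = 1.0 / control_hz

        def get_obs(self):
            # observation consumed by the policy: an image plus proprioception
            return {
                "rgb": self.camera.read(),            # HxWx3 uint8 frame
                "eef_pose": self.arm.get_eef_pose(),  # e.g. 6-DoF pose + gripper width
            }

        def exec_action(self, action: np.ndarray):
            # send one end-effector command, then wait for the next control tick
            self.arm.command_eef_pose(action)
            time.sleep(self.dt)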

👏 Acknowledgments

We thank Yuyang Li, Yuwei Guo, and Ziyuan Jiao for their valuable discussions and technical support. This work builds upon the codebase of UMI.

🔗 Citation

If you find this work useful, please consider citing:

@article{li2025controlvla,
  title={ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models},
  author={Li, Puhao and Wu, Yingying and Xi, Ziheng and Li, Wanlin and Huang, Yuzhe and Zhang, Zhiyuan and Chen, Yinghan and Wang, Jianan and Zhu, Song-Chun and Liu, Tengyu and others},
  journal={arXiv preprint arXiv:2506.16211},
  year={2025}
}

If you have any questions about this work, feel free to contact Puhao Li at [email protected]

About

Code repository for ControlVLA, CoRL 2025.
