
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models

[💻 Project page] [📄 Paper]

Puhao Li1,2, Yingying Wu1, Ziheng Xi1, Wanlin Li2, Yuzhe Huang2, Zhiyuan Zhang1, Yinghan Chen3, Jianan Wang4, Song-Chun Zhu1,2,3, Tengyu Liu2 ✉️, Siyuan Huang2 ✉️

1Tsinghua University, 2Beijing Institute for General Artificial Intelligence (BIGAI), 3Peking University, 4AstriBot.

[Teaser figure] ControlVLA is a general framework for few-shot, object-centric adaptation of pre-trained VLA models. It can adapt a pre-trained VLA model to task- and environment-specific skills with only 10-20 expert demonstrations.

🛠️ Installation

  1. Create a virtual environment with conda or another Python package manager.

    conda create -n controlvla python==3.9.18
    conda activate controlvla
  2. Install PyTorch and the other dependencies.

    pip install torch==2.1.0+cu121 torchvision==0.16.0+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
    
    ## install sam2 from source
    cd thirdparty
    git clone https://github.com/facebookresearch/sam2.git
    cd sam2
    git checkout 7e1596c0b6462eb1d1ba7e1492430fed95023598
    ## remove the python and pytorch version restrictions in the sam2 setup config
    pip install -e .
    • The code is tested with PyTorch 2.1.0 and CUDA 12.1; other versions may have compatibility issues.
  3. Download the pre-trained models:

    • Pre-trained ControlVLA model: download it here, unzip it, and place it in the ./data folder.
    • SAM2 checkpoint: follow the instructions here and place it in the ./data/checkpoints folder. The default config uses the sam2_hiera_tiny.pt checkpoint, which you can fetch directly with wget: cd data/checkpoints && wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt.
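To verify the installation, a quick sanity check can be helpful. The snippet below is only a sketch: it assumes the directory layout described above and that the sam2 package resolves the sam2_hiera_t.yaml config that ships with the tiny checkpoint.

    # sanity_check.py -- a minimal sketch, not part of the repository
    import torch
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    print("torch", torch.__version__, "| cuda available:", torch.cuda.is_available())

    # load the tiny SAM2 checkpoint downloaded in step 3 (builds on GPU by default)
    predictor = SAM2ImagePredictor(
        build_sam2("sam2_hiera_t.yaml", "./data/checkpoints/sam2_hiera_tiny.pt")
    )
    print("SAM2 image predictor ready:", type(predictor).__name__)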

🗃️ Data Collection

Note that our data is collected and pre-processed with the UMI system, which provides robot-arm-agnostic training data. You can also collect data with your own robot setup for faster validation and deployment.

  1. Collect data with the UMI data pipeline and obtain the replay_buffer.zarr.zip data file. An example of this zarr data file is provided here (an inspection sketch follows this list).

  2. Annotate the interactive parts of each object as prompts for SAM2.

    python scripts_objcen_pipeline/prompts_annotation.py -i ./example_finetune_demo/picknplace_toy.d10
    python scripts_objcen_pipeline/prompts_extraction.py -i picknplace_toy.d10

    You can also use GroundingDINO to annotate the interactive parts automatically from the task's language instructions.

  3. Process the object-centric masks and integrate them into the data file (the underlying SAM2 propagation step is sketched after this list).

    python scripts_objcen_pipeline/08_propagate_interactive_parts.py -i ./example_finetune_demo/picknplace_toy.d10
    python scripts_objcen_pipeline/09_integrate_into_dataset.py -i ./example_finetune_demo/picknplace_toy.d10
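For step 1, it can help to peek inside the collected replay buffer before annotating it. The sketch below assumes replay_buffer.zarr.zip sits inside the example demo folder and follows the UMI-style layout with data/ streams and meta/episode_ends; adjust the path and keys to your own recording.

    # inspect a UMI-style replay buffer -- path and key names are assumptions
    import zarr

    path = "./example_finetune_demo/picknplace_toy.d10/replay_buffer.zarr.zip"
    root = zarr.open(path, mode="r")   # .zarr.zip is opened read-only via a ZipStore
    print(root.tree())                 # expected: data/<streams>, meta/episode_ends
    print("episodes:", root["meta/episode_ends"].shape[0])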
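The propagation script in step 3 builds on SAM2's video predictor to carry the annotated prompts through each demonstration. If you need to adapt this step to your own data layout, the documented sam2 usage looks roughly as follows; the frame directory, point prompt, and object id below are placeholders, not values from this repository.

    # rough shape of SAM2 mask propagation -- frames_dir, the point prompt, and
    # obj_id are placeholders for values produced by the annotation step
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor("sam2_hiera_t.yaml",
                                           "./data/checkpoints/sam2_hiera_tiny.pt")
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path="frames_dir")  # directory of JPEG frames
        predictor.add_new_points(inference_state=state, frame_idx=0, obj_id=1,
                                 points=np.array([[320., 240.]], dtype=np.float32),
                                 labels=np.array([1], dtype=np.int32))
        for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
            masks = (mask_logits > 0.0).cpu().numpy()  # boolean mask per tracked object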

🦾 Fine-tuning and Deployment

Finetune the pre-trained ControlVLA model on the example dataset with the following command:

bash runs/controlvla_pnptoy.sh

For real-world deployment, customize the robot and camera interfaces used by the inference script eval_controlvla.py. Then run:

python eval_controlvla.py -i ./data/checkpoints/latest.ckpt -p ./example_finetune_demo/picknplace_toy.d10/picknplace_toy.d10.objectcentric.anno.pkl
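The exact hardware glue depends on your setup. As an illustration only, an adapter for eval_controlvla.py might expose methods along the lines below; the class name, method names, and observation keys here are hypothetical (the keys merely mirror UMI-style training streams), not the actual interface the script imports.

    # hypothetical hardware adapter -- names and keys are illustrative only
    import numpy as np

    class MyRobotCameraInterface:
        """Thin wrapper around a robot arm and its camera(s)."""

        def get_observation(self) -> dict:
            # return the latest synchronized observation; replace the zero arrays
            # with real sensor readings matching your training data streams
            return {
                "camera0_rgb": np.zeros((224, 224, 3), dtype=np.uint8),
                "robot0_eef_pose": np.zeros(6, dtype=np.float32),
                "robot0_gripper_width": np.zeros(1, dtype=np.float32),
            }

        def execute_action(self, action: np.ndarray) -> None:
            # send the predicted end-effector target and gripper command
            # to your robot controller
            raise NotImplementedError("connect this to your robot's control API")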

👏 Acknowledgments

We thank Yuyang Li, Yuwei Guo, and Ziyuan Jiao for their valuable discussions and technical support. This work builds upon the codebase of UMI.

🔗 Citation

If you find this work useful, please consider citing:

@article{li2025controlvla,
  title={ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models},
  author={Li, Puhao and Wu, Yingying and Xi, Ziheng and Li, Wanlin and Huang, Yuzhe and Zhang, Zhiyuan and Chen, Yinghan and Wang, Jianan and Zhu, Song-Chun and Liu, Tengyu and others},
  journal={arXiv preprint arXiv:2506.16211},
  year={2025}
}

If you have any questions about this work, feel free to contact Puhao Li at [email protected]
