Dexbotic

One-Stop VLA Development Toolbox for Embodied Intelligence


Pretraining · Fine-tuning · Inference · Evaluation
Supporting mainstream policies such as π0, CogACT, OFT, MemVLA, and more

Introduction

Dexbotic is a Vision-Language-Action (VLA) development toolbox built on PyTorch, designed to provide a unified and efficient solution for embodied intelligence research. It ships with built-in environment configurations for mainstream VLA models, allowing users to reproduce, fine-tune, and run inference with cutting-edge VLA algorithms after a simple setup.

  • Ready-to-Use VLA Framework: Centered on VLA models, integrating embodied manipulation and navigation capabilities, and supporting multiple cutting-edge algorithms.
  • High-Performance Pre-trained Foundation Models: Provides multiple optimized pre-trained models for mainstream VLA algorithms such as π0 and CogACT.
  • Modular Development Architecture: Built on a "layered configuration + factory registration + entry dispatch" design, so changing configurations, swapping models, or adding tasks only requires editing an experiment script (see the sketch after this list).
  • Unified Cloud and Local Training: Supports cloud training platforms such as Alibaba Cloud and Volcano Engine, while also accommodating consumer-grade GPUs for local training.
  • Extensive Robot Compatibility: Provides a unified training data format and deployment scripts for mainstream robots such as UR5, Franka, and ALOHA.
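
The "factory registration + entry dispatch" pattern can be illustrated with a minimal, self-contained sketch. All identifiers below (MODEL_REGISTRY, register_model, build_model, CogACTPolicy) are illustrative stand-ins, not Dexbotic's actual API:

# Minimal sketch of factory registration + entry dispatch.
# All names here are hypothetical, not Dexbotic's real API.
from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, Callable] = {}

def register_model(name: str):
    """Decorator that records a policy class under a string key."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model("cogact")
class CogACTPolicy:
    def __init__(self, action_dim: int = 7):
        self.action_dim = action_dim

def build_model(cfg: dict):
    """Entry dispatch: the experiment script only edits this config."""
    return MODEL_REGISTRY[cfg["model"]](**cfg.get("model_args", {}))

policy = build_model({"model": "cogact", "model_args": {"action_dim": 7}})

With this pattern, adding a new policy only requires defining a class and registering it under a new key; no dispatch code has to change.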

🔥 News

  • [2026-01-15] Released a tutorial on integrating SO-101 with Dexbotic.
  • [2026-01-15] Supported SimpleVLA-RL.
  • [2026-01-15] Supported NaVILA.
  • [2026-01-08] Added Co-training capability, enabling joint optimization of action experts and LLMs for the CogACT model.
  • [2026-01-08] Released a specialized image compatible with Blackwell GPUs.
  • [2025-12-29] Supported the OFT and π0.5 models.
  • [2025-10-20] Dexbotic officially released! Check out the technical report and official documentation for details.

Quick Start

We strongly recommend using Docker for development or deployment to get the best experience.

1. Installation and Environment Setup

# 1. Clone the repository
git clone https://github.com/dexmal/dexbotic.git

# 2. Start Docker container
docker run -it --rm --gpus all --network host \
  -v $(pwd)/dexbotic:/dexbotic \
  dexmal/dexbotic \
  bash

# 3. Activate environment and install dependencies
cd /dexbotic
conda activate dexbotic
pip install -e .

System Requirements: Ubuntu 20.04/22.04; recommended GPUs are RTX 4090, A100, or H100 (8 GPUs recommended for training, 1 GPU for deployment).
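
After installation, a quick sanity check inside the container confirms that PyTorch sees the GPUs. This snippet uses only standard PyTorch calls:

# Sanity check: run inside the activated dexbotic environment.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:", torch.cuda.device_count())
    print("Device 0:", torch.cuda.get_device_name(0))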

Using on Blackwell GPUs

For users with Blackwell architecture GPUs (e.g., B100, RTX 5090), please use the specialized Docker image dexmal/dexbotic:c130t28.

# 1. Start Docker with Blackwell image
docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic:c130t28 \
  bash

# 2. Install Dexbotic inside the container
cd /dexbotic
pip install -e .
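
To confirm that the image's PyTorch build actually targets your card, you can inspect the device's compute capability and the architectures the wheel was compiled for. These are standard PyTorch calls; the expected values (roughly 10.x for B100/B200, 12.x for RTX 5090) are stated here as an assumption, not taken from the Dexbotic docs:

# Check that this PyTorch build supports the installed Blackwell GPU.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("Compiled architectures:", torch.cuda.get_arch_list())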

2. Usage Guide

Benchmark Results

The following tables compare models trained with Dexbotic (prefixed "DB-") against the original releases on mainstream simulation benchmarks. For more detailed evaluation results, see Benchmark Results.

Libero

Model       Average   Libero-Spatial   Libero-Object   Libero-Goal   Libero-10
CogACT      93.6      97.2             98.0            90.2          88.8
DB-CogACT   94.9      93.8             97.8            96.2          91.8
π0          94.2      96.8             98.8            95.8          85.2
DB-π0       93.9      97.0             98.2            94.0          86.4
MemVLA      96.7      98.4             98.4            96.4          93.4
DB-MemVLA   97.0      97.2             99.2            98.4          93.2

CALVIN

Model       Avg. Length   Len 1   Len 2   Len 3   Len 4   Len 5
CogACT      3.246         83.8    72.9    64.0    55.9    48.0
DB-CogACT   4.063         93.5    86.7    80.3    76.0    69.8
OFT         3.472         89.1    79.4    67.4    59.8    51.5
DB-OFT      3.540         92.8    80.7    69.2    60.2    51.1

SimplerEnv

Model       Average   Spoon    Carrot   Stack Blocks   Eggplant
CogACT      51.25     71.7     50.8     15.0           67.5
DB-CogACT   69.45     87.5     65.28    29.17          95.83
OFT         30.23     12.5     4.2      4.2            100.0
DB-OFT      76.39     91.67    76.39    43.06          94.44
MemVLA      71.9      75.0     75.0     37.5           100.0
DB-MemVLA   84.4      100.0    66.7     70.8           100.0

ManiSkill2

Model       Average   PickCube   StackCube   PickSingleYCB   PickSingleEGAD   PickClutterYCB
CogACT      40        55         70          30              25               20
DB-CogACT   58        90         65          65              40               30
OFT         21        40         45          5               5                0
DB-OFT      63        90         75          55              65               30
π0          66        95         85          55              85               10
DB-π0       65        95         85          65              50               30

RoboTwin2.0

Model       Average   Adjust Bottle   Grab Roller   Place Empty Cup   Place Phone Stand
CogACT      43.8      87              72            11                5
DB-CogACT   58.5      99              89            28                18

FAQ

Q: Failed to install Flash-Attention

A: For detailed installation instructions and troubleshooting, please refer to the official documentation at https://github.com/Dao-AILab/flash-attention.
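
If the build appears to succeed but imports still fail, a one-line check confirms which flash-attn version is actually installed in the environment:

# Verify the installed Flash-Attention package.
import flash_attn
print(flash_attn.__version__)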

Q: How do I convert RLDS/LeRobot data to Dexdata?

A: We provide a general guide in data conversion. An example of LeRobot data conversion can be found in convert_lerobot_to_dexdata, and an example of RLDS data conversion is available in convert_rlds_to_dexdata.
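
For orientation, the conversion boils down to iterating the source dataset and re-serializing each step. The sketch below is illustrative only: the output schema (a JSON-lines stand-in) is hypothetical and NOT the real Dexdata format, the LeRobotDataset import path follows the lerobot library but may differ across versions, and the feature keys ("action", "observation.state") and example dataset id are assumptions. Use the provided conversion scripts for real data.

# Illustrative LeRobot-to-JSONL loop; not the real Dexdata format.
import json
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

def convert(repo_id: str, out_path: str) -> None:
    dataset = LeRobotDataset(repo_id)  # frames come back as dicts of tensors
    with open(out_path, "w") as f:
        for i in range(len(dataset)):
            frame = dataset[i]
            step = {
                "action": frame["action"].tolist(),
                "state": frame["observation.state"].tolist(),
            }
            f.write(json.dumps(step) + "\n")

convert("lerobot/aloha_static_coffee", "steps.jsonl")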

Q: Is the RTX 5090 supported?

A: Yes, please refer to Using on Blackwell GPUs.

Support Us

We are continuously improving, with more features coming soon. If you like this project, please give us a star on GitHub. Your support is our motivation to keep moving forward!

If Dexbotic has been helpful in your research work, please consider citing our technical report:

@article{dexbotic,
  title={Dexbotic: Open-Source Vision-Language-Action Toolbox},
  author={Dexbotic Contributors},
  journal={arXiv preprint arXiv:2510.23511},
  year={2025}
}

License

This project is licensed under the MIT License.
