Pretraining · Fine-tuning · Inference · Evaluation
Supporting mainstream policies including π0, CogACT, OFT, MemVLA, and more
Dexbotic is a Vision-Language-Action (VLA) development toolbox built on PyTorch, designed to provide a unified and efficient solution for embodied-intelligence research. It ships with built-in environment configurations for mainstream VLA models, letting users reproduce, fine-tune, and run inference with cutting-edge VLA algorithms after a simple setup.
- Ready-to-Use VLA Framework: Centered around VLA models, integrating embodied manipulation and navigation capabilities, supporting multiple cutting-edge algorithms.
- High-Performance Pre-trained Foundation Models: For mainstream VLA algorithms such as π0 and CogACT, Dexbotic provides multiple optimized pre-trained models.
- Modular Development Architecture: Adopting a "layered configuration + factory registration + entry dispatch" architecture, so users can modify configurations, swap models, or add tasks simply by editing experiment scripts (see the sketch after this list).
- Unified Cloud and Local Training: Fully supports both cloud and local training needs, supporting cloud training platforms such as Alibaba Cloud and Volcano Engine, while also accommodating consumer-grade GPUs for local training.
- Extensive Robot Compatibility: For mainstream robots such as UR5, Franka, and ALOHA, Dexbotic provides a unified training data format and deployment scripts.
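To make the "layered configuration + factory registration + entry dispatch" idea concrete, here is a minimal, self-contained sketch of the pattern. All names (`MODEL_REGISTRY`, `register_model`, `BaseExpConfig`, `main`) are hypothetical illustrations, not Dexbotic's actual API.

```python
# A minimal sketch of "layered configuration + factory registration +
# entry dispatch". Every name here is a hypothetical illustration.
from dataclasses import dataclass

MODEL_REGISTRY = {}

def register_model(name):
    """Factory registration: map a string key to a model class."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

@dataclass
class BaseExpConfig:
    """Layered configuration: experiments subclass and override fields."""
    model: str = "pi0"
    lr: float = 1e-4

@register_model("pi0")
class Pi0Policy:
    def __init__(self, cfg):
        self.cfg = cfg

@dataclass
class MyExpConfig(BaseExpConfig):
    lr: float = 5e-5  # override only what differs from the base

def main(cfg: BaseExpConfig):
    """Entry dispatch: the entry point resolves the model via the registry."""
    policy = MODEL_REGISTRY[cfg.model](cfg)
    print(type(policy).__name__, cfg.lr)

main(MyExpConfig())
```

The benefit is that an experiment script only declares what differs from the base configuration, while model selection is resolved at the entry point through the registry.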
- [2026-01-15] Released a tutorial on integrating SO-101 with Dexbotic.
- [2026-01-15] Supported SimpleVLA-RL.
- [2026-01-15] Supported NaVILA.
- [2026-01-08] Added Co-training capability, enabling joint optimization of action experts and LLMs for the CogACT model.
- [2026-01-08] Released a specialized image compatible with Blackwell GPUs.
- [2025-12-29] Supported OFT and Pi0.5 models.
- [2025-10-20] Dexbotic officially released! Check out the technical report and official documentation for details.
We strongly recommend using Docker for development or deployment to get the best experience.
System Requirements: Ubuntu 20.04/22.04; recommended GPUs: RTX 4090, A100, or H100 (8 GPUs recommended for training, 1 GPU for deployment).

```bash
# 1. Clone the repository
git clone https://github.com/dexmal/dexbotic.git

# 2. Start the Docker container
docker run -it --rm --gpus all --network host \
    -v $(pwd)/dexbotic:/dexbotic \
    dexmal/dexbotic \
    bash

# 3. Activate the environment and install dependencies
cd /dexbotic
conda activate dexbotic
pip install -e .
```
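After installation, a quick import check can confirm that the container sees your GPUs. This is a minimal sketch; the `dexbotic` top-level import is an assumption inferred from the package name, not a documented entry point.

```python
# Minimal post-install sanity check. The `dexbotic` import is assumed from
# `pip install -e .`; adjust it if the package exposes a different name.
import torch
import dexbotic  # noqa: F401  (fails fast if the editable install is broken)

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"Visible GPUs: {torch.cuda.device_count()}")
```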
Using on Blackwell GPUs
For users with Blackwell-architecture GPUs (e.g., B100, RTX 5090), please use the specialized Docker image `dexmal/dexbotic:c130t28`. You can confirm whether your GPU falls in this class with the quick check below.
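This sketch uses only standard PyTorch calls; the compute-capability threshold (10.x for B100/B200, 12.x for RTX 5090) reflects published NVIDIA specs, not a Dexbotic utility.

```python
# Detect Blackwell-class GPUs by CUDA compute capability (10.x or 12.x).
# Standard PyTorch APIs only; the threshold is based on NVIDIA's published
# compute capabilities, not on anything shipped with Dexbotic.
import torch

major, minor = torch.cuda.get_device_capability(0)
name = torch.cuda.get_device_name(0)
print(f"{name}: compute capability {major}.{minor}")
if major >= 10:
    print("Blackwell-class GPU -> use the dexmal/dexbotic:c130t28 image")
```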
# 1. Start Docker with Blackwell image
docker run -it --rm --gpus all --network host \
-v /path/to/dexbotic:/dexbotic \
dexmal/dexbotic:c130t28 \
bash
# 2. Activate environment
cd /dexbotic
pip install -e .The following shows a comparison of evaluation results between models trained with Dexbotic and original models on mainstream simulation environments. View more detailed evaluation results: Benchmark Results
LIBERO (success rate, %):

| Model | Average | Libero-Spatial | Libero-Object | Libero-Goal | Libero-10 |
|---|---|---|---|---|---|
| CogACT | 93.6 | 97.2 | 98.0 | 90.2 | 88.8 |
| DB-CogACT | 94.9 | 93.8 | 97.8 | 96.2 | 91.8 |
| π0 | 94.2 | 96.8 | 98.8 | 95.8 | 85.2 |
| DB-π0 | 93.9 | 97.0 | 98.2 | 94.0 | 86.4 |
| MemVLA | 96.7 | 98.4 | 98.4 | 96.4 | 93.4 |
| DB-MemVLA | 97.0 | 97.2 | 99.2 | 98.4 | 93.2 |
CALVIN (average number of consecutive tasks completed, and success rate % at chain lengths 1–5):

| Model | Average Length | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| CogACT | 3.246 | 83.8 | 72.9 | 64.0 | 55.9 | 48.0 |
| DB-CogACT | 4.063 | 93.5 | 86.7 | 80.3 | 76.0 | 69.8 |
| OFT | 3.472 | 89.1 | 79.4 | 67.4 | 59.8 | 51.5 |
| DB-OFT | 3.540 | 92.8 | 80.7 | 69.2 | 60.2 | 51.1 |
SimplerEnv (success rate, %):

| Model | Average | Spoon | Carrot | Stack Blocks | Eggplant |
|---|---|---|---|---|---|
| CogACT | 51.25 | 71.7 | 50.8 | 15.0 | 67.5 |
| DB-CogACT | 69.45 | 87.5 | 65.28 | 29.17 | 95.83 |
| OFT | 30.23 | 12.5 | 4.2 | 4.2 | 100.0 |
| DB-OFT | 76.39 | 91.67 | 76.39 | 43.06 | 94.44 |
| MemVLA | 71.9 | 75.0 | 75.0 | 37.5 | 100.0 |
| DB-MemVLA | 84.4 | 100.0 | 66.7 | 70.8 | 100.0 |
ManiSkill2 (success rate, %):

| Model | Average | PickCube | StackCube | PickSingleYCB | PickSingleEGAD | PickClutterYCB |
|---|---|---|---|---|---|---|
| CogACT | 40 | 55 | 70 | 30 | 25 | 20 |
| DB-CogACT | 58 | 90 | 65 | 65 | 40 | 30 |
| OFT | 21 | 40 | 45 | 5 | 5 | 0 |
| DB-OFT | 63 | 90 | 75 | 55 | 65 | 30 |
| π0 | 66 | 95 | 85 | 55 | 85 | 10 |
| DB-π0 | 65 | 95 | 85 | 65 | 50 | 30 |
RoboTwin (success rate, %):

| Model | Average | Adjust Bottle | Grab Roller | Place Empty Cup | Place Phone Stand |
|---|---|---|---|---|---|
| CogACT | 43.8 | 87 | 72 | 11 | 5 |
| DB-CogACT | 58.5 | 99 | 89 | 28 | 18 |
Q: Failed to install Flash-Attention
A: For detailed installation instructions and troubleshooting, please refer to the official documentation at https://github.com/Dao-AILab/flash-attention.
Q: How do I convert RLDS/LeRobot data to Dexdata?
A: We provide a general data conversion guide in data conversion. An example of LeRobot data conversion can be found in convert_lerobot_to_dexdata, and an example of RLDS data conversion is available in convert_rlds_to_dexdata. A schematic converter skeleton is sketched below.
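For orientation, the general shape of such a converter is sketched here. Both the episode loader and the output layout are hypothetical placeholders; the actual Dexdata schema and field names are defined in the data conversion guide linked above.

```python
# Schematic converter skeleton. `load_episodes` and the JSON-per-episode
# output layout are hypothetical stand-ins -- consult the Dexbotic data
# conversion guide for the real Dexdata schema and field names.
import json
from pathlib import Path

def load_episodes(src_dir):
    """Hypothetical loader yielding (episode_id, list-of-step-dicts)."""
    raise NotImplementedError("Replace with your RLDS/LeRobot reader")

def convert(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for ep_id, steps in load_episodes(src_dir):
        record = {
            "episode_id": ep_id,
            "steps": [
                {"state": s["state"], "action": s["action"]} for s in steps
            ],
        }
        (out / f"{ep_id}.json").write_text(json.dumps(record))
```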
Q: Is the RTX 5090 supported?
A: Yes, please refer to Using on Blackwell GPUs.
We are continuously improving, with more features coming soon. If you like this project, please give us a star on GitHub. Your support is our motivation to keep moving forward!
If Dexbotic has been helpful in your research work, please consider citing our technical report:
```bibtex
@article{dexbotic,
  title={Dexbotic: Open-Source Vision-Language-Action Toolbox},
  author={Dexbotic Contributors},
  journal={arXiv preprint arXiv:2510.23511},
  year={2025}
}
```

This project is licensed under the MIT License.