This repository is the official implementation of the paper "Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning".
[ Paper ] [ Project Website ]
Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu+, Yu Wang+
Overview | Installation | Usage | Citation | Acknowledgement
We tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations.
Our simulation environment is based on Isaac Sim, available for local installation.
Download Omniverse Isaac Sim and install the desired Isaac Sim release following the official documentation.
For this repository, we use Isaac Sim version 2023.1.0-hotfix.1. Since this version is not available on the official website, you can download it directly from here. After downloading, extract the file and move the folder to the following directory:
mv isaac_sim-2023.1.0-hotfix.1 ~/.local/share/ov/pkg/
Add the following environment variable to your ~/.bashrc or ~/.zshrc file:
# Isaac Sim root directory
export ISAACSIM_PATH="${HOME}/.local/share/ov/pkg/isaac_sim-2023.1.0-hotfix.1"
After adding the environment variable, apply the changes by running:
source ~/.bashrc
Although Isaac Sim comes with a built-in Python environment, we recommend using a separate conda environment, which is more flexible. We provide scripts in HCSP/conda_setup that automate environment setup when the conda environment is activated or deactivated.
conda create -n hcsp python=3.10
conda activate hcsp
# at HCSP/
cp -r conda_setup/etc $CONDA_PREFIX
# re-activate the environment
conda activate hcsp
# install HCSP
pip install -e .
# verification
python -c "from omni.isaac.kit import SimulationApp"
# which torch is being used
python -c "import torch; print(torch.__path__)"
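The two verification commands above can be folded into one small pre-flight check. The sketch below is not part of HCSP: `check_setup` and `module_available` are hypothetical helper names, and the script only assumes that `ISAACSIM_PATH` is set as described earlier and that `torch` and `omni.isaac.kit` should be locatable from the activated environment.

```python
import importlib.util
import os
from pathlib import Path


def module_available(name):
    """Return True if `name` can be located without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. `omni`) cannot be imported.
        return False


def check_setup(environ=os.environ, modules=("torch", "omni.isaac.kit")):
    """Collect human-readable problems found in the current environment."""
    problems = []
    isaac_path = environ.get("ISAACSIM_PATH")
    if not isaac_path:
        problems.append("ISAACSIM_PATH is not set")
    elif not Path(isaac_path).is_dir():
        problems.append(f"ISAACSIM_PATH does not exist: {isaac_path}")
    for name in modules:
        if not module_available(name):
            problems.append(f"module not found: {name}")
    return problems
```

Calling `check_setup()` inside the activated `hcsp` environment returns an empty list when everything is in place; any entry in the list names the step that likely needs to be repeated.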
HCSP requires specific versions of the tensordict, torchrl and orbit packages. For this repository, we manage these three packages using Git submodules to ensure that the correct versions are used. To initialize and update the submodules, follow these steps:
Get the submodules:
# at HCSP/
git submodule update --init --recursive
Install tensordict:
# at HCSP/
cd third-party/tensordict
python setup.py develop
Before installing torchrl, first check gcc and g++ and update them if needed:
# check gcc version, should be like: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 ...
gcc --version
# if not gcc 9.x, check available gcc
ls /usr/bin/gcc*
# if gcc-9 is not installed
sudo apt update && sudo apt install gcc-9
# if gcc-9 is installed
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 100
sudo update-alternatives --config gcc
# then follow instructions to select gcc-9.
# check gcc version again
gcc --version
# apply the same update and config steps to g++
Then install torchrl:
# at HCSP/
cd third-party/rl
python setup.py develop
We also need to install the orbit package for Isaac Sim. Orbit has since been integrated into Isaac Lab, but this branch still uses the standalone orbit package (this may be updated in the future), so we manage the older version of orbit as a Git submodule. To install the orbit package, follow these steps:
# at HCSP/
cd third-party/orbit
# create a symbolic link
ln -s ${ISAACSIM_PATH} _isaac_sim
# create a shell alias for orbit.sh
echo -e "alias orbit=$(pwd)/orbit.sh" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
# build extensions
sudo apt install cmake build-essential
./orbit.sh --install # or "./orbit.sh -i"
# at HCSP/
cd scripts
python train.py headless=true wandb.mode=disabled
We use hydra to manage the configurations. You can find the configuration files in cfg.
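To illustrate how `key=value` arguments such as `headless=true` and `wandb.mode=disabled` relate to nested configuration files, here is a minimal sketch in plain Python. It is not Hydra's implementation (Hydra's real override grammar is richer); `parse_value` and `apply_overrides` are hypothetical helpers, and the starting value of `headless` is an assumed default used only for illustration.

```python
def parse_value(text):
    """Convert an override value to bool, int, or float where possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Apply `a.b.c=value` strings to a nested dict, creating levels as needed."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


# Assumed defaults for illustration only; the real values live in the YAML files.
cfg = {"headless": False, "wandb": {"mode": "online", "job_type": "train"}}
apply_overrides(cfg, ["headless=true", "wandb.mode=disabled"])
print(cfg)
# {'headless': True, 'wandb': {'mode': 'disabled', 'job_type': 'train'}}
```

Each dot in a key descends one level in the configuration tree, which is why `wandb.mode` changes only the `mode` entry under `wandb` and leaves its sibling keys untouched.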
File structure of the configuration files:
cfg
├── train_<TASKNAME>.yaml # training configurations
├── task
│ ├── <TASKNAME>.yaml # configurations for each task
├── algo
│ ├── <ALGONAME>.yaml # configurations for each algorithm
You can change the configuration by editing these YAML files or by passing overrides as command-line arguments when you run the training script. For example:
python train.py headless=true \
total_frames=1000000000 \
task=Serve \
task.drone_model=Iris \
task.env.num_envs=4096 \
task.ball_mass=0.005 \
task.ball_radius=0.1 \
eval_interval=100 \
save_interval=100 \
only_eval=false
For experiment tracking, we use wandb. You need to have a wandb account, and set the wandb configurations in the train.yaml file. For example:
wandb:
  group: <EXPERIMENT_GROUP>
  run_name: <EXPERIMENT_NAME>
  job_type: train
  entity: <YOUR_WANDB_ENTITY>
  project: <YOUR_WANDB_PROJECT>
  mode: online # set to 'disabled' when debugging locally
  run_id:
  monitor_gym: True
  tags:
We provide training scripts for all tasks presented in our paper, including (i) 11 low-level skills, (ii) the high-level task, and (iii) Co-Self-Play. You can find them in the scripts/shell directory. For example, to train a model for the Serve task, run:
# at HCSP/scripts/shell
bash serve.sh
Please cite our paper if you find it useful:
@InProceedings{pmlr-v305-zhang25n,
title={Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning},
author={Zhang, Ruize and Xiang, Sirui and Xu, Zelai and Gao, Feng and Ji, Shilong and Tang, Wenhao and Ding, Wenbo and Yu, Chao and Wang, Yu},
booktitle={Proceedings of The 9th Conference on Robot Learning},
pages={5278--5300},
year={2025},
editor={Lim, Joseph and Song, Shuran and Park, Hae-Won},
volume={305},
series={Proceedings of Machine Learning Research},
month={27--30 Sep},
publisher={PMLR},
pdf={https://raw.githubusercontent.com/mlresearch/v305/main/assets/zhang25n/zhang25n.pdf},
url={https://proceedings.mlr.press/v305/zhang25n.html}
}
This repository is heavily based on OmniDrones.
Some of the abstractions and implementation were heavily inspired by Isaac Orbit.