Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline Reinforcement Learning (NeurIPS 2023)
TSRL (https://arxiv.org/abs/2306.04220) introduces a new offline reinforcement learning (RL) algorithm that leverages the fundamental time-reversal symmetry of system dynamics to enhance performance on small datasets. The proposed Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM) establishes consistency between a pair of forward and reverse latent dynamics, providing well-behaved representations for small datasets. TSRL achieves impressive performance on small benchmark datasets with as few as 1% of the original samples, outperforming recent offline RL algorithms in terms of data efficiency and generalizability.
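To make the idea concrete, below is a minimal sketch of a T-symmetry consistency loss between forward and reverse latent dynamics. It is illustrative only, assuming simple MLP encoders/dynamics, residual latent transitions, and equal loss weighting; it is not the actual TDM implementation from this repo.

# Sketch of a T-symmetry consistency loss between forward and reverse
# latent dynamics (illustrative assumption, not the actual TDM code).
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=16):
        super().__init__()
        # Encoder maps raw states into a latent space.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        # Forward model: predicts the latent change going forward in time.
        self.forward_dyn = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))
        # Reverse model: predicts the latent change going backward in time.
        self.reverse_dyn = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))

    def t_symmetry_loss(self, s, a, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        dz_fwd = self.forward_dyn(torch.cat([z, a], dim=-1))       # z -> z_next
        dz_rev = self.reverse_dyn(torch.cat([z_next, a], dim=-1))  # z_next -> z
        # Prediction losses for both directions of the latent dynamics.
        pred_loss = ((z + dz_fwd - z_next) ** 2).mean() \
                  + ((z_next + dz_rev - z) ** 2).mean()
        # T-symmetry: reversing time should negate the latent change,
        # i.e. dz_rev should approximately equal -dz_fwd.
        sym_loss = ((dz_fwd + dz_rev) ** 2).mean()
        return pred_loss + sym_loss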
To install the dependencies, use

pip install -r requirements.txt

You can download the small samples directly from the utils/small_samples/ directory.
Or, if you want to generate them yourself:

bash utils/generate_loco.sh   # For the locomotion tasks
bash utils/generate_adroit.sh # For the adroit tasks
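For reference, the generation scripts boil down to drawing a small subset of a full D4RL dataset. Below is a minimal sketch of that idea using the standard d4rl API; the task name, the 1% ratio, and transition-level (rather than trajectory-level) sampling are assumptions, not the exact logic of the scripts.

# Sketch: subsample a D4RL dataset down to a small fraction (e.g. 1%).
# The task name and ratio are placeholders; the actual scripts may
# sample differently (e.g. by trajectory rather than by transition).
import gym
import d4rl  # noqa: F401  (registers the D4RL environments)
import numpy as np

env = gym.make('hopper-medium-v2')
dataset = d4rl.qlearning_dataset(env)

ratio = 0.01  # keep 1% of the transitions
n = dataset['observations'].shape[0]
idx = np.random.choice(n, size=int(n * ratio), replace=False)
small = {k: v[idx] for k, v in dataset.items()}
np.savez('hopper-medium-v2_small.npz', **small)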
You can then train the TDM simply by running:

bash TDM/train_loco.sh   # For the locomotion tasks
bash TDM/train_adroit.sh # For the adroit tasks

After you have your own small samples as well as a trained TDM model, you can run TSRL on the D4RL tasks with:
bash tsrl_loco.sh   # For the locomotion tasks
bash tsrl_adroit.sh # For the adroit tasks

You can log in to your personal wandb account by exporting your own wandb API key:
export WANDB_API_KEY=YOUR_WANDB_API_KEY
and run
wandb online
to turn on the online synchronization.
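Once the key is exported and online mode is enabled, any wandb run started by the training scripts will sync automatically. For reference, a minimal standalone example of the wandb logging API (the project, run, and metric names here are placeholders, not the ones used by this repo) looks like:

# Minimal wandb usage example; project/run/metric names are placeholders.
import wandb

run = wandb.init(project='tsrl', name='hopper-medium-v2-1pct')
for step in range(10):
    # Log a dummy metric at each step; real scripts log their own metrics.
    wandb.log({'eval/normalized_score': 0.0}, step=step)
run.finish()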