ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations. (Oral Presentation @ CoRL 2025)
We provide code to train ReWiND reward models and policies on MetaWorld. The overall pipeline is as follows:
- Train the ReWiND Reward Model on MetaWorld + OXE data.
- Label the offline training dataset with the ReWiND Reward Model.
- Train the ReWiND Policy with offline-to-online RL for new tasks.
git clone [email protected]:Jiahui-3205/ReWiND_Release.git
cd ReWiND_Release/
# Run the setup script to create the environment and install all dependencies
bash -i setup_ReWiND_env.sh
conda activate rewind
This project uses Weights & Biases (WandB) for experiment tracking. Before running experiments:
- For Policy Training: Edit metaworld_policy_training/configs/base_config.yaml lines 15-16:
  wandb_entity_name: your-wandb-entity
  wandb_project_name: rewind-policy-training
- To Disable WandB: Set logging.wandb=false when running policy training commands.
Data Processing (recommended to run with the default paths)
# Download preprocessed OpenX DinoV2 Embeddings
python download_data.py --download_path DOWNLOAD_PATH (default: datasets)
Generate MetaWorld Trajectories for ReWiND Reward Training (recommended to run with the default paths)
# Generate Metaworld trajectories
python data_generation/metaworld_generation.py --save_path SAVE_DATA_PATH (default: datasets)
# Center-crop the videos and convert them to DINOv2 features
python data_preprocessing/metaworld_center_crop.py --video_path SAVE_DATA_PATH (default: datasets) --target_path TARGET_DATASET_PATH (default: datasets)
python data_preprocessing/generate_dino_embeddings.py --video_path_folder TARGET_DATASET_PATH (default: datasets) --target_path EMBEDDING_TARGET_PATH (default: datasets)
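If you keep the defaults, the data preparation stage amounts to running the commands above back to back; the following is simply those commands with their default paths written out (adjust the paths if you changed any of them):
# End-to-end data preparation using the default datasets/ paths
python download_data.py --download_path datasets
python data_generation/metaworld_generation.py --save_path datasets
python data_preprocessing/metaworld_center_crop.py --video_path datasets --target_path datasets
python data_preprocessing/generate_dino_embeddings.py --video_path_folder datasets --target_path datasets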
# Reward training requires a WandB entity
python train_reward.py --wandb_entity YOUR_WANDB_ENTITY (required) \
--wandb_project WANDB_PROJECT_NAME (default: rewind-reward-training) \
--rewind \
--subsample_video \
--clip_grad \
--cosine_scheduler \
--batch_size 1024 \
--worker 1
# Relabel the dataset we collected with the ReWiND reward model
python data_preprocessing/metaworld_label_reward.py --reward_model_path CHECKPOINT_PATH --h5_video_path GENERATION_PATH --h5_embedding_path EMBEDDING_TARGET_PATH --output_path OUTPUT_PATH
Note:
- OUTPUT_PATH: the labeled dataset file path (default: datasets/metaworld_labeled.h5). This will be used as <OUTPUT_PATH> in Offline Training and Online Training below.
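For example, assuming the default paths from the steps above were kept, an invocation might look like the sketch below (PATH/TO/reward_checkpoint.pt is a placeholder; use the checkpoint written by train_reward.py):
# Relabeling with the default dataset paths; the reward checkpoint path is a placeholder
python data_preprocessing/metaworld_label_reward.py \
    --reward_model_path PATH/TO/reward_checkpoint.pt \
    --h5_video_path datasets \
    --h5_embedding_path datasets \
    --output_path datasets/metaworld_labeled.h5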
cd metaworld_policy_training
python train_policy.py metaworld=off_on_15 \
algorithm=wsrl_iql \
reward=rewind_metaworld \
offline_training.offline_training_steps=15000 \
general_training.seed=42 \
environment.env_id=<ENV_ID> \
offline_training.offline_h5_path=<OUTPUT_PATH> \
reward_model.model_path=<CHECKPOINT_PATH>
- <ENV_ID>: the MetaWorld task you want to train online, e.g., button-press-wall-v2, window-close-v2. The full list of evaluation tasks used in the paper (none of which appear in the training data) is: [window-close-v2, reach-wall-v2, faucet-close-v2, coffee-button-v2, button-press-wall-v2, door-lock-v2, handle-press-side-v2, sweep-into-v2].
- <OFFLINE_CKPT_PATH>: path to your offline-trained checkpoint directory (often contains last_offline) to warm-start online training. If set to null, the run will first execute the offline phase for offline_training.offline_training_steps steps on the dataset, and then proceed to the online phase.
- To skip offline learning entirely, set offline_training.offline_training_steps=0.
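As a concrete usage example (a sketch only: the labeled dataset path assumes the default from above and may need adjusting relative to metaworld_policy_training, and the reward-model checkpoint path is a placeholder):
# Offline-to-online training on the held-out window-close-v2 task
python train_policy.py metaworld=off_on_15 \
    algorithm=wsrl_iql \
    reward=rewind_metaworld \
    offline_training.offline_training_steps=15000 \
    general_training.seed=42 \
    environment.env_id=window-close-v2 \
    offline_training.offline_h5_path=datasets/metaworld_labeled.h5 \
    reward_model.model_path=PATH/TO/reward_checkpoint.pt
Append logging.wandb=false to any of these commands to disable WandB logging for that run.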
We also provide code to train the policy offline only, so that you can load the same offline policy checkpoint for online RL on multiple new tasks downstream.
You only need to set online_training.total_time_steps=0.
After offline training completes, check the model_dir in your wandb log to find the <OFFLINE_CKPT_PATH> for online training (see Online Training below).
Then, run the above offline-to-online RL training command with offline_training.ckpt_path=<OFFLINE_CKPT_PATH> as an extra argument to perform online RL directly from the same offline policy.
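A minimal sketch of this two-stage workflow, reusing the command and placeholders from above:
# 1) Offline-only training: the same command as above, with the online phase disabled
python train_policy.py metaworld=off_on_15 \
    algorithm=wsrl_iql \
    reward=rewind_metaworld \
    offline_training.offline_training_steps=15000 \
    general_training.seed=42 \
    environment.env_id=<ENV_ID> \
    offline_training.offline_h5_path=<OUTPUT_PATH> \
    reward_model.model_path=<CHECKPOINT_PATH> \
    online_training.total_time_steps=0

# 2) Online RL on a new task, warm-started from the offline checkpoint found via model_dir in the wandb log
python train_policy.py metaworld=off_on_15 \
    algorithm=wsrl_iql \
    reward=rewind_metaworld \
    general_training.seed=42 \
    environment.env_id=<ENV_ID> \
    offline_training.offline_h5_path=<OUTPUT_PATH> \
    reward_model.model_path=<CHECKPOINT_PATH> \
    offline_training.ckpt_path=<OFFLINE_CKPT_PATH>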
Note:
- In offline training, environment.env_id is not important; the agent is trained over all training tasks found in your offline dataset.
- <OUTPUT_PATH> should point to your labeled offline dataset (see Label Offline Dataset above).
- Download mujoco210 from the mujoco-py installation guide
- Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210
- Add the following lines to ~/.bashrc:
export LD_LIBRARY_PATH=~/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
- Reload your shell configuration:
source ~/.bashrc
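Put together, these steps look roughly like the sketch below; the archive URL is the standard mujoco210 Linux release linked from the mujoco-py installation guide, so adjust it for your platform:
# Download and extract mujoco210, then expose its libraries in ~/.bashrc
mkdir -p ~/.mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz -O /tmp/mujoco210.tar.gz
tar -xzf /tmp/mujoco210.tar.gz -C ~/.mujoco
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia' >> ~/.bashrc
source ~/.bashrc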
If you see the following error when building mujoco-py:
fatal error: GL/glew.h: No such file or directory
    4 | #include <GL/glew.h>
Solution: check openai/mujoco-py#745
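One common fix for this error is to install the GLEW/OpenGL development headers; for example, on Debian/Ubuntu (package names may differ on other distributions):
# Install GL headers used when mujoco-py builds its extensions
sudo apt-get update
sudo apt-get install -y libglew-dev libgl1-mesa-dev libosmesa6-dev patchelf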
@inproceedings{
zhang2025rewind,
title={ReWi{ND}: Language-Guided Rewards Teach Robot Policies without New Demonstrations},
author={Jiahui Zhang and Yusen Luo and Abrar Anwar and Sumedh Anand Sontakke and Joseph J Lim and Jesse Thomason and Erdem Biyik and Jesse Zhang},
booktitle={9th Annual Conference on Robot Learning},
year={2025},
url={https://openreview.net/forum?id=XjjXLxfPou}
}