Official code implementation of the ICML 2025 paper: Reward-free World Models for Online Imitation Learning [Paper Link]
Shangzhe Li, Zhiao Huang, Hao Su
IQ-MPC is a world model designed for online imitation learning. It leverages the inverse soft-Q objective to train the critic, enabling effective policy learning from limited expert demonstrations and online reward-free interactions. Built upon the architecture of TD-MPC2, IQ-MPC excels in handling complex tasks such as dexterous hand manipulation and high-dimensional locomotion.
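To make the inverse soft-Q objective mentioned above more concrete, here is a minimal sketch of an IQ-Learn-style critic loss for a toy setting with discrete actions. It is an illustration under stated assumptions, not this repository's implementation: the function names, the `q_fn` interface, the chi-squared regularizer, and the initial-state value term are all choices made for this example, and the loss used in the codebase may differ.

```python
import math


def soft_value(q_values, temperature=1.0):
    """Soft value V(s) = t * log(sum_a exp(Q(s, a) / t)), computed stably."""
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def inverse_soft_q_loss(q_fn, expert_batch, initial_states,
                        gamma=0.99, chi2_coeff=0.5):
    """IQ-Learn-style loss on expert transitions (illustrative sketch).

    q_fn: hypothetical callable mapping a state to a list of Q-values,
          one per discrete action.
    expert_batch: list of (state, action, next_state, done) tuples.
    initial_states: list of states sampled from the start distribution.
    """
    # Reward implied by the critic: r(s, a) = Q(s, a) - gamma * V(s').
    implied_rewards = []
    for state, action, next_state, done in expert_batch:
        v_next = 0.0 if done else soft_value(q_fn(next_state))
        implied_rewards.append(q_fn(state)[action] - gamma * v_next)

    n = len(implied_rewards)
    # Push the implied reward up on expert data.
    reward_term = sum(implied_rewards) / n
    # Chi-squared regularizer keeps the implied reward bounded.
    reg_term = sum(r * r for r in implied_rewards) / (4.0 * chi2_coeff * n)
    # Keep the soft value of initial states low.
    value_term = (1.0 - gamma) * sum(
        soft_value(q_fn(s0)) for s0 in initial_states
    ) / len(initial_states)

    # Minimizing this loss maximizes the regularized inverse soft-Q objective.
    return -reward_term + value_term + reg_term
```

No reward signal appears anywhere in the loss: the critic is trained only from expert transitions and state samples, which is what makes the interaction reward-free.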
- Set up the environment using the following commands:
conda env create -f conda_env/environment.yaml
conda activate iqmpc
- Download the expert datasets here. They include expert datasets for 6 locomotion tasks and 3 dexterous hand manipulation tasks. All of the expert demonstrations are sampled from a trained single-task TD-MPC2 agent.
- In tdmpc2/config.json, set the task and the path to the expert dataset that corresponds to that task.
- Run the training code:
python3 tdmpc2/train.py
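The configuration step above can also be scripted. The helper below is a hedged sketch: it assumes the config is a flat JSON object and that the task and dataset path live under the keys `task` and `expert_dataset_path`. Both key names are hypothetical, so check tdmpc2/config.json for the actual names and pass them in if they differ.

```python
import json
from pathlib import Path


def set_task_and_dataset(config_path, task, dataset_path,
                         task_key="task",
                         dataset_key="expert_dataset_path"):
    """Update the task and expert dataset path in a JSON config file.

    task_key and dataset_key are hypothetical defaults, not confirmed
    key names from this repository.
    """
    config_path = Path(config_path)
    dataset_path = Path(dataset_path)

    # Fail early rather than partway through a training run.
    if not dataset_path.exists():
        raise FileNotFoundError(f"Expert dataset not found: {dataset_path}")

    config = json.loads(config_path.read_text())
    config[task_key] = task
    config[dataset_key] = str(dataset_path)

    # Other keys are preserved; only the two entries above change.
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (the task name and dataset file are placeholders):
# set_task_and_dataset("tdmpc2/config.json", "humanoid-walk",
#                      "/path/to/expert/humanoid-walk.pkl")
```

Checking that the dataset file exists before writing the config avoids launching python3 tdmpc2/train.py with a task and dataset that do not match.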
This repository is based on the original TD-MPC2 implementation repository: TD-MPC2 Official Implementation.
If you find our work helpful for your research, please consider citing our paper as follows:
@inproceedings{li2025reward,
title={Reward-free World Models for Online Imitation Learning},
author={Shangzhe Li and Zhiao Huang and Hao Su},
booktitle={International Conference on Machine Learning (ICML)},
year={2025}
}