# WORLDMEM: Long-term Consistent World Simulation with Memory

Zeqi Xiao¹, Yushi Lan¹, Yifan Zhou¹, Wenqi Ouyang¹, Shuai Yang², Yanhong Zeng³, Xingang Pan¹

¹S-Lab, Nanyang Technological University
²Wangxuan Institute of Computer Technology, Peking University
³Shanghai AI Laboratory
Demo video: `demo.1.1.mp4`
## Installation

```bash
conda create python=3.10 -n worldmem
conda activate worldmem
pip install -r requirements.txt
conda install -c conda-forge ffmpeg=4.3.2
```

To run the app:

```bash
python app.py
```
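Optionally, you can first verify that PyTorch was installed with CUDA support (a minimal check, assuming `requirements.txt` pulls in torch):

```python
# Sanity check: confirm the environment sees a GPU.
# Assumes requirements.txt installed a CUDA-enabled PyTorch build.
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```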
To enable cloud logging with Weights & Biases (wandb), follow these steps:

1. Sign up for a wandb account.
2. Log in from the command line:
   ```bash
   wandb login
   ```
3. Open `configurations/training.yaml` and set the `entity` and `project` fields to your wandb username.
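Under the hood, these fields feed the run initialization. Roughly equivalent to the following sketch (the values are placeholders; the actual call is made inside the training pipeline using `configurations/training.yaml`):

```python
# Rough equivalent of what the trainer does with the config fields.
# "your-wandb-username" and "worldmem" are placeholders for the values
# you set in configurations/training.yaml.
import wandb

run = wandb.init(entity="your-wandb-username", project="worldmem")
run.log({"train/loss": 0.0})  # metrics logged during training stream here
run.finish()
```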
## Training

First, download the pretrained weights from Oasis.

We train the model on 4 H100 GPUs; it converges after approximately 500K steps. Since we observe that gradually increasing task difficulty improves performance, we adopt a multi-stage training strategy:
```bash
sh train_stage_1.sh # Small range, no vertical turning
sh train_stage_2.sh # Large range, no vertical turning
sh train_stage_3.sh # Large range, with vertical turning
```

To resume training from a previous checkpoint, set the `resume` and `output_dir` variables in the corresponding `.sh` script (see the sketch below).
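For illustration, the three stages chain together as follows (a hypothetical driver; the checkpoint paths are placeholders, and in practice `resume`/`output_dir` are edited inside each script rather than passed here):

```python
# Hypothetical driver for the multi-stage curriculum. In practice you run
# the .sh scripts directly and point `resume` at the previous stage's
# checkpoint; the paths mentioned below are placeholders.
import subprocess

stages = ["train_stage_1.sh", "train_stage_2.sh", "train_stage_3.sh"]
for i, script in enumerate(stages, start=1):
    # Before each stage, set resume/output_dir inside the script, e.g.
    # resume=outputs/stage_<i-1>/checkpoints/last.ckpt (hypothetical path).
    subprocess.run(["sh", script], check=True)
```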
## Inference

To run inference:

```bash
sh infer.sh
```

You can either load the diffusion model and VAE separately:

```bash
+diffusion_model_path=zeqixiao/worldmem_checkpoints/diffusion_only.ckpt \
+vae_path=zeqixiao/worldmem_checkpoints/vae_only.ckpt \
+customized_load=true \
+seperate_load=true \
```

Or load a combined checkpoint:

```bash
+load=your_model_path \
+customized_load=true \
+seperate_load=false \
```
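Conceptually, the two modes differ only in whether the diffusion model and VAE weights come from one file or two. A minimal sketch (the module and state-dict names are assumptions for illustration; the actual logic sits behind the `customized_load`/`seperate_load` flags):

```python
# Sketch of the two loading modes. Models, paths, and state-dict layout
# are assumptions for illustration, not the repo's exact code.
import torch

def load_separate(diffusion_model, vae, diffusion_path, vae_path):
    """Load the diffusion model and VAE from two checkpoint files."""
    diffusion_model.load_state_dict(torch.load(diffusion_path, map_location="cpu"))
    vae.load_state_dict(torch.load(vae_path, map_location="cpu"))

def load_combined(model, ckpt_path):
    """Load everything from a single combined checkpoint."""
    state = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state.get("state_dict", state))
```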
## Evaluation

To run evaluation:

```bash
sh evaluate.sh
```

This script reproduces the results in Table 1 (beyond the context window) and reports PSNR and LPIPS. Evaluating one case on one A100 GPU takes approximately 6 minutes; adjust `experiment.test.limit_batch` to set the number of cases to evaluate.

Visual results are saved by default to a timestamped directory (e.g., `outputs/2025-11-30/00-02-42`).
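For reference, the two reported metrics can be computed per frame as follows (a minimal sketch using the `lpips` package, not the repo's evaluation code; inputs are assumed to be image tensors in [0, 1]):

```python
# Reference implementations of the reported metrics, assuming image
# tensors of shape (N, 3, H, W) with values in [0, 1].
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet-backbone LPIPS

def psnr(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(1.0 / mse)

def lpips_dist(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # LPIPS expects inputs scaled to [-1, 1].
    return lpips_fn(pred * 2 - 1, target * 2 - 1).mean()
```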
To calculate the FID score, run:

```bash
python calculate_fid.py --videos_dir <path_to_videos>
```

For example:

```bash
python calculate_fid.py --videos_dir outputs/2025-11-30/00-02-42/videos/test_vis
```

Expected results:
| Metric | Value |
|---|---|
| PSNR | 24.01 |
| LPIPS | 0.1667 |
| FID | 15.13 |
Note: FID is computed over 5000 frames.
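`calculate_fid.py` is the source of truth here; for a rough cross-check, the same quantity can be computed with the `clean-fid` package, assuming generated and reference frames have been dumped as images into two directories (the paths below are placeholders):

```python
# Cross-check FID with the clean-fid package (pip install clean-fid).
# The two directories are placeholders: each should hold the frames
# as individual image files.
from cleanfid import fid

score = fid.compute_fid("path/to/generated_frames", "path/to/reference_frames")
print(f"FID: {score:.2f}")
```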
## Dataset

Download the Minecraft dataset from Hugging Face and place it in the following directory structure:

```
data/
└── minecraft/
    ├── training/
    ├── validation/
    └── test/
```
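A quick way to confirm the layout before training (a small convenience check, not part of the repo):

```python
# Verify the expected dataset splits exist under data/minecraft.
from pathlib import Path

root = Path("data/minecraft")
for split in ("training", "validation", "test"):
    path = root / split
    print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")
```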
## Data Generation

After setting up the environment as described in MineDojo's GitHub repository, you can generate data with the following command:

```bash
xvfb-run -a python data_generator.py -o data/test -z 4 --env_type plains
```

Parameters:

- `-o`: output directory for the generated data
- `-z`: number of parallel workers
- `--env_type`: environment type (e.g., `plains`)
## TODO

- Release inference models and weights
- Release training pipeline on Minecraft
- Release training data on Minecraft
- Release evaluation scripts and data generator
## Citation

If you find our work helpful, please cite:

```bibtex
@misc{xiao2025worldmemlongtermconsistentworld,
      title={WORLDMEM: Long-term Consistent World Simulation with Memory},
      author={Zeqi Xiao and Yushi Lan and Yifan Zhou and Wenqi Ouyang and Shuai Yang and Yanhong Zeng and Xingang Pan},
      year={2025},
      eprint={2504.12369},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.12369},
}
```
## Acknowledgements

- **Diffusion Forcing**: provides the flexible training and inference strategies used by our method.
- **MineDojo**: we collect our Minecraft dataset with MineDojo.
- **Open-Oasis**: our model architecture is based on Open-Oasis; we also use its pretrained VAE and DiT weights.