MMPL: Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation
Xunzhi Xiang, Yabo Chen, Guiyu Zhang, Zhongyu Wang, Zhe Gao, Quanming Xiang, Gonghu Shang, Junqi Liu, Haibin Huang, Yang Gao, Chi Zhang, Qi Fan, Xuelong Li
- Paper release – Publicly available on arXiv ✅ (2025-08-05)
- Demo page release – Launch interactive demo page ✅ (2025-08-05)
- 14B TF Image-to-video inference code release – ✅ (2025-10-21)
- 14B TF Text-to-video inference code release – ✅ (2025-10-21)
- Training code release – Coming soon (ETA: in several weeks)
- Data release – Coming soon (ETA: in several weeks)
💡 Training code and the dataset will be released in several weeks.
We tested this repo on the following setup:
- Nvidia GPU with at least 32 GB of memory for the 1.3B models (RTX 4090, A100, and H100 tested).
- Nvidia GPU with at least 80 GB of memory for the 14B models (A100 and H100 tested).
- Linux operating system.
- 64 GB RAM.
Other hardware setups may also work but have not been tested.
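A quick way to confirm your GPU memory and system RAM before installing (assuming nvidia-smi is on your PATH and you are on Linux):
# Report each GPU's name and total memory (>= 32 GB for the 1.3B models, >= 80 GB for the 14B models)
nvidia-smi --query-gpu=name,memory.total --format=csv
# Report total system RAM in gigabytes (>= 64 GB recommended)
free -g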
Create a conda environment and install dependencies:
conda create -n MMPL python=3.10 -y
conda activate MMPL
pip install -r requirements.txt
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
pip install flash-attn --no-build-isolation
python setup.py develop
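A minimal sanity check that the environment installed correctly; it only imports the key packages (diffsynth is assumed to be the module name installed by DiffSynth-Studio, which is worth double-checking on your setup):
# Verify PyTorch sees the GPU and the editable installs are importable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import diffsynth; print('DiffSynth-Studio OK')"
Next, download the base Wan2.1 weights and the MMPL checkpoints: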
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./wan_models/Wan2.1-T2V-14B
huggingface-cli download Tele-AI/MMPL --local-dir .
After the downloads complete, your directory layout should be:
├── MMPL_i2v/
│   └── pretrained_models/
│       └── i2v_14B_6k.pt
├── MMPL_t2v/
│   └── pretrained_models/
│       └── t2v_14B_8k.pt
├── wan_models/
│   ├── Wan2.1-T2V-1.3B/
│   └── Wan2.1-T2V-14B/
├── README.md
├── demo.png
└── ...
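Before running inference, it can help to verify that the checkpoints landed where the scripts expect them (paths taken from the layout above):
# Each command should list the file or directory rather than report "No such file"
ls MMPL_i2v/pretrained_models/i2v_14B_6k.pt
ls MMPL_t2v/pretrained_models/t2v_14B_8k.pt
ls wan_models/Wan2.1-T2V-14B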
Example T2V inference scripts using the chunk-wise autoregressive checkpoint trained with the Teacher-Forcing (TF) method:
cd MMPL_t2v
# Single-GPU, quick validation
bash Wan_t2v_1gpu.bash
# Multi-GPU (4× GPUs), ~20s video with parallel chunking
bash Wan_t2v_4gpu_20s.bash
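If the machine has more than four GPUs, you can pin the run to a specific set with CUDA_VISIBLE_DEVICES; this assumes the launcher inside the script respects the variable, as standard PyTorch launchers do:
# Restrict the 4-GPU T2V run to GPUs 0-3
CUDA_VISIBLE_DEVICES=0,1,2,3 bash Wan_t2v_4gpu_20s.bash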
Example I2V inference scripts using the chunk-wise autoregressive checkpoint trained with the Teacher-Forcing (TF) method:
cd MMPL_i2v
# Single-GPU, quick validation
bash Wan_i2v_1gpu.bash
# Multi-GPU (4× GPUs), ~20s video with parallel chunking
bash Wan_i2v_4gpu_20s.bash
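Long multi-GPU runs are convenient to launch in the background with the output captured to a log file; this is plain shell convenience, not something the scripts themselves require:
# Run the 4-GPU I2V job in the background and follow its log
nohup bash Wan_i2v_4gpu_20s.bash > i2v_20s.log 2>&1 &
tail -f i2v_20s.log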
Other config files and their corresponding checkpoints can be found in the configs folder.
This codebase is built on top of open-source implementations, including Wan2.1 and DiffSynth-Studio.
If you find this codebase useful for your research, please cite our paper:
@article{xiang2025macro,
title={Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation},
author={Xiang, Xunzhi and Chen, Yabo and Zhang, Guiyu and Wang, Zhongyu and Gao, Zhe and Xiang, Quanming and Shang, Gonghu and Liu, Junqi and Huang, Haibin and Gao, Yang and others},
journal={arXiv preprint arXiv:2508.03334},
year={2025}
}