
LongVie 2: Multimodal Controllable Ultra-Long Video World Model

LongVie 2 is a multimodal controllable world model for generating ultra-long videos with depth and pointmap control signals.

Authors: Jianxiong Gao, Zhaoxi Chen, Xian Liu, Junhao Zhuang, Chengming Xu, Jianfeng Feng, Yu Qiao, Yanwei Fu†, Chenyang Si†, Ziwei Liu†

🚀 Quick Start

Installation

conda create -n longvie python=3.10 -y
conda activate longvie
conda install psutil
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention
cd LongVie
pip install -e .
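
To confirm the installation succeeded, a quick sanity check that PyTorch sees the GPU and that the flash-attn build imports cleanly (the printed versions will vary with your setup):

# Check PyTorch, CUDA availability, and the flash-attn build
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"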

Download Weights

  1. Download the base model Wan2.1-I2V-14B-480P:

python download_wan2.1.py

  2. Download the LongVie 2 weights and place them in ./model/LongVie/
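
If you prefer to fetch the weights by hand, a minimal sketch with huggingface-cli follows. The Wan2.1 repository id matches the official release; the LongVie 2 repository id and the base-model target directory are placeholders to replace with the ones the scripts expect:

# Base model (the same checkpoint download_wan2.1.py fetches); target directory is an example
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./model/Wan2.1-I2V-14B-480P

# LongVie 2 weights (placeholder repo id; use the one published by the authors)
huggingface-cli download Vchitect/LongVie2 --local-dir ./model/LongVie/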

Inference

Generate a 5-second video clip (about 8-9 minutes on a single A100 GPU):

bash sample_longvideo.sh
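
Once the run finishes, the clip can be checked with ffprobe; the output path below is illustrative, see sample_longvideo.sh for where results are actually written:

# Inspect duration and frame rate of a generated clip (example path)
ffprobe -v error -show_entries format=duration:stream=r_frame_rate -of default=noprint_wrappers=1 output/sample.mp4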

Training

bash train.sh
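
train.sh launches the training entry point. To restrict training to specific GPUs, the standard CUDA device mask can be set, assuming the script does not override it:

# Example: train on GPUs 0-3 only (assumes train.sh honors CUDA_VISIBLE_DEVICES)
CUDA_VISIBLE_DEVICES=0,1,2,3 bash train.sh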

🎛️ Control Signal Extraction

We provide utilities for extracting control signals in ./utils:

# Extract depth maps
bash get_depth.sh

# Convert depth to .mp4 format
python depth_npy2mp4.py

# Extract trajectory
python get_track.py
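
For reference, converting depth maps to video amounts to normalizing each frame to 8-bit and encoding the frames as a video. The sketch below is a standalone illustration, not the repository's depth_npy2mp4.py; its file names, array layout, and frame rate are assumptions:

# Illustrative depth-to-mp4 conversion, not the repo script
# requires: pip install numpy imageio imageio-ffmpeg
import numpy as np
import imageio.v2 as imageio

depth = np.load("depth.npy")                  # assumed shape: (num_frames, H, W)
frames = []
for d in depth:
    # normalize each frame independently to the 0-255 range
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    frames.append((d * 255).astype(np.uint8))

imageio.mimsave("depth.mp4", frames, fps=16)  # fps is an assumption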

To refine prompts after editing the first frame:

python qwen_caption_refine.py

📄 Citation

If you find this work useful, please consider citing:

@misc{gao2025longvie,
  title={LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation}, 
  author={Jianxiong Gao and Zhaoxi Chen and Xian Liu and Jianfeng Feng and Chenyang Si and Yanwei Fu and Yu Qiao and Ziwei Liu},
  year={2025},
  eprint={2508.03694},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.03694}, 
}

@misc{gao2025longvie2,
  title={LongVie 2: Multimodal Controllable Ultra-Long Video World Model}, 
  author={Jianxiong Gao and Zhaoxi Chen and Xian Liu and Junhao Zhuang and Chengming Xu and Jianfeng Feng and Yu Qiao and Yanwei Fu and Chenyang Si and Ziwei Liu},
  year={2025},
  eprint={2512.13604},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.13604}, 
}
