Rui Zhao · Yuchao Gu · Jay Zhangjie Wu · David Junhao Zhang · Jia-Wei Liu · Weijia Wu · Jussi Keppo · Mike Zheng Shou

Show Lab, National University of Singapore
MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.
Motion Customization of Text-to-Video Diffusion Models:
Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion
models to generate diverse videos with this motion.
- [2024.02.03] MotionDirector for AnimateDiff is available. Thanks to ExponentialML.
- [2023.12.27] MotionDirector with Customized Appearance released. Now, you can customize both appearance and motion in video generation.
- [2023.12.27] MotionDirector for Image Animation released.
- [2023.12.23] MotionDirector has been featured in Hugging Face's 'Spaces of the Week 🔥' trending list!
- [2023.12.13] Online Gradio demo released @ Hugging Face Spaces! Feel free to try it.
- [2023.12.06] MotionDirector for Sports released! Lifting weights, riding a horse, playing golf, etc.
- [2023.12.05] Colab demo is available. Thanks to Camenduru.
- [2023.12.04] MotionDirector for Cinematic Shots released. Now, you can make AI films with professional cinematic shots!
- [2023.12.02] Code and model weights released!
- [2023.10.12] Paper and project page released.
- Gradio Demo
- More trained weights of MotionDirector
| Type | Training Data | Descriptions | Link |
|---|---|---|---|
| MotionDirector for Sports | Multiple videos for each model. | Learns motion concepts of sports, e.g., lifting weights, riding a horse, playing golf, etc. | Link |
| MotionDirector for Cinematic Shots | A single video for each model. | Learns motion concepts of cinematic shots, e.g., dolly zoom, zoom in, zoom out, etc. | Link |
| MotionDirector for Image Animation | A single image for the spatial path, and a single video or multiple videos for the temporal path. | Animates the given image with learned motions. | Link |
| MotionDirector with Customized Appearance | A single image or multiple images for the spatial path, and a single video or multiple videos for the temporal path. | Customizes both appearance and motion in video generation. | Link |
```bash
# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt
```

Download the foundation model weights:

```bash
git lfs install

## You can choose ModelScopeT2V, ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/
```

Download the pre-trained MotionDirector weights:

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs

# More and better-trained MotionDirectors are released at a new repo:
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
# The usage is slightly different, which will be updated later.
```

Train MotionDirector on multiple videos:

```bash
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```

Train MotionDirector on a single video:

```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```

Note:
- Before running the above commands, make sure you replace the paths to the foundation model weights and the training data with your own in the config files `config_multi_videos.yaml` or `config_single_video.yaml` (an illustrative sketch of the relevant fields follows this list).
- Training on multiple 16-frame videos usually takes 300~500 steps, about 9~16 minutes on one A5000 GPU. Training on a single video takes 50~150 steps, about 1.5~4.5 minutes on one A5000 GPU. The required VRAM for training is around 14 GB.
- Reduce `n_sample_frames` if your GPU memory is limited.
- Reduce the learning rate and increase the training steps for better performance.
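For reference, here is a minimal sketch of what such a training config might contain. Apart from `n_sample_frames`, the key names and values are assumptions for illustration; check the shipped `config_multi_videos.yaml` / `config_single_video.yaml` for the authoritative fields.

```yaml
# Illustrative excerpt only; key names other than n_sample_frames are assumptions.
pretrained_model_path: "./models/zeroscope_v2_576w"  # path to your foundation model weights
train_data:
  path: "./data/my_motion_videos"                     # your own reference videos
  n_sample_frames: 16                                 # reduce this if GPU memory is limited
learning_rate: 5.0e-4                                 # lower it (and train longer) for better quality
max_train_steps: 400                                  # roughly 300~500 for multiple videos, 50~150 for a single video
checkpointing_steps: 50                               # checkpoints are later selected via --checkpoint_index
```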
Run inference with the trained MotionDirector:

```bash
python MotionDirector_inference.py --model /path/to/the/foundation/model --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.
```

Note:

- Replace `/path/to/the/foundation/model` with your own path to the foundation model, like ZeroScope.
- The value of `checkpoint_index` selects the checkpoint saved at that training step.
- The value of `noise_prior` indicates how much the inversion noise of the reference video affects the generation. We recommend setting it to `0` for a MotionDirector trained on multiple videos to achieve the most diverse generation, and to `0.1~0.5` for a MotionDirector trained on a single video for faster convergence and better alignment with the reference video (see the illustrative example after this list).
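The two `noise_prior` regimes are sketched below for illustration; the paths, prompt, and checkpoint indices are placeholders, not shipped defaults.

```bash
# Illustrative only: paths, prompt, and checkpoint indices are placeholders.

# MotionDirector trained on multiple videos: favor diverse generations.
python MotionDirector_inference.py --model /path/to/the/foundation/model \
  --prompt "Your prompt" --checkpoint_folder /path/to/multi_video_MotionDirector \
  --checkpoint_index 300 --noise_prior 0.

# MotionDirector trained on a single video: stay closer to the reference motion.
python MotionDirector_inference.py --model /path/to/the/foundation/model \
  --prompt "Your prompt" --checkpoint_folder /path/to/single_video_MotionDirector \
  --checkpoint_index 150 --noise_prior 0.3
```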
All available weights are at the official Hugging Face repo.
Run the download command above; the weights will be downloaded to the `outputs` folder. Then run the following inference command to generate videos.
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A person is riding a bicycle past the Eiffel Tower." --checkpoint_folder ./outputs/train/riding_bicycle/ --checkpoint_index 300 --noise_prior 0. --seed 7192280
```

Note:

- Replace `/path/to/the/ZeroScope` with your own path to the foundation model, i.e., the ZeroScope.
- Change the `prompt` to generate different videos.
- The `seed` is set to a random value by default. Setting it to a specific value will reproduce certain results, as provided in the table below.
Results:
16 frames:
```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A tank is running on the moon." --checkpoint_folder ./outputs/train/car_16/ --checkpoint_index 150 --noise_prior 0.5 --seed 8551187
```

24 frames:

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A truck is running past the Arc de Triomphe." --checkpoint_folder ./outputs/train/car_24/ --checkpoint_index 150 --noise_prior 0.5 --width 576 --height 320 --num-frames 24 --seed 34543
```

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A panda is lifting weights in a garden." --checkpoint_folder ./outputs/train/lifting_weights/ --checkpoint_index 300 --noise_prior 0. --seed 9365597
```

More sports, to be continued ...

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A firefighter standing in front of a burning forest captured with a dolly zoom." --checkpoint_folder ./outputs/train/dolly_zoom/ --checkpoint_index 150 --noise_prior 0.5 --seed 9365597
```

The reference video was shot with my own water cup. You can also pick up your cup or any other object to practice camera movements and turn them into imaginative videos. Create your AI films with customized camera movements!

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A firefighter standing in front of a burning forest captured with a zoom in." --checkpoint_folder ./outputs/train/zoom_in/ --checkpoint_index 150 --noise_prior 0.3 --seed 1429227
```

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope --prompt "A firefighter standing in front of a burning forest captured with a zoom out." --checkpoint_folder ./outputs/train/zoom_out/ --checkpoint_index 150 --noise_prior 0.3 --seed 4971910
```

More Cinematic Shots, to be continued ...
Train the spatial path with the reference image:

```bash
python MotionDirector_train.py --config ./configs/config_single_image.yaml
```

Then train the temporal path to learn the motion in the reference video:

```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```

Run inference with the spatial path learned from the reference image and the temporal path learned from the reference video:

```bash
python MotionDirector_inference_multi.py --model /path/to/the/foundation/model --prompt "Your prompt" --spatial_path_folder /path/to/the/trained/MotionDirector/spatial/lora/ --temporal_path_folder /path/to/the/trained/MotionDirector/temporal/lora/ --noise_prior 0.
```

Download the pre-trained weights:

```bash
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
```

Run the following command:

```bash
python MotionDirector_inference_multi.py --model /path/to/the/ZeroScope --prompt "A car is running on the road." --spatial_path_folder ./outputs/train/image_animation/train_2023-12-26T14-37-16/checkpoint-300/spatial/lora/ --temporal_path_folder ./outputs/train/image_animation/train_2023-12-26T13-08-20/checkpoint-300/temporal/lora/ --noise_prior 0.5 --seed 5057764
```

Train the spatial path with the reference images:

```bash
python MotionDirector_train.py --config ./configs/config_multi_images.yaml
```

Then train the temporal path to learn the motions in the reference videos:

```bash
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```

Run inference with the spatial path learned from the reference images and the temporal path learned from the reference videos:

```bash
python MotionDirector_inference_multi.py --model /path/to/the/foundation/model --prompt "Your prompt" --spatial_path_folder /path/to/the/trained/MotionDirector/spatial/lora/ --temporal_path_folder /path/to/the/trained/MotionDirector/temporal/lora/ --noise_prior 0.
```

Download the pre-trained weights:

```bash
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
```

Run the following command:

```bash
python MotionDirector_inference_multi.py --model /path/to/the/ZeroScope --prompt "A Terracotta Warrior is riding a horse through an ancient battlefield." --spatial_path_folder ./outputs/train/customized_appearance/terracotta_warrior/checkpoint-default/spatial/lora --temporal_path_folder ./outputs/train/riding_horse/checkpoint-default/temporal/lora/ --noise_prior 0. --seed 1455028
```

Results are shown in the table.
If you have a more impressive MotionDirector or generated videos, please feel free to open an issue and share them with us. We would greatly appreciate it. Improvements to the code are also highly welcome.
Please refer to Project Page for more results.
```bibtex
@article{zhao2023motiondirector,
  title={MotionDirector: Motion Customization of Text-to-Video Diffusion Models},
  author={Zhao, Rui and Gu, Yuchao and Wu, Jay Zhangjie and Zhang, David Junhao and Liu, Jiawei and Wu, Weijia and Keppo, Jussi and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2310.08465},
  year={2023}
}
```
- This code builds on diffusers, Tune-A-Video, and Text-To-Video-Finetuning. Thanks for open-sourcing!
- Thanks to camenduru for the colab demo.
- Thanks to yhyu13 for the Huggingface Repo.
- We would like to thank AK (@_akhaliq) and the Hugging Face team for their help in setting up the online Gradio demo.
- Thanks to MagicAnimate for the gradio demo template.
- Thanks to deepbeepmeep, and XiaominLi for improving the code.