We now have CamTrol implemented on diffusers-based video models, which makes it easier to adapt the code to the more powerful video models available in diffusers.
Some results of CogVideoX+CamTrol can be found on the CamTrol page.
The code: https://github.com/LAARRRY/CamTrol-CogVideoX-Diffusers.
This repository is an unofficial implementation of CamTrol: Training-free Camera Control for Video Generation, based on SVD.
Some example videos generated through SVD are shown in the repository.
- `pip install -r requirement.txt`
- Download the SVD checkpoint `svd.safetensors` and set its path at `ckpt_path` in `sgm/svd.yaml`.
- Clone the depth estimation model: `git clone https://github.com/isl-org/ZoeDepth.git`
The code downloads stable-diffusion-inpainting and open-clip automatically; if you have already downloaded them, you can point the code to your local paths instead.
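For reference, a local copy can typically be loaded in diffusers by pointing the pipeline at a directory instead of the Hub id; this is a minimal sketch using the standard diffusers API, and the path below is a placeholder, not a value from this repository:

```python
from diffusers import StableDiffusionInpaintPipeline

# Placeholder path: point this at your local copy of the
# stable-diffusion-inpainting weights to skip the download.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "/path/to/stable-diffusion-inpainting",
    local_files_only=True,  # fail instead of downloading if files are missing
)
```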
```bash
CUDA_VISIBLE_DEVICES=0 python3 sampling.py \
    --input_path "assets/images/street.jpg" \
    --prompt "a vivid anime street, wind blows." \
    --neg_prompt " " \
    --pcd_mode "hybrid default 14 out_left_up_down" \
    --add_index 12 \
    --seed 1 \
    --save_warps False \
    --load_warps None
```
- `pcd_mode`: camera motion for point cloud rendering, a string concatenating four elements: the first defines the camera motion type, the second the moving distance or rotation angle, the third the number of frames, and the last the moving direction. You can load arbitrary camera extrinsics matrices in `complex` mode, and set a bigger `add_index` for better motion alignment (see the parsing sketch after this list).
- `prompt`, `neg_prompt`: as SVD doesn't support text input, these mainly serve the stable diffusion inpainting stage.
- `add_index`: t_0 in the paper, balancing the trade-off between motion fidelity and video diversity. Set it between `0` and `num_frames`; the bigger the value, the more faithfully the video aligns to the camera motion.
- `save_warps`: whether to save the multi-view renderings. Since rendering can take some time, you can reload already-rendered images later. Use low-res images to boost speed.
- `load_warps`: whether to load renderings previously saved with `save_warps`.
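For illustration, here is a minimal sketch of how such a four-element motion string could be parsed. The `CameraMotion` container and its field names are hypothetical, not the repository's actual code:

```python
from dataclasses import dataclass

@dataclass
class CameraMotion:
    # Hypothetical container; the repository may organize this differently.
    motion: str      # camera motion type, e.g. "hybrid" or "complex"
    magnitude: str   # moving distance or rotation angle, e.g. "default"
    num_frames: int  # number of frames rendered along the trajectory
    direction: str   # moving direction, e.g. "out_left_up_down"

def parse_pcd_mode(pcd_mode: str) -> CameraMotion:
    """Split a pcd_mode string such as 'hybrid default 14 out_left_up_down'
    into its four whitespace-separated fields."""
    motion, magnitude, num_frames, direction = pcd_mode.split()
    return CameraMotion(motion, magnitude, int(num_frames), direction)

print(parse_pcd_mode("hybrid default 14 out_left_up_down"))
# CameraMotion(motion='hybrid', magnitude='default', num_frames=14,
#              direction='out_left_up_down')
```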
This repository uses SVD, but you can apply the method to your own customized video diffusion model.
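If you do port it, the core training-free idea is to noise the latents of the point-cloud renderings to an intermediate timestep and let your model denoise the remaining steps. Below is a minimal, model-agnostic sketch of that idea; `denoise_step` and the scheduler interface are placeholders standing in for your model's sampling loop, not this repository's actual code:

```python
import torch

@torch.no_grad()
def sample_with_layout_prior(denoise_step, scheduler, warped_latents, add_index):
    """Sketch: noise the warped renderings' latents to the timestep selected
    by add_index (t_0 in the paper), then denoise only the remaining steps.
    A larger add_index injects less noise, so the output follows the rendered
    camera motion more faithfully at the cost of diversity.

    denoise_step(latents, t) -> latents is one step of your model's sampler;
    scheduler is any diffusers-style scheduler with timesteps and add_noise.
    """
    timesteps = scheduler.timesteps        # ordered from high noise to low
    t0 = timesteps[add_index]              # larger index -> lower noise level
    noise = torch.randn_like(warped_latents)
    latents = scheduler.add_noise(warped_latents, noise, t0[None])
    for t in timesteps[add_index:]:        # skip the earlier, noisier steps
        latents = denoise_step(latents, t)
    return latents
```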
The code is largely built on SVD and LucidDreamer.