Dilated Controlnet for Wan2.1

demo.mp4

This repo contains the code for a dilated controlnet module for the Wan2.1 model.
The dilated controlnet has fewer basic blocks and adds a stride parameter.
For the Wan2.1 1.3B model the controlnet uses 8 blocks with stride 3; for the Wan2.1 14B model it uses 6 blocks with stride 4.
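
At inference time the stride is passed via the --controlnet_stride flag (see the CLI examples below) and presumably should match the controlnet checkpoint you load (the 1.3B example below uses 3). A minimal shell sketch of the pairing; the 14B repo id is an assumption based on the 1.3B naming, so substitute the checkpoint you actually use:

# Pick the controlnet checkpoint and its matching stride together.
CONTROLNET_MODEL_PATH="TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1"    # 1.3B: 8 blocks, stride 3
CONTROLNET_STRIDE=3
# CONTROLNET_MODEL_PATH="TheDenk/wan2.1-t2v-14b-controlnet-hed-v1"   # 14B: 6 blocks, stride 4 (assumed repo id)
# CONTROLNET_STRIDE=4
# Then pass: --controlnet_model_path "$CONTROLNET_MODEL_PATH" --controlnet_stride "$CONTROLNET_STRIDE"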

TeaCache is available

Use the --teacache_treshold parameter to speed up generation (the example shown is for the 14B model).

teacache-example.mp4
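
A minimal CLI sketch for running the 14B model with TeaCache enabled. All flags come from the examples further down; the 14B repo ids are assumptions based on the 1.3B naming, so substitute the checkpoints you actually use:

# The 14B controlnet repo id below is assumed; replace it with the checkpoint you downloaded.
python -m inference.cli_demo \
    --video_path "resources/physical-1.mp4" \
    --prompt "your prompt here" \
    --controlnet_type "hed" \
    --controlnet_stride 4 \
    --base_model_path Wan-AI/Wan2.1-T2V-14B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-14b-controlnet-hed-v1 \
    --teacache_treshold 0.3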

For ComfyUI

Use the cool ComfyUI-WanVideoWrapper.

Models

Model   Processor   Huggingface Link
1.3B    Canny       Link
1.3B    HED         Link
1.3B    Depth       Link
14B     Canny       Link
14B     HED         Link
14B     Depth       Link

How to

Clone repo

git clone https://github.com/TheDenk/wan2.1-dilated-controlnet.git
cd wan2.1-dilated-controlnet

Create venv

python -m venv venv
source venv/bin/activate

Install requirements

pip install -r requirements.txt

Inference examples

It is important to use a correct prompt and negative prompt.

For detailed information, see the prompt extension section in the original repo.

Simple inference with CLI

python -m inference.cli_demo \
    --video_path "resources/physical-1.mp4" \
    --prompt "In a cozy kitchen, a golden retriever wearing a white chef's hat and a blue apron stands at the table, holding a sharp kitchen knife and skillfully slicing fresh tomatoes. Its tail sways gently, and its gaze is focused and gentle. There are already several neatly arranged tomatoes on the wooden chopping board in front of me. The kitchen has soft lighting, with various kitchen utensils hanging on the walls and several pots of green plants placed on the windowsill." \
    --controlnet_type "hed" \
    --controlnet_stride 3 \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1

Inference with Gradio

python -m inference.gradio_web_demo \
    --controlnet_type "hed" \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1

Detailed Inference

python -m inference.cli_demo \
    --video_path "resources/physical-1.mp4" \
    --prompt "In a cozy kitchen, a golden retriever wearing a white chef's hat and a blue apron stands at the table, holding a sharp kitchen knife and skillfully slicing fresh tomatoes. Its tail sways gently, and its gaze is focused and gentle. There are already several neatly arranged tomatoes on the wooden chopping board in front of me. The kitchen has soft lighting, with various kitchen utensils hanging on the walls and several pots of green plants placed on the windowsill." \
    --controlnet_type "hed" \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1 \
    --controlnet_weight 0.8 \
    --controlnet_guidance_start 0.0 \
    --controlnet_guidance_end 0.8 \
    --controlnet_stride 3 \
    --num_inference_steps 50 \
    --guidance_scale 5.0 \
    --video_height 480 \
    --video_width 832 \
    --num_frames 81 \
    --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
    --seed 42 \
    --out_fps 16 \
    --output_path "result.mp4" \
    --teacache_treshold 0.3

Training

Training the Wan2.1 1.3B model requires 18 GB of VRAM with batch_size=1. Memory usage also depends on the number of controlnet transformer blocks, which defaults to 8 (the controlnet_transformer_num_layers parameter in the config).

Dataset

The OpenVid-1M dataset was used as the base dataset. The CSV files for the dataset can be found here.

Prepare dataset

Download the dataset and prepare the data. Raw data is not used directly, in order to save memory.
Extract the text embeddings. Initially, all prompts are stored in the .csv file.

CUDA_VISIBLE_DEVICES=0 python prepare_text_embeddings.py \
--csv_path "path to csv" \
--out_embeds_dir "path to output dir" \
--base_model_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" \
--device "cuda" \
--dtype "bf16"

Encode the videos into VAE latents.

CUDA_VISIBLE_DEVICES=0 python prepare_vae_latents.py \
--input_video_dir "path to input video dir" \
--out_latents_dir "dir for output latents" \
--base_model_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" \
--sample_stride 2 \
--width 832 \
--height 480 \
--sample_n_frames 81 \
--seed 42 \
--device "cuda" \
--dtype "fp32"

Preprocess the original videos with the controlnet processor.

python prepare_controlnet_video.py \
--input_video_dir "path to input video dir" \
--out_controlnet_video_dir "dir for output controlnet video" \
--controlnet_type "canny" \
--sample_stride 2 \
--width 832 \
--height 480 \
--sample_n_frames 81 

Train script

To start training, fill in the config files accelerate_config_machine_single.yaml and train_controlnet.sh.
In accelerate_config_machine_single.yaml, set the num_processes parameter (default 1) to your GPU count.
In train_controlnet.sh (a hypothetical sketch of these settings follows this list):

  1. Set MODEL_PATH to the base Wan2.1 model. The default is Wan-AI/Wan2.1-T2V-1.3B-Diffusers.
  2. Set CUDA_VISIBLE_DEVICES (the default is 0).
  3. Set the output_dir, latents_dir, text_embeds_dir and controlnet_video_dir parameters.
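
A hypothetical sketch of those settings; the names come from the list above, but the actual train_controlnet.sh may define them as shell variables or pass them as script arguments, so adjust to the real file:

# Sketch only -- edit train/train_controlnet.sh accordingly.
export MODEL_PATH="Wan-AI/Wan2.1-T2V-1.3B-Diffusers"      # base Wan2.1 model
export CUDA_VISIBLE_DEVICES=0                             # GPU id(s)
OUTPUT_DIR="./output"                                     # output_dir
LATENTS_DIR="/path/to/vae_latents"                        # latents_dir (from prepare_vae_latents.py)
TEXT_EMBEDS_DIR="/path/to/text_embeds"                    # text_embeds_dir (from prepare_text_embeddings.py)
CONTROLNET_VIDEO_DIR="/path/to/controlnet_video"          # controlnet_video_dir (from prepare_controlnet_video.py)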

Run training

cd train
bash train_controlnet.sh

Acknowledgements

Original code and models: Wan2.1.

Citations

@misc{TheDenk,
    title={Dilated Controlnet},
    author={Karachev Denis},
    url={https://github.com/TheDenk/wan2.1-dilated-controlnet},
    publisher={Github},
    year={2025}
}

Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please contact [email protected].
