Demo video: demo.mp4
This repo contains the code for the dilated ControlNet module for the Wan2.1 model.
A dilated ControlNet has fewer basic blocks than the base transformer and an additional stride parameter that spreads its outputs across the base blocks.
For the Wan 1.3B model the ControlNet uses 8 blocks with stride 3.
For the Wan 14B model the ControlNet uses 6 blocks with stride 4.
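The "dilated" part refers to how the ControlNet residuals are fed back into the base transformer: because there are fewer ControlNet blocks than base blocks, each ControlNet output is injected only into every `stride`-th base block. Below is a minimal sketch of that injection pattern; the class and argument names are illustrative assumptions, not this repo's actual API.

```python
from torch import nn

class DilatedControlnetInjection(nn.Module):
    """Illustrative sketch (not the repo's real API): inject the outputs of a
    small ControlNet block stack into a deeper base block stack every `stride`
    blocks, e.g. 8 ControlNet blocks with stride 3 for the 1.3B base model."""

    def __init__(self, base_blocks: nn.ModuleList, controlnet_blocks: nn.ModuleList, stride: int):
        super().__init__()
        self.base_blocks = base_blocks              # full Wan2.1 transformer blocks
        self.controlnet_blocks = controlnet_blocks  # smaller ControlNet stack
        self.stride = stride

    def forward(self, hidden_states, controlnet_hidden, controlnet_weight: float = 1.0):
        # Run the (smaller) ControlNet stack once and keep each block's output.
        controlnet_states = []
        h = controlnet_hidden
        for block in self.controlnet_blocks:
            h = block(h)
            controlnet_states.append(h)

        # Walk the base stack; every `stride`-th block receives the next residual.
        for i, block in enumerate(self.base_blocks):
            hidden_states = block(hidden_states)
            if i % self.stride == 0 and (i // self.stride) < len(controlnet_states):
                hidden_states = hidden_states + controlnet_weight * controlnet_states[i // self.stride]
        return hidden_states
```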
TeaCache is available.
Use the `--teacache_treshold` parameter to increase generation speed (this example is for the 14B model).
TeaCache example: teacache-example.mp4
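TeaCache speeds up sampling by skipping full transformer passes on timesteps where the (timestep-modulated) input has barely changed, reusing the previously cached residual instead. The sketch below shows the rough thresholding idea behind `--teacache_treshold`; it is a simplified assumption and the actual implementation in this repo may differ (for example, real TeaCache rescales the distance with a fitted polynomial).

```python
import torch

class TeaCacheState:
    """Simplified sketch of TeaCache-style skipping: the relative change of the
    modulated input is used as a proxy for how much the model output would change."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold          # e.g. --teacache_treshold 0.3
        self.accumulated_distance = 0.0
        self.prev_modulated_input = None
        self.cached_residual = None         # (output - input) from the last full pass

    def should_skip(self, modulated_input: torch.Tensor) -> bool:
        if self.prev_modulated_input is None or self.cached_residual is None:
            return False                    # first step must run the full model
        rel_change = (
            (modulated_input - self.prev_modulated_input).abs().mean()
            / self.prev_modulated_input.abs().mean()
        ).item()
        self.accumulated_distance += rel_change
        if self.accumulated_distance < self.threshold:
            return True                     # cheap step: reuse the cached residual
        self.accumulated_distance = 0.0
        return False                        # expensive step: run the full transformer

    def update(self, modulated_input: torch.Tensor, residual: torch.Tensor = None):
        # Call after every step; pass `residual` only when a full pass was run.
        self.prev_modulated_input = modulated_input.detach()
        if residual is not None:
            self.cached_residual = residual.detach()
```

On skipped steps the transformer output is approximated as `hidden_states + cached_residual`; a higher threshold skips more steps (faster generation) at some cost in quality.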
For ComfyUI support, use the cool ComfyUI-WanVideoWrapper.
| Model | Processor | Huggingface Link |
|---|---|---|
| 1.3B | Canny | Link |
| 1.3B | HED | Link |
| 1.3B | Depth | Link |
| 14B | Canny | Link |
| 14B | HED | Link |
| 14B | Depth | Link |
Clone the repo
```bash
git clone https://github.com/TheDenk/wan2.1-dilated-controlnet.git
cd wan2.1-dilated-controlnet
```

Create a venv
```bash
python -m venv venv
source venv/bin/activate
```

Install requirements
```bash
pip install -r requirements.txt
```

For detailed information about prompt extension, see the original repo.
Simple inference with the cli:
```bash
python -m inference.cli_demo \
    --video_path "resources/physical-1.mp4" \
    --prompt "In a cozy kitchen, a golden retriever wearing a white chef's hat and a blue apron stands at the table, holding a sharp kitchen knife and skillfully slicing fresh tomatoes. Its tail sways gently, and its gaze is focused and gentle. There are already several neatly arranged tomatoes on the wooden chopping board in front of me. The kitchen has soft lighting, with various kitchen utensils hanging on the walls and several pots of green plants placed on the windowsill." \
    --controlnet_type "hed" \
    --controlnet_stride 3 \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1
```

Inference with the Gradio web demo:
```bash
python -m inference.gradio_web_demo \
    --controlnet_type "hed" \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1
```

Detailed inference with all parameters:
```bash
python -m inference.cli_demo \
    --video_path "resources/physical-1.mp4" \
    --prompt "In a cozy kitchen, a golden retriever wearing a white chef's hat and a blue apron stands at the table, holding a sharp kitchen knife and skillfully slicing fresh tomatoes. Its tail sways gently, and its gaze is focused and gentle. There are already several neatly arranged tomatoes on the wooden chopping board in front of me. The kitchen has soft lighting, with various kitchen utensils hanging on the walls and several pots of green plants placed on the windowsill." \
    --controlnet_type "hed" \
    --base_model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --controlnet_model_path TheDenk/wan2.1-t2v-1.3b-controlnet-hed-v1 \
    --controlnet_weight 0.8 \
    --controlnet_guidance_start 0.0 \
    --controlnet_guidance_end 0.8 \
    --controlnet_stride 3 \
    --num_inference_steps 50 \
    --guidance_scale 5.0 \
    --video_height 480 \
    --video_width 832 \
    --num_frames 81 \
    --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
    --seed 42 \
    --out_fps 16 \
    --output_path "result.mp4" \
    --teacache_treshold 0.3
```

The Wan 1.3B model requires about 18 GB of VRAM with batch_size=1. VRAM usage also depends on the number of ControlNet transformer blocks, which defaults to 8 (the `controlnet_transformer_num_layers` parameter in the config).
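The `--controlnet_weight`, `--controlnet_guidance_start` and `--controlnet_guidance_end` flags in the detailed example above follow the usual ControlNet convention: the control residuals are scaled by the weight and applied only within a fraction of the denoising schedule. The sketch below illustrates that gating under this assumption; it is not the repo's exact code.

```python
def controlnet_scale(step_index: int,
                     num_inference_steps: int,
                     controlnet_weight: float = 0.8,
                     guidance_start: float = 0.0,
                     guidance_end: float = 0.8) -> float:
    """Return the scale applied to ControlNet residuals at a given denoising step.

    Sketch of the common convention: control is active only while the normalized
    progress lies in [guidance_start, guidance_end]; outside that window the base
    model denoises freely.
    """
    progress = step_index / max(num_inference_steps - 1, 1)
    if guidance_start <= progress <= guidance_end:
        return controlnet_weight
    return 0.0

# Example: with 50 steps and guidance_end=0.8, roughly the last 10 steps run without control.
scales = [controlnet_scale(i, 50) for i in range(50)]
```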
The OpenVid-1M dataset was used as the base variant. CSV files for the dataset can be found here.
Download the dataset and prepare the data. Raw data is not used directly during training, to save memory.
Extract text embeddings. Initially, all prompts are stored in the .csv file.
```bash
CUDA_VISIBLE_DEVICES=0 python prepare_text_embeddings.py \
    --csv_path "path to csv" \
    --out_embeds_dir "path to output dir" \
    --base_model_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" \
    --device "cuda" \
    --dtype "bf16"
```
Encode video into VAE latents.
```bash
CUDA_VISIBLE_DEVICES=0 python prepare_vae_latents.py \
    --input_video_dir "path to input video dir" \
    --out_latents_dir "dir for output latents" \
    --base_model_path "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" \
    --sample_stride 2 \
    --width 832 \
    --height 480 \
    --sample_n_frames 81 \
    --seed 42 \
    --device "cuda" \
    --dtype "fp32"
```
Preprocess the original video with the ControlNet processor.
```bash
python prepare_controlnet_video.py \
    --input_video_dir "path to input video dir" \
    --out_controlnet_video_dir "dir for output controlnet video" \
    --controlnet_type "canny" \
    --sample_stride 2 \
    --width 832 \
    --height 480 \
    --sample_n_frames 81
```
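For reference, the canny control frames are conceptually just per-frame edge maps at the training resolution. A minimal OpenCV sketch is shown below; the repo's processor, thresholds, and output codec may differ.

```python
import cv2

def make_canny_control_video(in_path: str, out_path: str,
                             width: int = 832, height: int = 480,
                             sample_stride: int = 2, sample_n_frames: int = 81,
                             low: int = 100, high: int = 200) -> None:
    """Write a 3-channel canny edge video. Thresholds, fps and codec are
    illustrative assumptions, not the repo's exact settings."""
    cap = cv2.VideoCapture(in_path)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), 16, (width, height))
    kept, idx = 0, 0
    while kept < sample_n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_stride == 0:
            frame = cv2.resize(frame, (width, height))
            edges = cv2.Canny(frame, low, high)                    # single-channel edge map
            writer.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))  # back to 3 channels
            kept += 1
        idx += 1
    cap.release()
    writer.release()
```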
To start training, fill in the config files `accelerate_config_machine_single.yaml` and `train_controlnet.sh`.
In `accelerate_config_machine_single.yaml`, set the `num_processes` parameter (default: 1) to your GPU count.
In `train_controlnet.sh`:
- Set `MODEL_PATH` for the base Wan2.1 model. Default is `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`.
- Set `CUDA_VISIBLE_DEVICES` (default is 0).
- Set the `output_dir`, `latents_dir`, `text_embeds_dir` and `controlnet_video_dir` parameters.
Run training
```bash
cd train
bash train_controlnet.sh
```
Original code and models: Wan2.1.
```bibtex
@misc{TheDenk,
  title={Dilated Controlnet},
  author={Karachev Denis},
  url={https://github.com/TheDenk/wan2.1-dilated-controlnet},
  publisher={Github},
  year={2025}
}
```
Issues should be raised directly in the repository. For professional support and recommendations, please contact [email protected].