Self-Forcing-Plus focuses on step distillation and CFG distillation for bidirectional models. Building on Self-Forcing, we support 4-step T2V-14B model training and higher-quality 4-step I2V-14B model training.
- (2025/09) Support Wan2.2-Moe distillation! wan22
| Model Type | Model Link |
|---|---|
| T2V-14B | Huggingface |
| I2V-14B-480P | Huggingface |
Create a conda environment and install dependencies:
conda create -p /home/ec2-user/SageMaker/efs/conda_envs/self_forcing python=3.10 -y
conda activate /home/ec2-user/SageMaker/efs/conda_envs/self_forcing
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir /home/ec2-user/SageMaker/efs/Models/wan_models/Wan2.1-T2V-14B
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir /home/ec2-user/SageMaker/efs/Models/wan_models/Wan2.1-I2V-14B-480P
DMD training for bidirectional models does not need ODE initialization.
We build the dataset as a folder of text files, each containing a single prompt:
data_folder
|__1.txt
|__2.txt
...
|__xxx.txt
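The layout above can be generated from a single prompt list. The following is a minimal sketch, assuming a hypothetical `prompts.txt` with one prompt per line; the `split_prompts` helper and the input file name are illustrative and not part of this repo, and whether the training loader expects a trailing newline in each file is not confirmed by the source.

```shell
# Hypothetical helper: split a prompt list into data_folder/1.txt, 2.txt, ...
# Usage: split_prompts PROMPTS_FILE OUT_DIR  (prints the number of files written)
split_prompts() {
    prompts_file="$1"   # assumed input: one prompt per line
    out_dir="$2"        # e.g. data_folder
    n=0
    mkdir -p "$out_dir"
    # `|| [ -n "$line" ]` keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines so every file holds exactly one prompt
        [ -z "$line" ] && continue
        n=$((n + 1))
        printf '%s\n' "$line" > "$out_dir/$n.txt"
    done < "$prompts_file"
    echo "$n"
}
```

Calling `split_prompts prompts.txt data_folder` would then write one numbered `.txt` file per non-empty line.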
Our training run uses 3000 iterations and completes in under 3 days using 64 H100 GPUs.
nohup bash scripts/train_qwen_dmd.sh > logs/dmd.out 2>&1 &
# Evaluate DMD-trained LoRA models
CUDA_VISIBLE_DEVICES=5 python benchmark.py \
--model_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/checkpoints/Qwen-Image-Edit-2509-step4000 \
--test_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data/test_data.csv \
--data_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data \
--output_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/outputs/H100_ckpt4k_dmd_istep40_cfg4 \
--cfg_scale 4 \
--num_inference_steps 40 \
--prompt "让图2的模特换上图1的下装" \
--max_samples 50 \
--seed 0 \
> /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/logs/H100_ckpt4k_dmd_istep40_cfg4.out 2>&1 &
CUDA_VISIBLE_DEVICES=6 python benchmark.py \
--model_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/checkpoints/Qwen-Image-Edit-2509-step4000 \
--test_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data/test_data.csv \
--data_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data \
--output_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/outputs/H100_ckpt4k_dmd_istep8_cfg1 \
--cfg_scale 1 \
--num_inference_steps 8 \
--prompt "让图2的模特换上图1的下装" \
--max_samples 50 \
--seed 0 \
> /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/logs/H100_ckpt4k_dmd_istep8_cfg1.out 2>&1 &
CUDA_VISIBLE_DEVICES=7 python benchmark.py \
--model_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/checkpoints/Qwen-Image-Edit-2509-step4000 \
--lora_path /home/ec2-user/SageMaker/efs/Projects/Self-Forcing-Plus/checkpoints/checkpoint_000200/generator.safetensors \
--test_path /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data/test_data.csv \
--data_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/data \
--output_dir /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/outputs/H100_ckpt4k_dmd_sam3k_tstep200_istep8 \
--cfg_scale 1 \
--num_inference_steps 8 \
--prompt "让图2的模特换上图1的下装" \
--max_samples 50 \
--seed 0 \
--debug \
> /home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration/logs/H100_ckpt4k_dmd_sam3k_tstep200_istep8.out 2>&1 &
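The three invocations above differ only in GPU index, output tag, CFG scale, step count, and the optional LoRA path, so they can be wrapped in one function. This is a sketch rather than a script shipped with the repo: `run_benchmark`, `ROOT`, and `DRY_RUN` are hypothetical names introduced here, while the `benchmark.py` flags and paths are copied from the commands above. The `--debug` flag used in the third command is left out.

```shell
# Sketch: wrap the repeated benchmark.py calls (names here are illustrative).
ROOT=/home/ec2-user/SageMaker/efs/Projects/Qwen-Image-Edit-Acceleration

# Usage: run_benchmark GPU TAG CFG_SCALE STEPS [LORA_PATH]
run_benchmark() {
    gpu="$1"; tag="$2"; cfg="$3"; steps="$4"; lora="${5:-}"
    # Build the argument list in the function's positional parameters
    set -- python benchmark.py \
        --model_path "$ROOT/checkpoints/Qwen-Image-Edit-2509-step4000" \
        --test_path "$ROOT/data/test_data.csv" \
        --data_dir "$ROOT/data" \
        --output_dir "$ROOT/outputs/$tag" \
        --cfg_scale "$cfg" \
        --num_inference_steps "$steps" \
        --prompt "让图2的模特换上图1的下装" \
        --max_samples 50 \
        --seed 0
    # Pass --lora_path only when a LoRA checkpoint is given
    if [ -n "$lora" ]; then
        set -- "$@" --lora_path "$lora"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command for inspection instead of running it
        printf 'CUDA_VISIBLE_DEVICES=%s' "$gpu"
        printf ' %s' "$@"
        printf '\n'
    else
        # Run in the background, logging to logs/TAG.out as above
        CUDA_VISIBLE_DEVICES="$gpu" "$@" > "$ROOT/logs/$tag.out" 2>&1 &
    fi
}
```

For example, `DRY_RUN=1 run_benchmark 6 H100_ckpt4k_dmd_istep8_cfg1 1 8` prints the second command above without launching it. The printed form is unquoted and meant for inspection only.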
Our training run uses 1000 iterations and completes in under 12 hours using 64 H100 GPUs.
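The two run sizes reported in this README (3000 iterations in under 3 days and 1000 iterations in under 12 hours, both on 64 H100 GPUs) translate into rough upper bounds on compute cost. The snippet below is only back-of-the-envelope arithmetic; `gpu_hours` and `secs_per_iter` are illustrative helpers, and because the reported times are "under" limits, the real figures are lower.

```shell
# Upper-bound cost arithmetic for a training run (illustrative helpers).
# gpu_hours GPUS HOURS -> total GPU-hours
gpu_hours() {
    echo $(( $1 * $2 ))
}
# secs_per_iter HOURS ITERS -> average wall-clock seconds per iteration
secs_per_iter() {
    awk -v h="$1" -v n="$2" 'BEGIN { printf "%.1f\n", h * 3600 / n }'
}

gpu_hours 64 72        # 3000-iteration run: at most 4608 GPU-hours
secs_per_iter 72 3000  # at most 86.4 s per iteration
gpu_hours 64 12        # 1000-iteration run: at most 768 GPU-hours
secs_per_iter 12 1000  # at most 43.2 s per iteration
```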
This codebase is built on top of the open-source implementations of CausVid and Self-Forcing, and on the Wan2.1 repo.