v1.1.0 πŸŽ‰Context/Tensor Parallelism

Released by @DefTruth on 18 Nov, 03:47 · commit 97366d6

πŸ”₯Highlights

We are excited to announce that πŸŽ‰v1.1.0 of cache-dit has finally been released! It brings πŸ”₯Context Parallelism and πŸ”₯Tensor Parallelism to cache-dit, making it a Unified and Flexible Inference Engine for πŸ€—DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, Context Parallelism, Tensor Parallelism, and πŸŽ‰SOTA performance.

βš™οΈInstallation

You can install the stable release of cache-dit from PyPI:

pip3 install -U cache-dit # or, pip3 install -U "cache-dit[all]" for all features

Or you can install the latest develop version from GitHub:

pip3 install git+https://github.com/vipshop/cache-dit.git

Please also install the latest main branch of diffusers for context parallelism:

pip3 install git+https://github.com/huggingface/diffusers.git

πŸ”₯Supported DiTs

Tip

One Model Series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit (see the minimal usage sketch after the table below). βœ…: known to work and officially supported now; βœ–οΈ: not officially supported now, but may be supported in the future; Q: 4-bit models w/ nunchaku + SVDQ W4A4.

| πŸ“šModel | Cache | CP | TP | πŸ“šModel | Cache | CP | TP |
| --- | :-: | :-: | :-: | --- | :-: | :-: | :-: |
| πŸŽ‰FLUX.1 | βœ… | βœ… | βœ… | πŸŽ‰FLUX.1 Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰FLUX.1-Fill | βœ… | βœ… | βœ… | πŸŽ‰FLUX.1-Fill Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰Qwen-Image | βœ… | βœ… | βœ… | πŸŽ‰Qwen-Image Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰Qwen...Edit | βœ… | βœ… | βœ… | πŸŽ‰Qwen...Edit Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰Qwen...Lightning | βœ… | βœ… | βœ… | πŸŽ‰Qwen...Light Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰Qwen...Control.. | βœ… | βœ… | βœ… | πŸŽ‰Qwen...E...Light Q | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰Wan 2.1 I2V/T2V | βœ… | βœ… | βœ… | πŸŽ‰Mochi | βœ… | βœ–οΈ | βœ… |
| πŸŽ‰Wan 2.1 VACE | βœ… | βœ… | βœ… | πŸŽ‰HiDream | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰Wan 2.2 I2V/T2V | βœ… | βœ… | βœ… | πŸŽ‰HunyuanDiT | βœ… | βœ–οΈ | βœ… |
| πŸŽ‰HunyuanVideo | βœ… | βœ… | βœ… | πŸŽ‰Sana | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰ChronoEdit | βœ… | βœ… | βœ… | πŸŽ‰Bria | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰CogVideoX | βœ… | βœ… | βœ… | πŸŽ‰SkyReelsV2 | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰CogVideoX 1.5 | βœ… | βœ… | βœ… | πŸŽ‰Lumina 1/2 | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰CogView4 | βœ… | βœ… | βœ… | πŸŽ‰DiT-XL | βœ… | βœ… | βœ–οΈ |
| πŸŽ‰CogView3Plus | βœ… | βœ… | βœ… | πŸŽ‰Allegro | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰PixArt Sigma | βœ… | βœ… | βœ… | πŸŽ‰Cosmos | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰PixArt Alpha | βœ… | βœ… | βœ… | πŸŽ‰OmniGen | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰Chroma-HD | βœ… | βœ… | βœ… | πŸŽ‰EasyAnimate | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰VisualCloze | βœ… | βœ… | βœ… | πŸŽ‰StableDiffusion3 | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰HunyuanImage | βœ… | βœ… | βœ… | πŸŽ‰PRX T2I | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰Kandinsky5 | βœ… | βœ… | βœ… | πŸŽ‰Amused | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰LTXVideo | βœ… | βœ… | βœ… | πŸŽ‰AuraFlow | βœ… | βœ–οΈ | βœ–οΈ |
| πŸŽ‰ConsisID | βœ… | βœ… | βœ… | πŸŽ‰LongCatVideo | βœ… | βœ–οΈ | βœ–οΈ |
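
As a quick illustration of the Unified Cache APIs, enabling cache acceleration on any supported pipeline is a one-liner. The following is a minimal sketch, assuming the standard diffusers FluxPipeline; the model id, prompt, and step count are illustrative only:

import torch
import cache_dit
from diffusers import FluxPipeline

# Load any pipeline whose transformer appears in the table above.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# One call applies cache acceleration at the Transformer level.
cache_dit.enable_cache(pipe)

image = pipe("a photo of a cat", num_inference_steps=28).images[0]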

⚑️Hybrid Context Parallelism

cache-dit is compatible with context parallelism. Currently, we support a Hybrid Cache + Context Parallelism scheme (via the NATIVE_DIFFUSER parallelism backend) in cache-dit. Users can apply Context Parallelism to further accelerate inference! For more details, please refer to πŸ“šexamples/parallelism. Currently, cache-dit supports context parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, LTXVideo, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, CogVideoX 1.0, CogVideoX 1.5, CogView 3/4, VisualCloze, etc. cache-dit will support more models in the future.

# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

cache_dit.enable_cache(
    pipe_or_adapter,
    cache_config=DBCacheConfig(...),
    # Set ulysses_size > 1 to enable Ulysses-style context parallelism.
    parallelism_config=ParallelismConfig(ulysses_size=2),
)
# torchrun --nproc_per_node=2 parallel_cache.py
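
End to end, a launchable script might look like the sketch below. This is an assumption-laden example, not taken verbatim from the release: the file name parallel_cache.py, model id, and prompt are illustrative, DBCacheConfig() is used with its defaults, and per-rank device selection follows the usual torchrun LOCAL_RANK convention:

# parallel_cache.py -- launch with: torchrun --nproc_per_node=2 parallel_cache.py
import os
import torch
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig
from diffusers import FluxPipeline

# Each torchrun rank binds to its own GPU (standard LOCAL_RANK convention).
local_rank = int(os.environ.get("LOCAL_RANK", 0))
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to(f"cuda:{local_rank}")

# Hybrid Cache + Context Parallelism: DBCache defaults plus 2-way Ulysses CP.
cache_dit.enable_cache(
    pipe,
    cache_config=DBCacheConfig(),  # default DBCache settings
    parallelism_config=ParallelismConfig(ulysses_size=2),
)

image = pipe("a cabin in snowy mountains", num_inference_steps=28).images[0]
if local_rank == 0:  # ranks produce the same image; save it once
    image.save("flux_cp.png")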

⚑️Hybrid Tensor Parallelism

cache-dit is also compatible with tensor parallelism. Currently, we support a Hybrid Cache + Tensor Parallelism scheme (via the NATIVE_PYTORCH parallelism backend) in cache-dit. Users can apply Tensor Parallelism to further accelerate inference and reduce per-GPU VRAM usage! For more details, please refer to πŸ“šexamples/parallelism. Currently, cache-dit supports tensor parallelism for FLUX.1, Qwen-Image, Qwen-Image-Lightning, Wan 2.1, Wan 2.2, HunyuanImage-2.1, HunyuanVideo, VisualCloze, etc. cache-dit will support more models in the future.

# pip3 install "cache-dit[parallelism]"
import cache_dit
from cache_dit import DBCacheConfig, ParallelismConfig

cache_dit.enable_cache(
    pipe_or_adapter,
    cache_config=DBCacheConfig(...),
    # Set tp_size > 1 to enable tensor parallelism.
    parallelism_config=ParallelismConfig(tp_size=2),
)
# torchrun --nproc_per_node=2 parallel_cache.py
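
The tensor-parallel path drops into the same script shape as the context-parallel sketch above; only the ParallelismConfig changes. A hedged sketch of the relevant delta (pipe is loaded per rank exactly as before):

# In the script above, replace the enable_cache call with a tensor-parallel
# config: the transformer weights are then sharded across the 2 ranks, so
# each GPU holds roughly half of them, cutting per-GPU VRAM.
cache_dit.enable_cache(
    pipe,
    cache_config=DBCacheConfig(),  # default DBCache settings
    parallelism_config=ParallelismConfig(tp_size=2),  # tensor parallelism
)
# Launch the same way: torchrun --nproc_per_node=2 parallel_cache.py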

Important

Please note that, in the short term, we have no plans to support Hybrid Parallelism (i.e., enabling Context Parallelism and Tensor Parallelism together in a single run). Please choose either Context Parallelism or Tensor Parallelism based on your actual scenario.