Releases: vipshop/cache-dit
v1.1.10
New Models Supported
LongCat-Image, LongCat-Image-Edit, Z-Image-Turbo-ControlNet, Z-Image-Turbo Nunchaku, Qwen-Image-Edit-2511, Qwen-Image-Layered
What's Changed
- Simplify CLI: Make task argument optional by @BBuf in #600
- chore: fix extra path compare by @DefTruth in #603
- CI: Add build_wheel CI by @DefTruth in #604
- feat: support cache for LongCat-Image by @e1ijah1 in #602
- feat: Serving support LORA by @BBuf in #601
- CI: Add Forward Pattern CPU CI Tests by @DefTruth in #605
- chore: Update README.md by @DefTruth in #606
- Fix typo by @BBuf in #607
- feat: z-image-controlnet 🔥4x speedup! by @DefTruth in #608
- fix lora path mismatch in examples by @DefTruth in #609
- feat: support TP and CP for longcat-image by @DefTruth in #610
- misc: fix typo by @DefTruth in #612
- feat: support 🔥Qwen-Image-Edit-2511 by @DefTruth in #614
- feat: support 🔥Qwen-Image-Layered by @DefTruth in #615
- chore: simplify parallelism dispatch by @DefTruth in #616
- chore: simplify quantize dispatch by @DefTruth in #617
- chore: refactor kernels module by @DefTruth in #618
- ci: add refresh context ci tests by @DefTruth in #619
- misc: add device info to example summary by @DefTruth in #621
- feat: support ⚡️Z-Image-Turbo Nunchaku by @DefTruth in #623
- [chore] Improve error_logging in serving tp_worker.py by @BBuf in #627
- chore: support more alias for quant types by @DefTruth in #628
- chore: fix alias rev map for quant types by @DefTruth in #629
- chore: lazy import check for quantize api by @DefTruth in #630
- chore: add more compile flags setting by @DefTruth in #631
- [Bug] Apply --attn backend in single-GPU examples by @BBuf in #633
- Bump up to v1.1.10 by @DefTruth in #634
Full Changelog: v1.1.9...v1.1.10
v1.1.9
What's Changed
- feat: uaa avoid extra memory IO access by @triple-mu in #551
- chore: simplify quantize flags in example utils by @DefTruth in #553
- chore: fix quantize flags in example by @DefTruth in #554
- chore: fix quantize & TP conflicts for wan by @DefTruth in #556
- feat: support serving text2video by @BBuf in #555
- chore: Update SERVING Doc and FAQ Doc by @BBuf in #557
- chore: qwen edit lightning cp/tp examples by @DefTruth in #559
- feat: support ovis-image context parallel by @DefTruth in #560
- feat: serving support image2video by @BBuf in #558
- chore: add collect_env script by @DefTruth in #562
- Add pre-commit and GitHub Actions CI by @DefTruth in #564
- chore: refactor parallelism for better reusability by @DefTruth in #565
- chore: Update vLLM-Omni integration by @SamitHuang in #566
- feat: add pipe quant config for serving by @nono-Sang in #563
- News: 🔥vLLM-Omni x Cache-DiT ready! by @DefTruth in #567
- feat: enable custom attn backend for TP by @DefTruth in #568
- feat: support TP for many text encoders by @DefTruth in #569
- fix qwen-edit-lightning examples by @DefTruth in #571
- fix get_text_encoder_from_pipe by @DefTruth in #572
- fix: handle general compile options in example utils by @DefTruth in #573
- chore: reduce un-popular examples by @DefTruth in #574
- feat: add text_encoder tp for serving by @nono-Sang in #570
- chore: simplify example by @DefTruth in #575
- chore: make unified examples by @DefTruth in #576
- chore: fix vllm-omni docs link by @DefTruth in #577
- chore: optimize examples default path mapping by @DefTruth in #579
- chore: fix vllm-omni docs link by @DefTruth in #580
- feat: support Ovis-Image tensor parallel by @DefTruth in #582
- chore: fix typo in User_Guide.md by @DefTruth in #583
- chore: fail fast TP validation for attn heads by @CPFLAME in #581
- fix patch functor for multi transformers by @DefTruth in #586
- chore: add qwen image controlnet example by @DefTruth in #588
- chore: update docs by @DefTruth in #590
- feat: register fa3 backend for context parallel by @nono-Sang in #589
- chore: support separate quant-type for text encoder by @DefTruth in #591
- hotfix for fa3 backend import error by @DefTruth in #593
- chore: fix typo in README.md by @DefTruth in #594
- chore: set save_ctx to False for inference by @nono-Sang in #596
- fix flux examples model path mismatch by @DefTruth in #597
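The fail-fast TP validation entry above (#581) can be illustrated with a minimal sketch. It assumes the check is a divisibility test between the attention head count and the tensor-parallel size, which the changelog does not spell out; `validate_tp_heads` and its parameters are hypothetical names, not part of the cache-dit API.

```python
def validate_tp_heads(num_attention_heads: int, tp_size: int) -> int:
    """Return the number of heads per rank, or raise before any sharding starts.

    Hypothetical helper: assumes tensor parallelism shards attention by heads
    and therefore needs the head count to divide evenly across ranks.
    """
    if tp_size < 1:
        raise ValueError(f"tp_size must be >= 1, got {tp_size}")
    if num_attention_heads % tp_size != 0:
        raise ValueError(
            f"num_attention_heads={num_attention_heads} is not divisible "
            f"by tp_size={tp_size}"
        )
    return num_attention_heads // tp_size


if __name__ == "__main__":
    print(validate_tp_heads(32, 4))  # heads handled by each rank
```

Raising at configuration time, rather than during the first forward pass, surfaces an unsupported head count before model weights are loaded and sharded.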
New Contributors
- @SamitHuang made their first contribution in #566
- @nono-Sang made their first contribution in #563
- @CPFLAME made their first contribution in #581
Full Changelog: v1.1.8...v1.1.9
v1.1.8
What's Changed
- Add request queue to limit concurrent generation requests by @BBuf in #535
- News: 🔥🔥SGLang Diffusion x Cache-DiT ready!🔥🔥 by @DefTruth in #536
- feat: optimize async fp8 ulysses attn by @DefTruth in #537
- chore: delete useless code in serving by @BBuf in #539
- feat: make all_to_all comm unified by @DefTruth in #538
- feat: add refresh cache context api by @DefTruth in #542
- feat: support image edit model in serving by @BBuf in #541
- chore: simplify ulysses async flag by @DefTruth in #543
- chore: re-registered sage attn backend by @DefTruth in #544
- feat: support any head num for ulysses by @DefTruth in #546
- feat: support uneven heads in ulysses w/o padding by @triple-mu in #547
- chore: refactor FLUX.2 image editing tests by @BBuf in #548
- feat: Add ulysses for any heads w/o padding by @DefTruth in #549
- feat: add envs manager for cache-dit by @DefTruth in #550
- fix ulysses fp8 and uneven head conflicts by @DefTruth in #552
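The uneven-head Ulysses entries above (#546, #547, #549) describe distributing attention heads across ranks without padding. A minimal sketch of one way to do that split follows, assuming the first `num_heads % world_size` ranks each take one extra head; the actual partitioning used by cache-dit is not stated in these notes, and `split_heads` and `head_range` are hypothetical helpers.

```python
def split_heads(num_heads: int, world_size: int) -> list[int]:
    """Heads per rank with no padding: early ranks absorb the remainder."""
    base, extra = divmod(num_heads, world_size)
    return [base + 1 if rank < extra else base for rank in range(world_size)]


def head_range(num_heads: int, world_size: int, rank: int) -> tuple[int, int]:
    """Half-open [start, end) slice of head indices owned by `rank`."""
    counts = split_heads(num_heads, world_size)
    start = sum(counts[:rank])
    return start, start + counts[rank]


if __name__ == "__main__":
    print(split_heads(30, 4))    # per-rank head counts
    print(head_range(30, 4, 2))  # head slice for rank 2
```

Because the per-rank counts differ, any all-to-all exchange built on this split has to pass explicit per-rank sizes instead of assuming equal chunks.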
Full Changelog: v1.1.7...v1.1.8
v1.1.7
Hotfix for compatibility with diffusers 0.35.2
What's Changed
- feat: pointer casting for fp8 all2all by @triple-mu in #533
- chore: relax block adapter deps by @DefTruth in #534
Full Changelog: v1.1.6...v1.1.7
v1.1.6
v1.1.5 🔥HunyuanVideo-1.5/Ovis-Image
What's Changed
- Add profiler for flux tp and cp example by @BBuf in #501
- chore: Update README.md by @DefTruth in #502
- feat: support FnB0 for z-image w/ cp by @DefTruth in #503
- feat: support _sdpa_cudnn backend for cp by @DefTruth in #504
- feat: support async ulysses cp for z-image by @DefTruth in #505
- feat: add all_to_all_single v2 by @DefTruth in #507
- feat: support async ulysses cp for qwen-image by @DefTruth in #508
- feat: support all2all qkv per token fp8 by @triple-mu in #509
- chore: improve flux2 and qwen image examples by @BBuf in #512
- fix: workaround for uaa-fp8 .view compile error by @triple-mu in #514
- feat: relaxed transformer strict assert by @DefTruth in #515
- feat: all2all qkv fp8 for ulysses by @DefTruth in #516
- feat: support pre-defined step masks by @DefTruth in #517
- chore: separate chrono-edit and wan cp plan by @DefTruth in #519
- fix example utils.py uaa fp8 flag typo by @DefTruth in #521
- feat: extend predefined step masks for 4/6 steps by @DefTruth in #523
- misc: add z-image-turbo predefined step masks by @DefTruth in #525
- feat: support per_token_quant_fp8 triton kernel by @triple-mu in #524
- feat: unified async ulysses fp8 by @DefTruth in #526
- feat: support serving for cache-dit by @BBuf in #522
- Fix get_model_info api 404 when serving with tp/cp by @BBuf in #529
- feat: support cache for hunyuanvideo-1.5 by @DefTruth in #528
- feat: support cache for ovis-image by @DefTruth in #530
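Several entries above (#509, #516, #524) mention per-token FP8 quantization of QKV before the all-to-all exchange. The sketch below shows only the scaling step of per-token quantization in plain Python, assuming the FP8 E4M3 format (largest finite value 448) and one scale per token row; the rounding to an actual FP8 dtype is omitted, and `per_token_scales` is a hypothetical name rather than the cache-dit or Triton kernel interface.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed target format)


def per_token_scales(tokens: list[list[float]], eps: float = 1e-12):
    """Scale each token (row) so its largest magnitude maps to FP8_E4M3_MAX.

    Returns (scaled_rows, scales). Dequantization is scaled_value * scale.
    The cast to a real FP8 dtype is intentionally left out of this sketch.
    """
    scaled_rows, scales = [], []
    for row in tokens:
        amax = max(abs(v) for v in row)
        scale = max(amax, eps) / FP8_E4M3_MAX  # eps guards all-zero rows
        scales.append(scale)
        scaled_rows.append(
            [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
        )
    return scaled_rows, scales


if __name__ == "__main__":
    q, s = per_token_scales([[1.0, -2.0, 4.0], [0.5, 0.25, -0.125]])
    print(q, s)
```

Keeping one scale per token, instead of one per tensor, limits the damage an outlier token can do to the precision of every other token.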
New Contributors
- @triple-mu made their first contribution in #509
Full Changelog: v1.1.4...v1.1.5
v1.1.4 🔥FLUX.2/Z-Image
What's Changed
- feat: support torch profiler in cache-dit by @BBuf in #491
- feat: support 🔥z-image tensor parallel by @gameofdimension in #494
- feat: support lumina2 tensor parallel by @gameofdimension in #495
- feat: support cache for 🔥z-image by @DefTruth in #496
- feat: support context parallel for 🔥z-image by @DefTruth in #497
- fix: temp FnB(n>0) workaround for z-image cache w/ cp by @DefTruth in #499
Full Changelog: v1.1.3...v1.1.4
v1.1.3 🔥FLUX.2
What's Changed
- chore: Add wan 2.2 i2v context parallel example by @DefTruth in #476
- chore: optimize wan examples, compile & offload by @BBuf in #477
- feat: support async ulysses cp for flux by @DefTruth in #480
- chore: update support matrix by @DefTruth in #484
- chore: update async ulysses cp docs by @DefTruth in #486
- chore: update async ulysses cp refs by @DefTruth in #487
- feat: support FLUX.2-dev Tensor Parallelism by @gameofdimension in #485
- feat: support Hybrid cache + TP for 🔥FLUX.2 by @DefTruth in #489
- feat: Add seq offload for 🔥FLUX.2 w/o parallel by @DefTruth in #490
- feat: support 🔥FLUX.2 context parallel by @DefTruth in #492
Full Changelog: v1.1.2...v1.1.3
v1.1.2 UAA & SkyReelsV2 TP/CP
What's Changed
- chore: Update README.md by @DefTruth in #455
- fix load options drop kwargs by @DefTruth in #456
- chore: add maybe pad prompt utils by @DefTruth in #458
- fix: move .to(device) to reduce tp mem by @BBuf in #459
- example: support more overridden args and memory tracker by @BBuf in #461
- Add missing model-path args in example by @BBuf in #463
- UAA: ulysses anything attn w/ zero overhead by @DefTruth in #462
- fix qwen-image multi-gpu mismatch by @BBuf in #464
- Fix more models multi gpu mismatch by @BBuf in #466
- feat: support unshard anything for UAA by @DefTruth in #465
- chore: update qwen-image example for UAA by @DefTruth in #468
- chore: Update README.md by @DefTruth in #470
- chore: Update README.md by @DefTruth in #471
- support skyreels cp and tp ulysses by @BBuf in #469
- always use vae tiling if vram <= 48 GiB for qwen-image by @DefTruth in #472
- chore: Add SkyReelsV2 tp/cp to support-matrix by @BBuf in #473
- fix: correct string literal syntax errors in examples by @BBuf in #475
- feat: allow UAA in compiled graph by @DefTruth in #474
Full Changelog: v1.1.1...v1.1.2
v1.1.1
What's Changed
- chore: Update README.md by @DefTruth in #442
- feat: support step compute mask by @DefTruth in #444
- bugfix: fix bench distill cfg mismatch by @DefTruth in #445
- chore: update step mask docs by @DefTruth in #446
- chore: Update User_Guide.md by @DefTruth in #447
- chore: update README by @DefTruth in #448
- chore: update step mask example by @DefTruth in #449
- chore: highlight SCM - step computation mask by @DefTruth in #450
- chore: highlight SCM - step computation mask by @DefTruth in #451
- chore: highlight SCM - step computation mask by @DefTruth in #452
- misc: support quantize and attn backend for flux example by @DefTruth in #453
- misc: add quant and attn backend -> step mask example by @DefTruth in #454
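The step computation mask (SCM) entries above (#444, #450 to #452) refer to a per-step mask over the denoising schedule. As a rough sketch, assuming the mask is a list with one entry per step where 1 means "run the full computation" and 0 means "reuse cached results", it could be built from alternating bin sizes. `build_step_mask` and its arguments are hypothetical and are not the cache-dit API.

```python
def build_step_mask(compute_bins: list[int], cache_bins: list[int]) -> list[int]:
    """Interleave compute and cache runs into a flat 0/1 mask.

    compute_bins[i] steps of full computation (1) are followed by
    cache_bins[i] steps that reuse cached results (0), if that bin exists.
    """
    mask: list[int] = []
    for i, n_compute in enumerate(compute_bins):
        mask.extend([1] * n_compute)
        if i < len(cache_bins):
            mask.extend([0] * cache_bins[i])
    return mask


if __name__ == "__main__":
    mask = build_step_mask(compute_bins=[2, 1, 1], cache_bins=[3, 2])
    print(mask, len(mask))
```

In this sketch the sum of all bins must equal the number of inference steps, and starting and ending on a compute bin keeps the first and last steps uncached.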
Full Changelog: v1.1.0...v1.1.1