Stars
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Official Repo for Self-Forcing++ High Quality Long Video Generation
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Official inference repo for FLUX.1 models
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[CVPR 2025 (Oral)] Open implementation of "RandAR"
An early exploration that introduces Interleaving Reasoning to the text-to-image generation field and achieves SoTA benchmark performance. It also significantly improves the quality, fine-grain…
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
Official repo for the paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
Official implementation of Diffusion Policy Policy Optimization, arXiv 2024
Reference PyTorch implementation and models for DINOv3
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
16-fold memory access reduction with nearly no loss
Enjoy the magic of Diffusion models!
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training"
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models