Stars
[NeurIPS 2025] Improving Video Generation with Human Feedback
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
[SIGGRAPH Asia 2024, Best Paper Honorable Mention] Official implementation of our SIGGRAPH Asia journal article: TEXGen: a Generative Diffusion Model for Mesh Textures
A lightweight and highly efficient training framework for accelerating diffusion tasks.
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
The strongest open-source implementation of asymmetric magvit_v2 in 2024; it provides inference code but excludes the VQVAE. It supports joint encoding of images and videos, accommodating arbitrary vi…
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
Lumina-T2X is a unified framework for Text to Any Modality Generation
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
[CVPR 2024 Highlight] VBench - We Evaluate Video Generation
An Open-source Toolkit for LLM Development
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Open-Sora: Democratizing Efficient Video Production for All
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.
[ICCV 2023] MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond.
[CVPR 2024] CoSeR: Bridging Image and Language for Cognitive Super-Resolution
[ECCV 2024] FreeInit: Bridging Initialization Gap in Video Diffusion Models
[CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
[ICLR 2024 Spotlight] Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Implementation of MagViT2 Tokenizer in Pytorch
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch