Stars
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Rectified Flow Inversion (RF-Inversion) - ICLR 2025
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)
G2RPO: Granular GRPO for precise reward in flow models
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!
✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning]
A Survey of Reinforcement Learning for Large Reasoning Models
This repository contains implementations and illustrative code to accompany DeepMind publications
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
A general fine-tuning kit geared toward diffusion models.
[NeurIPS 2024] RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
Official Repository of "OmniTry: Virtual Try-On Anything without Masks"
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025] DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
[CVPR'25-Demo] Official repository of "TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models".
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
An inference and training framework for multiple image input in Flux Kontext dev
[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory