Stars
(ICCV 2025) "Principal Components" Enable A New Language of Images
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official repository for the UAE paper, unified-GRPO, and unified-Bench
embracefailure / ATU-Agent
Forked from assafelovic/gpt-researcher
Customer profiling agent that conducts deep local and web research and generates a long report with citations. Completed during a summer internship at Microsoft.
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
Awesome Unified Multimodal Models
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Official Implementation of ConsisLoRA
Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"
Official PyTorch Implementation of “VLScene: Vision-Language Guidance Distillation for Camera-based 3D Semantic Scene Completion” (AAAI 2025 Oral)
[SIGGRAPH 2025] Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Benchmark dataset and code of MSRVTT-Personalization
[ICCV2025] The code of our work "Golden Noise for Diffusion Models: A Learning Framework".
SSSegmentation: An Open Source Supervised Semantic Segmentation Toolbox Based on PyTorch.
[ECCV2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Official implementation of "CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization".
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
📹 A more flexible framework that can generate videos at any resolution and create videos from images.