Stars
A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration techniques.
Krea Realtime 14B. An open-source realtime AI video model.
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
The official PyTorch code for paper "ContextFlow: Training-Free Video Object Editing via Adaptive Context Enrichment"
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Code for ICCVW paper, UniPaint: Unified Space-time Video Inpainting via Mixture-of-Experts
4DNeX: Feed-Forward 4D Generative Modeling Made Easy
Implementation of "S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models"
[CVPR 2024 Highlight] OpenESS: Event-Based Semantic Scene Understanding with Open Vocabularies
Cosmos-Transfer1-DiffusionRenderer: High-quality video de-lighting and re-lighting based on Cosmos video diffusion framework
Implementation of CamTrol: Training-free Camera Control for Video Generation
A more flexible framework that can generate videos at any resolution and creates videos from images.
[ICCV'25] Official implementation of "Reangle-A-Video: 4D Video Generation as Video-to-Video Translation"
Official Implementation of "Instance Segmentation of Scene Sketches Using Natural Image Priors" (SIGGRAPH 2025)
ViPE: Video Pose Engine for Geometric 3D Perception
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation [SIGGRAPH Asia 2025]
[ArXiv 2025] Official implementation of "Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control"
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Code of π^3: Permutation-Equivariant Visual Geometry Learning
Wan: Open and Advanced Large-Scale Video Generative Models
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
[ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Calligrapher: Freestyle Text Image Customization