Stars
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
The official implementation of WaveNet-VNNs for Active Noise Control (ANC), a fully causal solution.
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
Team Comet's 2025 BEHAVIOR Challenge Codebase
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Official Implementations for Paper - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Wan: Open and Advanced Large-Scale Video Generative Models
Enjoy the magic of Diffusion models!
The official repo for "D-AR: Diffusion via Autoregressive Models"
Official repository for the paper "Orientation Matters: Making 3D Generative Models Orientation-Aligned" (NeurIPS 2025)
Calligrapher: Freestyle Text Image Customization
[ICML 2025] Differentiable Solver Search for Fast Diffusion Sampling
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
[CVPR'25] Official implementation for paper - Contextual AD Narration with Interleaved Multimodal Sequence
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[NeurIPS2025 Spotlight 🔥] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"