Stars
Code for RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
MEt3R: Measuring Multi-View Consistency in Generated Images
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[CVPR2025] MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Official implementation of "AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild" (ICCV 2025)
Official implementation of "URECA: Unique Region Caption Anything"
Official implementation of "S⁴M: Boosting Semi-Supervised Instance Segmentation with SAM" (ICCV 2025)
Official implementation of "Video Camera Trajectory Editing with Generative Rendering from Estimated Geometry"
Official implementation of "InterRVOS: Interaction-aware Referring Video Object Segmentation"
Official implementation of "Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation"
Official implementation of "V-Warper: Appearance-Consistent Video Diffusion Personalization via Value Warping"
Official implementation of "Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models"
[NeurIPS'25] Official implementation of "D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes"
[NeurIPS'25] Official implementation of "Emergent Temporal Correspondences from Video Diffusion Models"
[CVPR'26] Official implementation of "C3G: Learning Compact 3D Representations with 2K Gaussians"
Official implementation of "Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression"
Official implementation of "CAMEO: Correspondence-Attention Alignment for Multi-View Diffusion Models"
Official implementation of "MV-TAP: Tracking Any Point in Multi-View Videos"
Official implementation of "AnthroTAP: Learning Point Tracking with Real-World Motion"
Official implementation of "Emergent Outlier View Rejection in Visual Geometry Grounded Transformers"
Official implementation of "Unified Diffusion Transformer for High-Fidelity Text-Aware Image Restoration"
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
[ICLR'26] Official implementation of "3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation"
Official implementation of "MATRIX: Mask Track Alignment for Interaction-aware Video Generation"
🔥🔥 [NeurIPS2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning