-
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels, CVPR 2025. [Paper | Code]
-
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space, CVPR 2025. [Paper | Project | Code]
-
3D Student Splatting and Scooping, CVPR 2025. [Paper | Code]
-
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models, CVPR 2025. [Paper | Project | Code]
-
Reconstructing Humans with a Biomechanically Accurate Skeleton, CVPR 2025. [Paper | Project | Code]
-
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation, CVPR 2025. [Paper | Project | Code]
-
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models, CVPR 2025. [Paper | Project | Code]
-
CustAny: Customizing Anything from A Single Example, CVPR 2025. [Paper | Project | Code]
-
VGGT: Visual Geometry Grounded Transformer, CVPR 2025. [Paper | Project | Code] πππ
-
Navigation World Models, CVPR 2025. [Paper | Project | Code]
-
MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos, CVPR 2025. [Paper | Project | Code]
-
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos, CVPR 2025. [Paper | Project | Code]
-
FoundationStereo: Zero-Shot Stereo Matching, CVPR 2025. [Paper | Project | Code]
-
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models, CVPR 2025. [Paper | Code]
-
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition, CVPR 2025. [Paper | Project]
-
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis, CVPR 2025. [Paper | Project | Code]
-
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing, CVPR 2025. [Paper | Project | Code]
-
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds, CVPR 2025. [Paper | Project | Code]
-
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization, CVPR 2025. [Paper | Project | Code]
-
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models, CVPR 2025. [Paper | Code]
-
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models, CVPR 2025. [Paper | Project | Code]
-
DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models, CVPR 2025. [Paper | Project | Code]
-
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video, CVPR 2025. [Paper | Project | Code]
-
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation, CVPR 2025. [Paper | Project | Code]
-
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders, CVPR 2025. [Paper | Project | Code]
-
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea, CVPR 2025. [Paper | Project | Code]
-
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection, CVPR 2025. [Paper | Code]
-
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing, CVPR 2025. [Paper | Project | Code]
-
OPA-DPO: Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key, CVPR 2025. [Paper | Project | Code]
-
Minority-Focused Text-to-Image Generation via Prompt Optimization, CVPR 2025. [Paper | Code]
-
Autoregressive Distillation of Diffusion Transformers, CVPR 2025. [Paper | Code]
-
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks, CVPR 2025. [Paper | Code]
-
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion, CVPR 2025. [Paper | Code]
-
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models, CVPR 2025. [Paper | Blog | Code]
-
CUT3R: Continuous 3D Perception Model with Persistent State, CVPR 2025. [Paper | Project | Code]
-
Birth and Death of a Rose, CVPR 2025. [Paper | Project | Code]
-
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content, CVPR 2025. [Paper | Code]
-
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector, CVPR 2025. [Paper | Code]
-
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation, CVPR 2025. [Paper | Code]
-
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models, CVPR 2025. [Paper]
-
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity, CVPR 2025. [Paper]
-
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces, CVPR 2025. [Paper | Project | Code]
-
Video-XL Family: Efficient VLMs for Extremely Long Video Understanding, CVPR 2025. [Paper | Code]
-
Neural Inverse Rendering from Propagating Light, CVPR 2025. [Paper | Project | Code] πππ
-
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision, CVPR 2025. [Paper | Project | Code]
-
Motion Prompting: Controlling Video Generation with Motion Trajectories, CVPR 2025. [Paper | Project]
-
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise, CVPR 2025. [Paper | Project | Code]
-
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping, CVPR 2025. [Paper | Project]
-
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions, CVPR 2025. [Paper | Code]
-
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild, CVPR 2025. [Paper | Project]
-
CleanDIFT: Diffusion Features without Noise, CVPR 2025. [Paper | Project | Code]
-
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather, CVPR 2025. [Paper]
-
DiffFNO: Diffusion Fourier Neural Operator, CVPR 2025. [Paper | Project]
-
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining, CVPR 2025. [Paper]
-
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner, CVPR 2025. [Paper | Project | Code]
-
Reanimating Images using Neural Representations of Dynamic Stimuli, CVPR 2025. [Paper]
-
EgoLM: Multi-Modal Language Model of Egocentric Motions, CVPR 2025. [Paper | Project]
-
MEGA: Masked Generative Autoencoder for Human Mesh Recovery, CVPR 2025. [Paper]
-
Descriptor-In-Pixel: Point-Feature Tracking for Pixel Processor Arrays, CVPR 2025. [Project]
-
Temporally Consistent Object-Centric Learning by Contrasting Slots, CVPR 2025. [Paper | Project | Code]
-
Temporal Alignment-Free Video Matching for Few-shot Action Recognition, CVPR 2025. [Paper | Project | Code]
-
One Category One Prompt: Dataset Distillation using Diffusion Models, CVPR 2025. [Paper | Code]
-
IceDiff: High Resolution and High-Quality Sea Ice Forecasting with Generative Diffusion Prior, CVPR 2025. [Paper]
-
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning, CVPR 2025. [Paper]
-
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation, CVPR 2025. [Paper]
-
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models, CVPR 2025. [Paper]
-
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons, CVPR 2025. [Paper]
-
Language-Guided Image Tokenization for Generation, CVPR 2025. [Paper | Project]
-
DreamRelation: Bridging Customization and Relation Generation, CVPR 2025. [Paper | Project | Code]
-
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skills, CVPR 2025. [Paper | Project | Code]
-
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning, CVPR 2025. [Project]
-
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution, CVPR 2025. [Paper | Code]
-
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World, CVPR 2025. [Paper | Code]
-
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues, CVPR 2025. [Project]
-
Camera resection from known line pencils and a radially distorted scanline, CVPR 2025. [Code]
-
Opportunistic Single-Photon Time of Flight, CVPR 2025. [Paper]
-
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models, CVPR 2025. [Paper]
-
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming, CVPR 2025. [Paper]
-
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning, CVPR 2025. [Paper | Code]
-
Enhancing Diversity for Data-free Quantization, CVPR 2025. [Paper]
-
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model, CVPR 2025. [Paper | Code]
-
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation, CVPR 2025. [Paper]
-
Time of the Flight of the Gaussians: Fast and Accurate Dynamic Time-of-Flight Radiance Field, CVPR 2025. [Paper | Project | Code]
-
Zero-Shot Monocular Scene Flow Estimation in the Wild, CVPR 2025. [Paper | Project]
-
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields, CVPR 2025. [Paper | Project | Code]
-
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models, CVPR 2025. [Paper | Project]
-
Effective SAM Combination for Open-Vocabulary Semantic Segmentation, CVPR 2025. [Paper]
-
Removing Reflections from RAW Photos, CVPR 2025. [Paper | Project]
-
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens, CVPR 2025. [Paper | Project | Code]
-
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding, CVPR 2025. [Paper | Project | Code]
-
Towards Vision Language Models For Extra-Long Video Understanding, CVPR 2025. [Paper | Code]
-
SEAL: Semantic Attention Learning for Long Video Representation, CVPR 2025. [Paper]
-
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval, CVPR 2025. [Paper]

We would like to thank the maintainers of the cvpr25_oral_gpu_info code repository.