- Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey, arXiv 2025. [Paper] [Website]
- DUSt3R: Geometric 3D Vision Made Easy, CVPR 2024. [Paper] [Code] [Website]
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion, ICLR 2025. [Paper] [Code] [Website]
- LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models, ICLR 2025. [Paper] [Website]
- (CUT3R) Continuous 3D Perception Model with Persistent State, CVPR 2025. [Paper] [Code] [Website]
- Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization, CVPR 2025. [Paper] [Code]
- DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction, arXiv 2024. [Paper] [Code] [Website]
- MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion, 3DV 2025. [Paper] [Code]
- Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs, arXiv 2024. [Paper] [Website] [Code]
- SAB3R: Semantic-Augmented Backbone in 3D Reconstruction, arXiv 2024. [Paper]
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images, ICLR 2025. [Paper] [Code] [Website]
- Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features, CVPR 2025. [Paper] [Code] [Website]
- (Spann3R) 3D Reconstruction with Spatial Memory, 3DV 2025. [Paper] [Code] [Website]
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass, CVPR 2025. [Paper] [Website] [Code]
- InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds, arXiv 2025. [Paper] [Website] [Code]
- SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction, arXiv 2024. [Paper] [Code]
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos, CVPR 2025. [Paper] [Code] [Website]
- (MASt3R) Grounding Image Matching in 3D with MASt3R, ECCV 2024. [Paper] [Code] [Website]
- VGGT: Visual Geometry Grounded Transformer, CVPR 2025. [Paper] [Code] [Website]
- E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models, arXiv 2025. [Paper] [Code] [Website]
- π^3: Scalable Permutation-Equivariant Visual Geometry Learning, arXiv 2025. [Paper] [Code] [Website]
- Dens3R: A Foundation Model for 3D Geometry Prediction, ICCV 2025. [Paper] [Code] [Website]
- LONG3R: Long Sequence Streaming 3D Reconstruction, ICCV 2025. [Paper] [Code] [Website]
- PanoSplatt3R: Leveraging Perspective Pretraining for Generalized Unposed Wide-Baseline Panorama Reconstruction, ICCV 2025. [Paper] [Code] [Website]
- Ov3R: Open-Vocabulary Semantic 3D Reconstruction from RGB Videos, arXiv 2025. [Paper]
- G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration, arXiv 2025. [Paper]
- ViPE: Video Pose Engine for 3D Geometric Perception, arXiv 2025. [Paper] [Code] [Website]
- FastVGGT: Training-Free Acceleration of Visual Geometry Transformer, arXiv 2025. [Paper]
- SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization, arXiv 2025. [Paper] [Code] [Website]
- Streaming 4D Visual Geometry Transformer, arXiv 2025. [Paper] [Code] [Website]
- MapAnything: Universal Feed-Forward Metric 3D Reconstruction, arXiv 2025. [Paper] [Code] [Website]
- TTT3R: 3D Reconstruction as Test-Time Training, arXiv 2025. [Paper] [Code] [Website]
- Co-Me: Confidence Guided Token Merging for Visual Geometric Transformers, arXiv 2025. [Paper] [Code] [Website]
- MB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend, arXiv 2025. [Paper]
- VG3T: Visual Geometry Grounded Gaussian Transformer, arXiv 2025. [Paper]
- KV-Tracker: Real-Time Pose Tracking with Transformers, arXiv 2025. [Paper] [Website]
- Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting, arXiv 2025. [Paper]
- MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos, CVPR 2025. [Paper] [Website] [Code]
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos, CVPR 2025. [Paper] [Code]
- MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors, CVPR 2025. [Paper] [Website] [Code]
- VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold, arXiv 2025. [Paper] [Code]
- Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps, ICCV 2025. [Paper] [Website] [Code]
- VGGT-Long: Chunk it, Loop it, Align it – Pushing VGGT’s Limits on Kilometer-scale Long RGB Sequences, arXiv 2025. [Paper] [Code]
- Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline, IROS 2025. [Paper] [Code]
- 3D Foundation Model-Based Loop Closing for Decentralized Collaborative SLAM, RA-L 2025. [Paper]
- ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association, 3DV 2026. [Paper] [Code]
- SLAM-Former: Putting SLAM into One Transformer, arXiv 2025. [Paper] [Website] [Code]
- MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM, arXiv 2025. [Paper] [Code]
- GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State, arXiv 2025. [Paper]
- EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction, arXiv 2025. [Paper] [Website] [Code]
- Visual Odometry with Transformers, arXiv 2025. [Paper] [Website] [Code]
- ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation, arXiv 2025. [Paper] [Website] [Code]
- MASt3R-GS: Bridging 3D Reconstruction Priors with Gaussian Splatting for Real-Time Dense SLAM, IROS Workshop 2025. [Paper]
- LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping, arXiv 2025. [Paper]
- Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM, arXiv 2025. [Paper]
- SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Prior, arXiv 2025. [Paper]
- KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM, arXiv 2025. [Paper] [Code]
- Dynamic Visual SLAM using a General 3D Prior, arXiv 2025. [Paper] [Code]
- OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics, arXiv 2025. [Paper]
- Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction, arXiv 2025. [Paper]
- Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments, arXiv 2025. [Paper]
- SpatialLM: Training Large Language Models for Structured Indoor Modeling, NeurIPS 2025. [Paper] [Website] [Code]
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer, arXiv 2025. [Paper] [Code]
- UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer, arXiv 2025. [Paper] [Code]