Stars
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
[ECCV 2024] PyTorch code for our ECCV'24 paper NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
DN-Splatter + AGS-Mesh: Depth and Normal Priors for Gaussian Splatting
A single-GPU trainable unconditional mesh generative model
Revisiting and integrating the latest progress in SfM, including both neural and classic methods, to build a robust and efficient SfM system for both the academic and industrial communities.
[CVPR 2025] Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering
[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
🌟A curated list of DUSt3R-related papers and resources, tracking recent advancements using this geometric foundation model.
[ICCV 2025] Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
Official repository for "AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos" (CVPR 2025)
[CVPR 2025] Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
✨✨Latest Advances on Multimodal Large Language Models
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
A curated list of deep learning resources for video-text retrieval.
[CVPR 2022] Code for "NeRF-Editing: Geometry Editing of Neural Radiance Fields"
GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
[SIGGRAPH Asia 2023 (Technical Communications)] EasyVolcap: Accelerating Neural Volumetric Video Research
[CVPR 2025] LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
[CVPR 2025] StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Awesome-LLM-3D: a curated list of resources on multimodal large language models in the 3D world