- Beijing Institute of Technology
- Beijing, China
- https://sharpiless.github.io/ch
- https://scholar.google.com/citations?user=dRFMPg8AAAAJ&hl=zh-CN
Stars
Unifying 2D and 3D Vision-Language Understanding
[AAAI 2025] Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
[ICCV'25] 3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
[CVPR 2025] 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
[TPAMI 2025] ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
[CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
Implementation for HiPrune, a training-free visual token pruning method for VLM acceleration.
[ICCV 2025] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
The code repository of "MBQ: Modality-Balanced Quantization for Large Vision-Language Models"
[ICCV '25 Highlight] CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
[NeurIPS 2025] Completeness-Aware Reconstruction Enhancement
[ICCV 2025] Official code for paper “RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather.”
[CVPR 2025] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Official implementation of our ICCV'25 paper "Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space"
Code for RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [3DV 2025]
[ICCV 2025] NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes