Stars
Pointcept: Perceive the world with sparse points, a codebase for point cloud perception research. Latest works: Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral)
[CVPR25 Highlight] Official implementation of Fun3DU, a method for functional understanding and segmentation in 3D scenes
[CVPR2025] BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.
Toolbox for our GraspNet-1Billion dataset.
Baseline model for "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping" (CVPR 2020)
[CVPR2024] OneFormer3D: One Transformer for Unified Point Cloud Segmentation
[CVPR 2025] Insightful Instance Features for 3D Instance Segmentation
[IROS 2025] Official code of "HybridTM: Combining Transformer and Mamba for 3D Semantic Segmentation"
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
PARC: Physics-based Augmentation with Reinforcement Learning for Character Controllers
Virtual Sparse Convolution for Multimodal 3D Object Detection
[IEEE T-PAMI 2023] Awesome BEV perception research and cookbook for audiences at all levels in autonomous driving
[ICCV 2023] The first DETR model for monocular 3D object detection with depth-guided transformer
Towards a Generative 3D World Engine for Embodied Intelligence
Text2Room generates textured 3D meshes from a given text prompt using 2D text-to-image models (ICCV2023).
Synthetic VQA data generation code for SpatialReasoner.
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: An open-source vision-language-action model for robotic manipulation.
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
An open-source framework for training large multimodal models.
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
A ROS2 package based on YOLOv5 that enables fast object recognition and pose publishing.
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
🔥 SpatialVLA: a spatially enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.