Stars
Vision-and-Language Navigation in Continuous Environments using Habitat
Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures (CoRL 2025)
A PyTorch extension that facilitates decentralized data parallel training. [ICLR'25] From Promise to Practice: Realizing High-performance Decentralized Training.
[ICLR'25] From Promise to Practice: Realizing High-performance Decentralized Training
[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"
A Unified Framework for Scalable Vehicle Trajectory Prediction, ECCV 2024
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
Benchmarks of several time series forecasting methods
Open-source implementation of "Offline Tracking with Object Permanence", which recovers occluded vehicle trajectories and reduces identity switches caused by occlusions.
[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
[ICCV 2023] INT2: Interactive Trajectory Prediction at Intersections
A PyTorch Library for Multi-Task Learning
[CVPR 2023] Query-Centric Trajectory Prediction
[ICRA'24] DeFlow: Decoder of Scene Flow Network in Autonomous Driving
[CVPR2024] NeuRAD: Neural Rendering for Autonomous Driving
We extend Segment Anything to 3D perception by combining it with VoxelNeXt.
On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent
[arXiv 2023] Set-of-Mark Prompting for GPT-4V and LMMs
Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.
A curated list of awesome LLM/VLM/VLA for Autonomous Driving (LLM4AD) resources (continually updated)
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Code release for "Learning Video Representations from Large Language Models"
(TPAMI 2024) A Survey on Open Vocabulary Learning
[ICCV 2023 Oral] Game-theoretic modeling and learning of Transformer-based interactive prediction and planning
Patchwork++: Fast and robust ground segmentation method for 3D LiDAR scans. @ IROS'22