View Xiaolong-RRL's full-sized avatar
Jupyter Notebook 23 1 Updated Apr 28, 2025

starVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

Python 257 12 Updated Oct 27, 2025

Contexts Optical Compression

Python 18,173 1,188 Updated Oct 25, 2025

Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.

Python 764 62 Updated Oct 1, 2025

MiMo-VL

571 27 Updated Aug 21, 2025

[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"

Python 415 15 Updated Sep 28, 2025

Fully Open Framework for Democratized Multimodal Training

Python 585 40 Updated Oct 21, 2025

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

Python 272 24 Updated Dec 14, 2024

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Python 397 12 Updated Oct 22, 2025

Spatial Reasoning with Vision-Language Models

Python 20 Updated Oct 20, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,691 791 Updated Oct 27, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 10,813 1,170 Updated Oct 12, 2025

Code of π^3: Permutation-Equivariant Visual Geometry Learning

Python 1,314 59 Updated Sep 10, 2025

[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World

Python 333 17 Updated Oct 21, 2025

Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

299 8 Updated Jul 3, 2025

ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving

Python 315 25 Updated Oct 20, 2025

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python 4,056 319 Updated Sep 26, 2025

A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045)

347 18 Updated Oct 28, 2025

Official implementation of Continuous 3D Perception Model with Persistent State

Python 1,151 61 Updated Aug 27, 2025

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Python 282 20 Updated Sep 1, 2025

[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"

Python 472 41 Updated Oct 9, 2025

Code for Streaming 4D Visual Geometry Transformer

Python 684 28 Updated Oct 27, 2025

[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer

Python 11,426 1,168 Updated Oct 11, 2025

Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence

Python 1,215 85 Updated Jan 31, 2025

Adapting VLMs to Bench2Drive.

Python 162 20 Updated Oct 12, 2025

The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'

Jupyter Notebook 148 2 Updated Oct 9, 2025

[CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".

Python 171 9 Updated Jun 4, 2025

🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning

Python 18,774 2,886 Updated Oct 28, 2025

[CVPR 2025, Spotlight] SimLingo (CarLLava): Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

Python 242 30 Updated Aug 25, 2025