Stars
starVLA: A Lego-like Codebase for Vision-Language-Action Model Development
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"
Fully Open Framework for Democratized Multimodal Training
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Accessible large language models via k-bit quantization for PyTorch.
Wan: Open and Advanced Large-Scale Video Generative Models
Code of π^3: Permutation-Equivariant Visual Geometry Learning
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045)
Official implementation of Continuous 3D Perception Model with Persistent State
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
[ICCV 2025] Official code of "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation"
Code for Streaming 4D Visual Geometry Transformer
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Re-implementation of the pi0 vision-language-action (VLA) model from Physical Intelligence
The code for the paper "Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors"
[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding"
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
[CVPR 2025, Spotlight] SimLingo (CarLLava): Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment