- The Chinese University of Hong Kong, Shenzhen
- China, Shenzhen
- https://yanx27.github.io/
Stars
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Re-implementation of the pi0 vision-language-action (VLA) model from Physical Intelligence
Drive-Pi0 and DriveMoE on End-to-end Autonomous Driving
[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[ICCV 2025] End-to-End Driving with Online Trajectory Evaluation via BEV World Model
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Official code of *DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model*
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (AAAI-25)
Code for "DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT"
A collective repository tracking 3DGS-related progress in research and industry
A suite of image and video neural tokenizers
[NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
[CVPR 2025] UniScene: Unified Occupancy-centric Driving Scene Generation
Code release for https://kovenyu.com/WonderWorld/
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
[CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[ECCV 2024] Embodied Understanding of Driving Scenarios
mllm-npu: training multimodal large language models on Ascend NPUs