Dexbotic: Open-Source Vision-Language-Action Toolbox
Official Implementation of TIP 2025: "PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting"
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
[IROS 2024] Incrementally Building Room-Scale Language-Embedded Gaussian Splats (LEGS) with a Mobile Robot
The code for PixelRefer & VideoRefer
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
starVLA: A Lego-like Codebase for Vision-Language-Action Model Development
🔊 Text-Prompted Generative Audio Model
[NeurIPS 2025 Spotlight] Official implementation of SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment
🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.
Survey: https://arxiv.org/pdf/2507.20198
Official implementation for "JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation"
Official implementation of Continuous 3D Perception Model with Persistent State
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
VGGT 3D Vision Agent optimized for Apple Silicon with Metal Performance Shaders
🥢 Cook like Laoxiangji (老乡鸡) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.
[TRO 2025] OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
[SIGGRAPH ASIA 2024] Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
Wan: Open and Advanced Large-Scale Video Generative Models
This repository collects research papers on large Vision Language Models in Autonomous Driving and Intelligent Transportation Systems. The repository will be continuously updated to track the latest…