GigaBrain-0: A World Model-Powered Vision-Language-Action Model
All Algorithms implemented in Python
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning.
Interactive Post-Training for Vision-Language-Action Models
A minimal implementation of DeepMind's Genie world model
PyTorch code and models for V-JEPA 2, self-supervised learning from video.
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
WorldVLA: Towards Autoregressive Action World Model
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Code for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Code for WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
[ICCV 2025 Highlight] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
AgentScope: Agent-Oriented Programming for Building LLM Applications
Minimalistic 4D-parallelism distributed training framework for educational purposes
A repository of PyTorch replications of classic and SOTA AI/ML papers and architectures
Machine Learning Engineering Open Book
A MemAgent framework that extrapolates to 3.5M-token contexts, along with a framework for RL training of any agent workflow.