Stars
Causal video-action world model for generalist robot control
LAP: Language-Action Pre-Training Enables Zero-Shot Cross Embodiment Transfer
Understanding R1-Zero-Like Training: A Critical Perspective
Official repository for the paper "Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation"
Reinforcement Learning via Self-Distillation (SDPO)
[ICLR 2026] RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
Code for kai0, including training, inference and data collection.
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
Qwen3-Coder is the code version of Qwen3, the large language model series developed by the Qwen team.
Mega Scale Multimodal DataPipeline for SOTA Foundation Models
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
The official repository for the paper "Real-world Reinforcement Learning from Suboptimal Interventions".
Code, data and weights for the paper "What drives success in physical planning with Joint-Embedding Predictive World Models?"
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Retargeting of whole-body human motion to humanoid robots for dexterous manipulation of articulated objects.
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
Describe Anything, Anywhere, at Any Moment (DAAAM), a novel approach to real-time, large-scale, spatio-temporal memory
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
A paper list for spatial reasoning