Stars
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
Official Implementation of Paper: WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
[ICLR 2026] SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
This repository implements a Best-of-N (BoN) strategy for inference-aware fine-tuning of large language models. The system supports multiple leading LLM providers and includes comprehensive testing…
Open-Sora: Democratizing Efficient Video Production for All
Scalable and memory-optimized training of diffusion models
hilookas / SimplerEnv
Forked from simpler-env/SimplerEnv
Evaluating FSD on SimplerEnv
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official implementation of our paper: "Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing" (ICML 2025)
📚 Collection of awesome generation acceleration resources.
FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.
High-speed Large Language Model Serving for Local Deployment
world modeling challenge for humanoid robots
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion
Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers (ICCV 2023)