Lists (25)
3DGS
AIGC
Animation
Calibration
Concept
DIBR
DigitalHuman
Fusion
GPT
ImageTask2D
Library
LLM
MeshProcess
NERF
ObjectGeneration
Reconstruction
Render
Robot
SceneGen
Survey
Tools
VideoGen
VideoInterpolation
VLA
WorldModel
Starred repositories
Code for the ICLR 2024 spotlight paper: "Learning to Act without Actions" (introducing Latent Action Policies)
A unified inference and post-training framework for accelerated video generation.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Official repo for vidar and vidarc: video foundation model for robotics.
Let's make video diffusion practical!
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
Wan: Open and Advanced Large-Scale Video Generative Models
Official code implementation of "Mitty: Diffusion-based Human-to-Robot Video Generation"
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Official code of Motus: A Unified Latent Action World Model
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
Sharp Monocular View Synthesis in Less Than a Second
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Open-Sora: Democratizing Efficient Video Production for All
Light Image Video Generation Inference Framework
A curated list of recent diffusion models for video generation, editing, and various other applications.
Enjoy the magic of Diffusion models!
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
LATTICE: Democratize High-Fidelity 3D Generation at Scale