Starred repositories
Tarsier: a family of large-scale video-language models designed to generate high-quality video descriptions, with good general video understanding capability.
Our first fully AI generated deep learning system
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Official repo for the paper "CamI2V: Camera-Controlled Image-to-Video Diffusion Model"
MedRAX: Medical Reasoning Agent for Chest X-ray - ICML 2025
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
A Python parametric CAD scripting framework based on OCCT
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
[ECCV 2024] SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
GigaWorld-0: World Models as Data Engine to Empower Embodied AI
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[ICLR 2026] Unified Vision-Language-Action Model
Official PyTorch implementation of StyleGAN3
Equivariant Steerable CNNs Library for PyTorch https://quva-lab.github.io/escnn/
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
A toolbox for real-to-sim reconstruction and robotic simulation
Fast and Universal 3D reconstruction model for versatile tasks
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
[ArXiv 2025] A survey of controllable video generation: this repo is the official awesome list for "Controllable video generation: A survey"
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Benchmarking Knowledge Transfer in Lifelong Robot Learning
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning