Lists (25)
3DGS
AIGC
Animation
Calibration
Concept
DIBR
DigitalHuman
Fusion
GPT
ImageTask2D
Library
LLM
MeshProcess
NERF
ObjectGeneration
Reconstruction
Render
Robot
SceneGen
Survey
Tools
VideoGen
VideoInterpolation
VLA
WorldModel
Starred repositories
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
Sharp Monocular View Synthesis in Less Than a Second
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Open-Sora: Democratizing Efficient Video Production for All
A curated list of recent diffusion models for video generation, editing, and various other applications.
Enjoy the magic of Diffusion models!
[ICLR 2025] LAPA: Latent Action Pretraining from Videos
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
LATTICE: Democratize High-Fidelity 3D Generation at Scale
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Dexbotic: Open-Source Vision-Language-Action Toolbox
GigaWorld-0: World Models as Data Engine to Empower Embodied AI
The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
A cross-platform, high performance renderer for Gaussian Splatting using Vulkan Compute. Supports Windows, Linux, macOS, iOS, and visionOS
Vulkan-based Gaussian Splatting viewer with Python bindings