NVIDIA Research
- Santa Clara
- https://mli0603.github.io/
Stars
HunyuanVideo: A Systematic Framework For Large Video Generation Model
HunyuanVideo-1.5: A leading lightweight video generation model
Decoder Only Transformer Policy for Behavioral Cloning
LW-BenchHub is a unified benchmark hub built on Isaac Lab–Arena for embodied AI, providing consistent interfaces, realistic environments, multi-robot support, and large-scale evaluation. It include…
Team Comet's 2025 BEHAVIOR Challenge Codebase
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
The official implementation of "Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model"
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Train transformer language models with reinforcement learning.
Web-based 3D visualization + Python
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Efficient Triton Kernels for LLM Training
An open-source AI agent that brings the power of Gemini directly into your terminal.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Enjoy the magic of Diffusion models!
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Wan: Open and Advanced Large-Scale Video Generative Models
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.