Stars
Fast and memory-efficient exact attention
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Labs for MIT 6.S184/6.S975, IAP 2025/2026
An extremely fast Python package and project manager, written in Rust.
Code for Generalizable Articulated Object Reconstruction from Casually Captured RGBD Videos
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Get the coordinates of clicks on an image in your streamlit app
Rich is a Python library for rich text and beautiful formatting in the terminal.
The simplest, fastest repository for training/finetuning small-sized VLMs.
Minimalistic 4D-parallelism distributed training framework for educational purposes
A high-throughput and memory-efficient inference and serving engine for LLMs
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
A fork to add multimodal model training to open-r1
Hackable and optimized Transformers building blocks, supporting a composable construction.
Train transformer language models with reinforcement learning.
Fully open reproduction of DeepSeek-R1
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Command Line Interactive and Scriptable Application to access MEGA