Stars
MiniMax M2.1, a SOTA model for real-world dev & agents.
Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
Towards Scalable Pre-training of Visual Tokenizers for Generation
A high-throughput and memory-efficient inference and serving engine for LLMs
Official PyTorch Code for "OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild".
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Paper Debugger is the best Overleaf companion
A minimal PyTorch re-implementation of Qwen3 VL with a fancy CLI
HunyuanVideo-1.5: A leading lightweight video generation model
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Triton implementation of FlashAttention2 that adds Custom Masks.
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Official implementation of paper "VMoBA: Mixture-of-Block Attention for Video Diffusion Models"
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
A professional cross-platform SSH/SFTP/Shell/Telnet/Tmux/Serial terminal.
Cyberduck is a libre FTP, SFTP, WebDAV, Amazon S3, Backblaze B2, Microsoft Azure & OneDrive and OpenStack Swift file transfer client for Mac and Windows.
MiniMax-M2, a model built for Max coding & agentic workflows.
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
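NipaPlay-Reload is a modern cross-platform local video player for Windows, macOS, Linux, Android, and iOS. It integrates danmaku (bullet comment) display, multi-format subtitle support, multiple audio track switching, new-season anime browsing, and other features, and supports mounting Emby/Jellyfin media libraries. Built with Flutter, it provides a unified user experience.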
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"