Stars
A professional cross-platform SSH/SFTP/Shell/Telnet/Tmux/Serial terminal.
Cyberduck is a libre FTP, SFTP, WebDAV, Amazon S3, Backblaze B2, Microsoft Azure & OneDrive and OpenStack Swift file transfer client for Mac and Windows.
MiniMax-M2, a Mini model built for Max coding & agentic workflows.
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
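NipaPlay-Reload is a modern cross-platform local video player for Windows, macOS, Linux, Android, and iOS. It integrates danmaku (bullet comment) display, multi-format subtitle support, multiple audio track switching, and browsing of newly airing anime, and it can mount Emby/Jellyfin media libraries. It is built with Flutter for a unified user experience.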
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Official implementation of SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization.
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience.
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Reference PyTorch implementation and models for DINOv3
[arxiv 25] Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Renderer for the harmony response format to be used with gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
PyTorch code and models for VJEPA2 self-supervised learning from video.
A simple way to keep track of an Exponential Moving Average (EMA) version of your PyTorch model
Official implementation of the paper "Transfer between Modalities with MetaQueries"