Stars
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models", which unlocks the massive zero-shot potential of unified multimodal models through self-supervised learning.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
verl: Volcano Engine Reinforcement Learning for LLMs
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment,…
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
GenEval: An object-focused framework for evaluating text-to-image alignment
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
An official implementation of EvoSearch: Scaling Image and Video Generation via Test-Time Evolutionary Search
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
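As a quick reference for the rectified-flow method the two entries above build on, here is a minimal training-objective sketch (not code from either repo, and the model interface is an assumed placeholder): sample a straight-line interpolation between noise x0 and data x1, and regress the model's velocity prediction onto the constant target x1 - x0.

```python
import torch

def rectified_flow_loss(model, x1):
    """model(x_t, t) -> predicted velocity; x1 is a batch of data samples."""
    x0 = torch.randn_like(x1)                                   # noise endpoint of the path
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = t * x1 + (1.0 - t) * x0                                # linear interpolation at time t
    target = x1 - x0                                            # constant velocity along the straight path
    pred = model(xt, t.flatten())
    return torch.nn.functional.mse_loss(pred, target)
```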
Doodling our way to AGI ✏️ 🖼️ 🧠
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Awesome Unified Multimodal Models
A concise and elegant dictionary and translation macOS app. Works out of the box, supports offline OCR, and integrates Youdao Dictionary, 🍎 Apple's system dictionary, 🍎 Apple's system translation, OpenAI, Gemini, DeepL, Google, Bing, Tencent, Baidu, Alibaba, NiuTrans, Caiyun, and Volcengine translation.
SGLang is a fast serving framework for large language models and vision language models.
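A minimal client-side sketch of querying an SGLang server, assuming one was launched with `python -m sglang.launch_server --model-path <model> --port 30000` and exposes its usual OpenAI-compatible `/v1` endpoint; the port and the `"default"` model name are assumptions, not guarantees of the current API.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local SGLang server (assumed default port 30000).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",  # placeholder name; SGLang serves the model loaded at launch
    messages=[{"role": "user", "content": "Name one unified multimodal model."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```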
A framework for few-shot evaluation of language models.
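A minimal sketch of running a few-shot evaluation with this harness, assuming the package is installed as `lm_eval` and exposes `simple_evaluate()` accepting a backend string plus a `model_args` string, as in recent releases; the model id and task are illustrative choices only.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                         # Hugging Face transformers backend
    model_args="pretrained=gpt2",       # any HF model id; gpt2 is just an example
    tasks=["hellaswag"],                # one of the harness's built-in tasks
    num_fewshot=5,                      # few-shot examples prepended to each prompt
)
print(results["results"]["hellaswag"])  # per-task metrics
```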
📖 A repository organizing papers, code, and other resources related to unified multimodal models.
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model built on linear attention.
A collection of multimodal reasoning papers, code, datasets, benchmarks, and resources.