mllm
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
Janus-Series: Unified Multimodal Understanding and Generation Models
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
Repository for the ACL 2025 Findings paper "From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities".
✨✨Latest Advances on Multimodal Large Language Models
Awesome Unified Multimodal Models
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
SVG Differentiable Rendering: Generating vector graphics using neural networks. Supports text-to-SVG, image-to-SVG, and SVG editing.
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Interleaving Reasoning: Next-Generation Reasoning Systems for AGI
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"
Calligrapher: Freestyle Text Image Customization
Official implementation of the paper "Transfer between Modalities with MetaQueries"