pyh-129

ppTanya pyh-129

Student of Wuhan University

5 followers · 14 following

Wuhan, Hubei Province, China
https://www.whu.edu.cn/

Stars

mllm

23 repositories

ttengwang / Caption-Anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,770 102 Updated Aug 29, 2023

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,642 2,232 Updated Feb 1, 2025

showlab / Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

930 39 Updated Sep 27, 2025

threegold116 / Awesome-Omni-MLLMs

This is for the ACL 2025 Findings paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities

78 2 Updated Nov 10, 2025

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

17,031 1,095 Updated Dec 12, 2025

AIDC-AI / Awesome-Unified-Multimodal-Models

Awesome Unified Multimodal Models

976 31 Updated Aug 17, 2025

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,516 58 Updated Jun 14, 2025

OpenBMB / AgentCPM-GUI

AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.

Python 1,152 104 Updated Jun 14, 2025

Code-kunkun / LamRA

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 171 9 Updated Jul 7, 2025

zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Jupyter Notebook 334 7 Updated Jun 1, 2025

ByteDance-Seed / Bagel

Open-source unified multimodal model

Python 5,489 481 Updated Oct 27, 2025

ximinng / PyTorch-SVGRender

SVG Differentiable Rendering: Generating vector graphics using neural networks. Support: text-to-SVG, Image-to-SVG, SVG Editing.

Python 465 49 Updated Feb 25, 2025

PKU-YuanGroup / UniWorld

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 821 25 Updated Nov 25, 2025

ByteVisionLab / TokenFlow

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 412 6 Updated Aug 8, 2025

csuhan / Tar

[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Python 192 6 Updated Sep 18, 2025

Osilly / Awesome-Interleaving-Reasoning

Interleaving Reasoning: Next-Generation Reasoning Systems for AGI

220 10 Updated Oct 17, 2025

TencentARC / TokLIP

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

Python 235 5 Updated Aug 18, 2025

inclusionAI / Ming

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 559 46 Updated Oct 30, 2025

Ephemeral182 / PosterCraft

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Python 516 34 Updated Sep 23, 2025

ByteVisionLab / DetailFlow

🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"

Python 161 8 Updated Jul 10, 2025

EzioBy / Calligrapher

Calligrapher: Freestyle Text Image Customization

Python 295 22 Updated Sep 3, 2025

facebookresearch / metaquery

Official Implementation of Paper Transfer between Modalities with MetaQueries

Python 280 9 Updated Oct 12, 2025

MeiGen-AI / PosterCraft

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Python 876 47 Updated Jul 1, 2025