Stars

mllm

23 repositories

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,770 102 Updated Aug 29, 2023

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,642 2,232 Updated Feb 1, 2025

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

930 39 Updated Sep 27, 2025

This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities

78 2 Updated Nov 10, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,031 1,095 Updated Dec 12, 2025

Awesome Unified Multimodal Models

976 31 Updated Aug 17, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,516 58 Updated Jun 14, 2025

AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.

Python 1,152 104 Updated Jun 14, 2025

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Python 171 9 Updated Jul 7, 2025

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Jupyter Notebook 334 7 Updated Jun 1, 2025

Open-source unified multimodal model

Python 5,489 481 Updated Oct 27, 2025

SVG Differentiable Rendering: Generating vector graphics using neural networks. Support: text-to-SVG, Image-to-SVG, SVG Editing.

Python 465 49 Updated Feb 25, 2025

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 821 25 Updated Nov 25, 2025

[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".

Python 412 6 Updated Aug 8, 2025

[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Python 192 6 Updated Sep 18, 2025

Interleaving Reasoning: Next-Generation Reasoning Systems for AGI

220 10 Updated Oct 17, 2025

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

Python 235 5 Updated Aug 18, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 559 46 Updated Oct 30, 2025

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Python 516 34 Updated Sep 23, 2025

🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"

Python 161 8 Updated Jul 10, 2025

Calligrapher: Freestyle Text Image Customization

Python 295 22 Updated Sep 3, 2025

Official implementation of the paper "Transfer between Modalities with MetaQueries"

Python 280 9 Updated Oct 12, 2025

Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework

Python 876 47 Updated Jul 1, 2025