LZU / CUHKSZ / KAUST
Shenzhen (UTC +08:00)
https://01yzzyu.github.io/
https://scholar.google.com/citations?hl=zh-CN&user=x2VGVvcAAAAJ
Stars
[ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark"
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Awesome Unified Multimodal Models
Official code implementation of the paper: QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation
A collection of awesome image inpainting studies.
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
[NeurIPS 2025 D&B Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Unified Multimodal Model for image generation/editing/understanding
OmniGen2: Exploration to Advanced Multimodal Generation.
Janus-Series: Unified Multimodal Understanding and Generation Models
[ICLR & NeurIPS 2025] Repository for the Show-o series: One Single Transformer to Unify Multimodal Understanding and Generation.
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
SGLang is a fast serving framework for large language models and vision language models.
📖 A repository organizing papers, code, and other resources related to unified multimodal models.
[ACL 2025] The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
A regularly updated paper list on LLMs reasoning in latent space.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
MedEvalKit: A Unified Medical Evaluation Framework
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning