Stars
Official repo of the paper "Reconstruction Alignment Improves Unified Multimodal Models", which unlocks the massive zero-shot potential of unified multimodal models through self-supervised learning.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
verl: Volcano Engine Reinforcement Learning for LLMs
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment,…
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
GenEval: An object-focused framework for evaluating text-to-image alignment
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
An official implementation of EvoSearch: Scaling Image and Video Generation via Test-Time Evolutionary Search
⚡ InstaFlow! One-Step Stable Diffusion with Rectified Flow (ICLR 2024)
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
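As a quick reference for the rectified-flow method the two entries above build on, here is a minimal training-objective sketch (not code from either repo, and the model interface is an assumed placeholder): sample a straight-line interpolation between noise x0 and data x1, and regress the model's velocity prediction onto the constant target x1 - x0.

```python
import torch

def rectified_flow_loss(model, x1):
    """model(x_t, t) -> predicted velocity; x1 is a batch of data samples."""
    x0 = torch.randn_like(x1)                                   # noise endpoint of the path
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = t * x1 + (1.0 - t) * x0                                # linear interpolation at time t
    target = x1 - x0                                            # constant velocity along the straight path
    pred = model(xt, t.flatten())
    return torch.nn.functional.mse_loss(pred, target)
```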
Doodling our way to AGI ✏️ 🖼️ 🧠
The official repo of One RL to See Them All: Visual Triple Unified Reinforcement Learning
Awesome Unified Multimodal Models
A concise and elegant dictionary and translation macOS app. Works out of the box, supports offline OCR, and integrates Youdao Dictionary, 🍎 Apple's system dictionary, 🍎 Apple's system translation, OpenAI, Gemini, DeepL, Google, Bing, Tencent, Baidu, Alibaba, NiuTrans, Caiyun, and Volcengine translation.
SGLang is a fast serving framework for large language models and vision language models.
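A minimal client-side sketch of querying an SGLang server, assuming one was launched with `python -m sglang.launch_server --model-path <model> --port 30000` and exposes its usual OpenAI-compatible `/v1` endpoint; the port and the `"default"` model name are assumptions, not guarantees of the current API.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local SGLang server (assumed default port 30000).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="default",  # placeholder name; SGLang serves the model loaded at launch
    messages=[{"role": "user", "content": "Name one unified multimodal model."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```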
A framework for few-shot evaluation of language models.
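A minimal sketch of running a few-shot evaluation with this harness, assuming the package is installed as `lm_eval` and exposes `simple_evaluate()` accepting a backend string plus a `model_args` string, as in recent releases; the model id and task are illustrative choices only.

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                         # Hugging Face transformers backend
    model_args="pretrained=gpt2",       # any HF model id; gpt2 is just an example
    tasks=["hellaswag"],                # one of the harness's built-in tasks
    num_fewshot=5,                      # few-shot examples prepended to each prompt
)
print(results["results"]["hellaswag"])  # per-task metrics
```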
📖 A repository organizing papers, code, and other resources related to unified multimodal models.
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model built on linear attention.
A collection of multimodal reasoning papers, code, datasets, benchmarks, and resources.