Statistical Learning course at USTC. Review materials for the USTC Statistical Learning course (taught by Liu Dong).

TeX 63 10 Updated Jan 9, 2024

Collection of papers about video-audio understanding

22 1 Updated Dec 26, 2025

Official implementation of RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Python 46 1 Updated Nov 15, 2025

A Comprehensive Dataset for Advanced Image Generation and Editing

31 2 Updated Oct 2, 2025
Python 1,754 107 Updated Sep 30, 2025

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

Python 135 7 Updated Jun 30, 2025
Python 1,099 68 Updated Nov 20, 2025

Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grained visual understanding".

110 2 Updated Aug 21, 2025

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,282 40 Updated Dec 23, 2025
Python 23 1 Updated Oct 16, 2025

New generation of CLIP with fine-grained discrimination capability, ICML 2025

Python 541 31 Updated Oct 27, 2025

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Python 75 3 Updated May 23, 2025

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 822 49 Updated Jun 16, 2025

Official implementation of BLIP3o-Series

Python 1,626 77 Updated Nov 29, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,532 60 Updated Jun 14, 2025

[ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Python 185 5 Updated May 21, 2025
Python 2,496 241 Updated Jul 16, 2025

This is a repo to track the latest autoregressive visual generation papers.

430 5 Updated Jun 25, 2025

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Python 1,194 80 Updated Nov 25, 2025

GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities

Python 305 8 Updated May 3, 2025

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 305 11 Updated Sep 28, 2025

A collection of awesome text-to-image generation studies.

TeX 743 39 Updated Dec 25, 2025

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

2,422 206 Updated Jan 22, 2026

✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Python 42 4 Updated Apr 10, 2025

Align Anything: Training All-modality Model with Feedback

Python 4,625 509 Updated Nov 27, 2025

📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.

348 15 Updated Jan 8, 2026

[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning

Python 43 3 Updated Nov 8, 2025

A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources.

560 52 Updated Dec 15, 2025

Paper List of Inference/Test Time Scaling/Computing

Python 345 9 Updated Aug 28, 2025