Central South University
Lists (10)
bboxing
dataset
GRPO-optimization
latent-image-reasoning
This list includes code related to incorporating latent image tokens in the output for better multimodal reasoning.
latent-reasoning
perception
perception-and-reasoning
reasoning
Stars
[ICLR'25] Reconstructive Visual Instruction Tuning
This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reas…
Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”
Training Large Language Model to Reason in a Continuous Latent Space
The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"
[NeurIPS 2025] The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning".
[NeurIPS 2025🔥] Main source code of the SRPO framework.
[NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Official codebase for the paper Latent Visual Reasoning
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
Pixel-Level Reasoning Model trained with RL [NeurIPS25]
MCOUT: Multimodal Chain of Continuous Thought for Latent Reasoning
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code execution & editing
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Official repo for "PAPO: Perception-Aware Policy Optimization for Multimodal Reasoning"
The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"