CUHK MMLab
Hong Kong
- https://zrrskywalker.github.io/
The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"
Official repository for the paper "DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation"
The first interleaved framework for textual reasoning within the visual generation process
The official repository of BEAR: Benchmarking and Enhancing Multimodal Language Models with Atomic Embodied Capabilities
AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
ULMEvalKit: One-Stop Eval ToolKit for Image Generation
This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving"
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
The official repository of the paper "RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation"
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Wan: Open and Advanced Large-Scale Video Generative Models
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
[ICCV 2025] Official code for "Improving Generalist Model with Domain-Specific Experts"
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Official WACV 2025 code for Point-GN: A non-parametric, training-free method for 3D point cloud classification using Gaussian Positional Encoding (GPE). No training, no parameters, state-of-the-art…
Training-free Regional Prompting for Diffusion Transformers 🔥
[ICLR 2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-form user instructions.
[ICLR 2025] The first multimodal search engine pipeline and benchmark for LMMs