Jupyter Notebook 200 3 Updated Dec 19, 2025

The official implementation of the paper "Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation"

Python 94 1 Updated Dec 28, 2025

Official Repository for Paper: DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

16 Updated Dec 7, 2025
Python 13 1 Updated Dec 10, 2025

The first interleaved framework for textual reasoning within the visual generation process

156 1 Updated Nov 21, 2025

Are Video Models Ready as Zero-shot Reasoners?

Python 84 4 Updated Nov 24, 2025

The official repository of BEAR: Benchmarking and Enhancing Multimodal Language Models with Atomic Embodied Capabilities

26 1 Updated Oct 26, 2025

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Python 163 5 Updated Oct 17, 2025

ULMEvalKit: One-Stop Eval ToolKit for Image Generation

Python 54 2 Updated Dec 17, 2025

This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"

Python 117 1 Updated Sep 12, 2025

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jupyter Notebook 306 13 Updated Sep 28, 2025

[NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Python 95 4 Updated Sep 19, 2025

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Python 166 8 Updated Oct 10, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

25 Updated Dec 21, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,519 60 Updated Jun 14, 2025

[ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

Python 210 12 Updated Nov 5, 2025

[NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Python 425 24 Updated Sep 18, 2025

Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving"

22 Updated Sep 1, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Python 1,074 53 Updated Nov 3, 2025

The repo of paper `RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation`

Python 149 12 Updated Dec 22, 2024

HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model

Python 333 10 Updated Oct 3, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 15,133 2,303 Updated Dec 15, 2025

MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency

Python 137 6 Updated Aug 5, 2025

[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation

Python 850 26 Updated May 23, 2025

(ICCV-2025 Official Code) Improving Generalist Model with Domain-Specific Experts

Python 86 11 Updated Oct 29, 2025

[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

Python 174 12 Updated Jun 20, 2025

Official WACV 2025 code for Point-GN: A non-parametric, training-free method for 3D point cloud classification using Gaussian Positional Encoding (GPE). No training, no parameters, state-of-the-art…

Python 14 1 Updated Jul 22, 2025

Training-free Regional Prompting for Diffusion Transformers 🔥

Python 690 31 Updated Nov 28, 2024

[ICLR 2025] A versatile image-to-image visual assistant, designed for image generation, manipulation, and translation based on free-form user instructions.

Python 210 3 Updated May 5, 2025

[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs

Python 482 33 Updated Jan 23, 2025