Stars
The code for PixelRefer & VideoRefer
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
[ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
[ArXiv 2025] A survey about controllable video generation: This repo is the official awesome list for "Controllable video generation: A survey"
[CVPR 2025] Official Implementation of MotionPro: A Precise Motion Controller for Image-to-Video Generation
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
Translate Unreal Engine Blueprints to C++ in seconds. Not hours.
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨ (ICCV 2025 Highlight)
The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static to dynamic evaluation"
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Large World Model -- Modeling Text and Video with Millions Context
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
A scalable, end-to-end training pipeline for general-purpose agents
Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
[CVPR 2025] JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Official implementation of the paper "Transfer between Modalities with MetaQueries"