- Peking University
- Wuhan, China
- https://blog.jongkhu.com/
Stars
🔥 OneThinker: All-in-one Reasoning Model for Image and Video
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Learning Plug-and-play Memory for Guiding Video Diffusion Models
Official code for VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Universal Image Restoration Pre-training via Masked Degradation Classification
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
[ICLR 2025 Oral] Official code for "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias"
[ICLR'25 Oral] No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" (ICML 2025) and "UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers"
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Lumos Project: Frontier video unified model research by Alibaba DAMO Academy.
A list of papers about concept bottleneck models (CBMs)
[NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
[CVPR 2024] Improved Implicit Neural Representation with Fourier Reparameterized Training
[CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
(NIPS 2025) OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
[NeurIPS 2025 spotlight] Official implementation for "FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving"
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent