Stars
A paper list for spatial reasoning
Unified KV Cache Compression Methods for Auto-Regressive Models
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
[ArXiv 25] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Lynx: Towards High-Fidelity Personalized Video Generation
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (ICCV2025)
Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
3D and 4D World Modeling: A Survey
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
A Comprehensive Benchmark Suite for AI Story Visualization
🔥 [ICCV 2025 Highlight] InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Stand-In is a lightweight, plug-and-play framework for identity-preserving video generation.
[NeurIPS 2025] Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation".
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
Wan: Open and Advanced Large-Scale Video Generative Models
A unified inference and post-training framework for accelerated video generation.
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Official implementation of ICCV 2025 paper - CharaConsist: Fine-Grained Consistent Character Generation
Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
[Arxiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization