[NeurIPS 2024] GenRL: Multimodal-foundation world models enable grounding language and video prompts into embodied domains, by turning them into sequences of latent world model states. Latent state…

Python 86 4 Updated Apr 4, 2025

kwsong0113 / diffusion-forcing-transformer

[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"

Python 596 30 Updated Jul 1, 2025

Parskatt / RoMaV2

Python 398 25 Updated Dec 4, 2025

mit-han-lab / Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 364 38 Updated Jul 10, 2025

wang-kevin3290 / scaling-crl

Python 245 18 Updated Nov 28, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,448 325 Updated Jun 21, 2025

ByteDance-Seed / Depth-Anything-3

Depth Anything 3

Python 4,010 356 Updated Dec 12, 2025

FoundationVision / InfinityStar

[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Python 692 25 Updated Nov 27, 2025

chenfengxu714 / StreamDiffusionV2

StreamDiffusion, Live Stream APP

Python 316 28 Updated Dec 25, 2025

ByteDance-Seed / VeOmni

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,541 128 Updated Jan 16, 2026

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 22,053 2,007 Updated Oct 25, 2025

ShoufaChen / PixelFlow

Pixel-Space Generative Models

Python 291 14 Updated May 11, 2025

baaivision / Emu3.5

Native Multimodal Models are World Learners

Python 1,401 53 Updated Dec 30, 2025

Lakonik / piFlow

pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

Python 251 10 Updated Jan 15, 2026

adobe-research / EditVerse

Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"

Python 124 4 Updated Oct 9, 2025

yihao-meng / HoloCine

Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Python 592 112 Updated Nov 26, 2025