Stars
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
Official PyTorch implementation for "Large Language Diffusion Models"
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
UniVideo: Unified Understanding, Generation, and Editing for Videos
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Official code for the paper "GARDO: Reinforcing Diffusion Models without Reward Hacking"
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
Scalable and memory-optimized training of diffusion models
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Official code for StoryMem: Multi-shot Long Video Storytelling with Memory
Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny conditional de…
Implementation of "S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models"
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
Orient Anything V2, NeurIPS 2025 Spotlight
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
AI-based tool for removing hard-coded subtitles and text-like watermarks from videos or pictures, producing output files at the original resolution. No third-party API required; runs locally.
The official repository of "Astra: General Interactive World Model with Autoregressive Denoising"
Mixture-of-Groups Attention for End-to-End Long Video Generation
ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)
LongLive: Real-time Interactive Long Video Generation
Official implementation of "MV-TAP: Tracking Any Point in Multi-View Videos"
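A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. No third-party API required; text recognition runs locally. A deep-learning-based subtitle extraction framework that includes subtitle region detection and subtitle content extraction.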
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation