Stars
ComfyUI nodes for WanAnimate model input preprocessing
Build smaller, faster, and more secure desktop and mobile applications with a web frontend.
🎥 Python and OpenCV-based scene cut/transition detection program & library.
DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)
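香蕉超市 (Banana Supermarket) | One-click generation for a wide range of creative effects, no prompt required; supports local brush selection and continuous editing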
The most advanced open-source browser fingerprinting library
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Streamlining Cartoon Production with Generative Post-Keyframing
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Reference PyTorch implementation and models for DINOv3
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers
A set of beautifully-designed, accessible components and a code distribution platform. Works with your favorite frameworks. Open Source. Open Code.
SwarmUI (formerly StableSwarmUI), a modular Stable Diffusion web user interface, with an emphasis on making power tools easily accessible, high performance, and extensibility.
A unified inference and post-training framework for accelerated video generation.
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents
Text-audio foundation model from Boson AI
[NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from sim…
A ComfyUI custom node designed for advanced image background removal and object, face, clothes, and fashion segmentation, utilizing multiple models including RMBG-2.0, INSPYRENET, BEN, BEN2, BiRefN…
Pusa: Thousands Timesteps Video Diffusion Model
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
ReSwapper aims to reproduce the implementation of inswapper. This repository provides code for training and inference, and includes pretrained weights.
A general fine-tuning kit geared toward diffusion models.
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.