Stars
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
A TTS that fits in your CPU (and pocket)
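A powerful command-line tool for discovering, converting, installing, and managing mouse cursor themes on Windows and Linux. It supports bidirectional conversion, turning Linux cursor themes (XCursor) into Windows formats (.cur/.ani) and Windows themes into the Linux format, and provides a full set of management features for installing, applying, and uninstalling cursor themes.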
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Official Implementation of SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
ComfyUI DyPE, enabling artifact-free 4K+ image generation for Qwen, Flux + Nunchaku
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
Load and run SDNQ quantized models in ComfyUI with 50-75% VRAM savings!
A ComfyUI node that adds random noise to text embeddings.
Glance: Accelerating Diffusion Models with 1 Sample
Implementation of FlashAttention-2 for Nvidia Tesla V100
High-quality, training-free inpainting for every Stable Diffusion model. Supports ComfyUI
Transform AI image generation from random exploration into deliberate artistic navigation. This advanced KSampler replacement blends traditional noise with shader noise. Navigate latent space with …
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.
A lightweight JSON-style prompt builder for FLUX 2, with camera, lens, lighting, and style presets.
The ultimate training toolkit for finetuning diffusion models
Official repository for “DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation”