MlWoo

In the plateau, TTS

127 followers · 52 following

Beijing

Stars

MiniMax-AI / VTP

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 426 10 Updated Dec 16, 2025

Stability-AI / stable-audio-metrics

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 279 23 Updated Jan 20, 2026

sarulab-speech / UTMOSv2

UTokyo-SaruLab MOS Prediction System

Python 286 28 Updated Dec 18, 2025

haoheliu / SemantiCodec-inference

Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with better semantics in the latent space.

Python 242 21 Updated Mar 7, 2025

sanowl / Multimodal-Latent-Language-Modeling-with-Next-Token-Diffusion

Python 11 1 Updated Dec 13, 2024

KinWaiCheuk / nnAudio

Audio processing using PyTorch 1D convolutional networks

Python 1,115 97 Updated Dec 7, 2025

moatifbutt / awesome-diffusion-iclr-2025

List of diffusion-related active submissions on OpenReview for ICLR 2025.

52 1 Updated Oct 27, 2024

scraed / CharacteristicGuidanceWebUI

Provide large guidance scale correction for Stable Diffusion web UI (AUTOMATIC1111), implementing the paper "Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale"

Python 86 6 Updated Mar 2, 2025

ASLP-lab / SongEval

A song aesthetic evaluation toolkit trained on SongEval.

Python 273 23 Updated Jun 15, 2025

jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 2,144 227 Updated Oct 29, 2025

Eps-Acoustic-Revolution-Lab / EAR_VAE

This is the official implementation of the εar-VAE model, including inference and evaluation code. More details coming soon...

Python 54 6 Updated Jan 18, 2026

schmiph2 / pysepm

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python 445 91 Updated Feb 15, 2025

vipshop / cache-dit

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 921 55 Updated Jan 23, 2026

nari-labs / dia2

TTS model capable of streaming conversational audio in realtime.

Python 1,025 83 Updated Nov 29, 2025

KdaiP / DC-Speech-VAE

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python 57 9 Updated Nov 19, 2025

dc-ai-projects / DC-Gen

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

Python 338 9 Updated Oct 5, 2025

Mddct / SimDiVeQ

Python 4 Updated Nov 25, 2025

sansan0 / TrendRadar

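⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI-generated analysis briefs are pushed straight to your phone; it also supports integration with the MCP architecture, empowering AI natural-language …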

Python 44,328 21,800 Updated Jan 23, 2026

spotify / pedalboard

🎛 🔊 A Python library for audio.

C++ 5,941 313 Updated Jan 16, 2026

stepfun-ai / Step-Audio-EditX

A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.

Python 830 55 Updated Jan 23, 2026

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,610 227 Updated Dec 30, 2025

MissingCore / Music

A Nothing inspired local music player.

TypeScript 525 39 Updated Jan 23, 2026

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,085 394 Updated Dec 11, 2025

alibaba / unified-audio

An Open-Source Project to Unify Audio Processing and Generation

Python 169 12 Updated Dec 25, 2025

kyutai-labs / nanoGPTaudio

Forked from karpathy/nanoGPT

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 147 4 Updated Oct 20, 2025