MlWoo, Beijing

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 426 10 Updated Dec 16, 2025

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 279 23 Updated Jan 20, 2026

UTokyo-SaruLab MOS Prediction System

Python 286 28 Updated Dec 18, 2025

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 242 21 Updated Mar 7, 2025
Python 18 Updated Jan 3, 2026

Audio processing using PyTorch 1D convolution networks

Python 1,115 97 Updated Dec 7, 2025
Python 36 4 Updated Sep 6, 2025

List of diffusion related active submissions on OpenReview for ICLR 2025.

52 1 Updated Oct 27, 2024

Provide large guidance scale correction for Stable Diffusion web UI (AUTOMATIC1111), implementing the paper "Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale"

Python 86 6 Updated Mar 2, 2025

A song aesthetic evaluation toolkit trained on SongEval.

Python 273 23 Updated Jun 15, 2025

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 2,144 227 Updated Oct 29, 2025

This is the official implementation for εar-VAE model including inference and evaluation parts, more details coming soon...

Python 54 6 Updated Jan 18, 2026

Python implementation of performance metrics in Loizou's Speech Enhancement book

Python 445 91 Updated Feb 15, 2025

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 921 55 Updated Jan 23, 2026

TTS model capable of streaming conversational audio in realtime.

Python 1,025 83 Updated Nov 29, 2025

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python 57 9 Updated Nov 19, 2025

DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space

Python 338 9 Updated Oct 5, 2025
Python 4 Updated Nov 25, 2025

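⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI analysis briefings are pushed straight to your phone; it also supports integration with the MCP architecture, empowering AI natural-language…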

Python 44,328 21,800 Updated Jan 23, 2026

🎛 🔊 A Python library for audio.

C++ 5,941 313 Updated Jan 16, 2026

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 830 55 Updated Jan 23, 2026

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,610 227 Updated Dec 30, 2025

A Nothing inspired local music player.

TypeScript 525 39 Updated Jan 23, 2026

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,085 394 Updated Dec 11, 2025

An Open-Source Project to Unify Audio Processing and Generation

Python 169 12 Updated Dec 25, 2025

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python 147 4 Updated Oct 20, 2025
Python 4 Updated Oct 22, 2025

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Python 1,034 172 Updated Jul 5, 2023

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.

Python 113 13 Updated Dec 22, 2024