Stars
Towards Scalable Pre-training of Visual Tokenizers for Generation
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Audio processing using PyTorch 1D convolutional networks
List of diffusion related active submissions on OpenReview for ICLR 2025.
Provides large guidance scale correction for Stable Diffusion web UI (AUTOMATIC1111), implementing the paper "Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale"
A song aesthetic evaluation toolkit trained on SongEval.
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
Official implementation of the εar-VAE model, including the inference and evaluation parts; more details coming soon...
Python implementation of performance metrics in Loizou's Speech Enhancement book
🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
TTS model capable of streaming conversational audio in realtime.
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
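⭐ AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI-analyzed briefings are pushed straight to your phone; it also supports integration with the MCP architecture, empowering AI natural-language…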
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
An Open-Source Project to Unify Audio Processing and Generation
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT
Code for the blog "Neural audio codecs: how to get audio into LLMs"
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.