Stars
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Schedule-Free Optimization in PyTorch
A lightweight audio codec based on a single quantizer
This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
A TTS model capable of generating ultra-realistic dialogue in one pass.
The official implementation of TokenSynth (ICASSP 2025)
Official inference code for NAACL 2024 paper "R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces"
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Official implementation for Diffusion Models Are Innate One-Step Generators
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
PyTorch implementation of the Perceptual Evaluation of Speech Quality for wideband audio
Conformer-based Metric GAN for speech enhancement
This is the official implementation of the SEMamba paper. (Accepted to IEEE SLT 2024)
Official code for the CVPR 2025 paper "SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models."