Stars
Some comprehensive papers about speaker diarization
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
High-Resolution Image Synthesis with Latent Diffusion Models
Context labels and pronunciation data for the JSUT corpus
Deep learning-based speech beamforming
Multilingual G2P in 100 languages
Library to build speech synthesis systems designed for easy and fast prototyping.
Self-supervised speech representations
Implementation of Music Transformer with PyTorch (ICLR 2019)
An implementation of WaveNet with fast generation
Global Rhythm Style Transfer Without Text Transcriptions
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
CMU Wilderness Multilingual Speech Dataset