

Some comprehensive papers about speaker diarization

324 10 Updated May 22, 2025

✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Python 667 60 Updated May 24, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 474 64 Updated Dec 18, 2025

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 13,691 1,703 Updated Feb 29, 2024

Context labels and pronunciation data for the JSUT corpus

77 12 Updated Sep 2, 2021

HTS-style full-context labels for JSUT v1.1

50 2 Updated Apr 16, 2021

Deep-learning-based speech beamforming

Jupyter Notebook 64 18 Updated Mar 29, 2018

Multilingual G2P in 100 languages

Jupyter Notebook 369 29 Updated May 26, 2023

Library to build speech synthesis systems designed for easy and fast prototyping.

Python 399 71 Updated Jun 29, 2024

Self-supervised speech representations

Python 514 39 Updated Apr 27, 2023

Implementation of Music Transformer in PyTorch (ICLR 2019)

Python 286 54 Updated Nov 22, 2022

An implementation of WaveNet with fast generation

Jupyter Notebook 1,020 232 Updated Sep 17, 2020

Global Rhythm Style Transfer Without Text Transcriptions

Python 284 38 Updated Oct 23, 2024

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

1,375 148 Updated Jun 6, 2024

CMU Wilderness Multilingual Speech Dataset

Shell 288 53 Updated Apr 20, 2019