View halspeech's full-sized avatar


Official PyTorch implementation of 'Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain'

Python 15 2 Updated Sep 26, 2025

DiaRemot2-ON: CPU-only audio intelligence pipeline (Faster-Whisper, ONNX, diarization, paralinguistics)

Python 6 1 Updated Oct 26, 2025

Europeanized CosyVoice2 for French & German

Jupyter Notebook 4 Updated Oct 16, 2025

[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

Python 12 1 Updated Oct 4, 2025

A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models

Python 94 4 Updated Sep 21, 2025

IMS's Audio Deepfake Detection Toolkit

Python 5 Updated Sep 18, 2025

PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users (narrow band and wide band)

C 609 103 Updated Sep 5, 2024

SLM for SER

Python 1 Updated Oct 25, 2025

On-device TTS model by Neuphonic

Python 3,692 355 Updated Oct 27, 2025

A toolkit for benchmarking on a wide variety of audio deepfake datasets.

Python 18 Updated Oct 9, 2025

Text to speech with Hebrew G2P and TTS models based on Piper/Gemma3

Python 1 Updated Oct 4, 2025

T-one is a high-performance streaming ASR pipeline for Russian, specialized for the telephony domain.

Python 207 18 Updated Jul 30, 2025
Python 16 1 Updated Oct 1, 2025

Khmer Tokenizer

PHP 2 Updated May 4, 2025

Frontier Open-Source Text-to-Speech

9,763 1,234 Updated Sep 5, 2025

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 185 12 Updated Sep 21, 2025

Official code for paper "Learning to Use Tools via Cooperative and Interactive Agents"

Python 223 2 Updated Mar 26, 2024

Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec"

Python 30 2 Updated Oct 23, 2025

Train transformer language models with reinforcement learning.

Python 16,031 2,256 Updated Oct 27, 2025

Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model"

Python 14 Updated Jun 28, 2024
Python 15 Updated Jul 4, 2025

Easy-to-Use Speech MOS predictors

Python 322 18 Updated Oct 24, 2023

PPG-Based Voice Conversion

Python 348 76 Updated Jul 22, 2022

Source code and additional results of INTERSPEECH 2025 paper 'A Dataset for Automatic Assessment of TTS Quality in Spanish'

Jupyter Notebook 6 Updated May 23, 2025

Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models

Python 163 15 Updated Dec 18, 2023

Align Anything: Training All-modality Model with Feedback

Jupyter Notebook 4,571 505 Updated Aug 25, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,912 147 Updated Apr 21, 2025

Speech-to-text server framework with next-gen Kaldi

C++ 803 131 Updated Oct 24, 2025