VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency

Python 181 22 Updated Oct 26, 2025

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 185 16 Updated Sep 24, 2025

A method that directly addresses the modality gap by aligning speech tokens with the corresponding text transcription during the tokenization stage.

Python 107 11 Updated Sep 3, 2025

Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequences of interleaved semantic and acoustic tokens.

Python 28 2 Updated Sep 20, 2025

JAX port of SNAC

Python 11 1 Updated May 12, 2024

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

Python 19 3 Updated May 5, 2025
Python 45 6 Updated Jul 7, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 346 49 Updated Jul 21, 2025

MTLA: Multi-head Temporal Latent Attention

Python 760 35 Updated Oct 6, 2025

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 120 11 Updated Mar 27, 2025

Awesome speech/audio LLMs, representation learning, and codec models

1,202 74 Updated Aug 13, 2025
Python 35 1 Updated Sep 24, 2024

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 212 17 Updated Sep 19, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,289 760 Updated Jan 10, 2026

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,256 108 Updated Mar 2, 2025

Faster Whisper transcription with CTranslate2

Python 20,536 1,702 Updated Nov 19, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,987 2,062 Updated Jan 22, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,364 847 Updated Jan 19, 2026

Vector (and Scalar) Quantization, in Pytorch

Python 3,833 313 Updated Jan 13, 2026

ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.

Python 55 3 Updated Nov 16, 2025

Voice Conversion With Just Nearest Neighbors

Python 509 74 Updated Jan 16, 2026

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Python 3,773 271 Updated Feb 13, 2025

Mamba SSM architecture

Python 17,023 1,568 Updated Jan 12, 2026

Continual Learning papers list, curated by ContinualAI

HTML 687 58 Updated Apr 22, 2024

Pytorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5)

Python 82 3 Updated Apr 26, 2024

Convert Machine Learning Code Between Frameworks

Python 14,222 5,561 Updated Oct 17, 2025

End-to-End Speech Processing Toolkit

Python 9,699 2,374 Updated Jan 21, 2026

Partially Observable Process Gym

Python 211 17 Updated Jun 12, 2025

An elegant PyTorch deep reinforcement learning library.

Python 9,754 1,239 Updated Dec 1, 2025

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 11,165 1,252 Updated Jan 20, 2026