


A method that directly addresses the modality gap by aligning speech tokens with their corresponding text transcriptions during the tokenization stage.

Python 96 10 Updated Sep 3, 2025

Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequences of interleaved semantic and acoustic tokens.

Python 25 1 Updated Sep 20, 2025

JAX port of SNAC

Python 11 1 Updated May 12, 2024

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

Python 19 3 Updated May 5, 2025
Python 42 6 Updated Jul 7, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 321 46 Updated Jul 21, 2025

MTLA: Multi-head Temporal Latent Attention

Python 758 35 Updated Oct 6, 2025

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 111 9 Updated Mar 27, 2025

Awesome speech/audio LLMs, representation learning, and codec models

1,161 71 Updated Aug 13, 2025
Python 33 1 Updated Sep 24, 2024

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 198 17 Updated Sep 19, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,027 727 Updated Oct 17, 2025

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,219 104 Updated Mar 2, 2025

Faster Whisper transcription with CTranslate2

Python 18,731 1,552 Updated Oct 22, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,477 1,975 Updated Oct 24, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,028 816 Updated Oct 15, 2025

Vector (and Scalar) Quantization, in PyTorch

Python 3,638 295 Updated Oct 20, 2025

ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.

Python 55 3 Updated Oct 17, 2025

Voice Conversion With Just Nearest Neighbors

Python 502 70 Updated Mar 18, 2024

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Python 3,632 254 Updated Feb 13, 2025

Mamba SSM architecture

Python 16,197 1,474 Updated Oct 10, 2025

Continual Learning papers list, curated by ContinualAI

HTML 666 56 Updated Apr 22, 2024

PyTorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5)

Python 79 3 Updated Apr 26, 2024

Convert Machine Learning Code Between Frameworks

Python 14,239 5,593 Updated Oct 17, 2025

End-to-End Speech Processing Toolkit

Python 9,533 2,336 Updated Oct 24, 2025

Partially Observable Process Gym

Python 202 16 Updated Jun 12, 2025

An elegant PyTorch deep reinforcement learning library.

Python 8,877 1,186 Updated Oct 25, 2025

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 10,466 1,164 Updated Oct 22, 2025

A toolkit for tinyML research and deployment

Python 72 16 Updated Sep 18, 2024

Structured state space sequence models

Jupyter Notebook 2,755 341 Updated Jul 17, 2024