LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
- Updated May 19, 2025
- Python
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
A Survey of Spoken Dialogue Models (60 pages)
[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
Code for DeSTA2.5-Audio
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
Survey of audio language models
The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)
Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"
A collection of audio codecs with a standardized API
Speech Resynthesis and Language Modeling
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts