Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Updated Aug 3, 2024
Very fast, accurate speaker diarization
A collection of no-cost AI websites offering models such as Claude 4 Sonnet/Opus, Grok 4, o3 Pro, and Gemini 2.5 Pro, and much more...
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
text-to-audio-latent-diffusion
Code to train a custom time-domain autoencoder to dereverberate audio
A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.
Guide to deploying neural networks in VST plugins, with a specific focus on embedded devices using the Elk Audio OS
Safe, production-ready starter for voice cloning via SV2TTS (RTVC wrapper). CLI, tests, Docker, CI, pre-commit. No model weights included.
🗣️ Audio AI: Your Audio & Video Transcription Powerhouse!
Whether it’s text or a link, it can be turned into a podcast!
PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.
🎧 Navigate audio content effortlessly with Zanshin, a media player that lets you browse recordings by speaker, supporting both YouTube and local files.
⚡ Accelerate speaker diarization with Senko, which processes 1 hour of audio in just 5 seconds on powerful hardware. Boost your audio analysis efficiency.