Top 23 Python Audio Projects
-
Project mention: Ultimate Vocal Remover GUI, a FOSS audio stem splitter | news.ycombinator.com | 2025-05-09
-
-
-
-
https://github.com/TaylorSMarks/playsound/issues/101
(A workaround exists: downgrading to version 1.2.2, but that comes with its own issues.)
The last time I experimented with audio in Python, I was surprised by how lacking its multimedia libraries are.
For example, when I needed to read audio files as data, I tried `SoundFile`, `librosa` (whose file loading wraps `SoundFile` or `audioread`), and `pydub`. None of them was particularly satisfying, and none has seen much active development lately.
If you need to read various formats, `pydub` is probably your best bet (it does this by invoking ffmpeg under the hood). I was hoping for a more "native" solution, but oh well. Unfortunately, `pydub` is also unmaintained and has some serious performance issues (for example: https://github.com/jiaaro/pydub/issues/518).
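For the narrow case of uncompressed PCM WAV files, the standard library alone can read audio as data. The following is a minimal sketch, assuming 16-bit PCM input; `read_wav_samples` is a hypothetical helper name, and it does not cover the compressed formats that `pydub` handles through ffmpeg.

```python
import array
import sys
import wave


def read_wav_samples(path):
    """Return (sample_rate, channels, samples) for a 16-bit PCM WAV file.

    samples is a flat array of signed 16-bit integers, interleaved by channel.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, channels, samples
```

Anything beyond plain WAV (MP3, AAC, Ogg, and so on) still needs an external decoder, which is the gap the comment above describes.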
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
-
-
Project mention: Can't pay, won't pay: streaming services are driving viewers back to piracy | news.ycombinator.com | 2025-08-14
I developed a tool (https://github.com/smacke/ffsubsync) which can sync subtitles against each other, and this can be used in conjunction with other tools such as https://pypi.org/project/srt/ to combine multiple subtitle streams into a single stream. I've used this strategy to good effect to get both English and Chinese subtitles up at once.
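Once two subtitle tracks are synced to the same timeline, combining them amounts to interleaving their cues by start time and renumbering. The sketch below illustrates that step with the standard library only; it is a stand-in for what the comment does with the `srt` package, not that package's API. `parse_srt` and `merge_srt` are hypothetical names, and the parser assumes plain `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines without positioning data.

```python
import re

_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to milliseconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def _to_stamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(text):
    """Parse SRT text into a list of (start_ms, end_ms, content) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip malformed blocks in this sketch
        start, end = lines[1].split("-->")
        cues.append((_to_ms(start), _to_ms(end), "\n".join(lines[2:])))
    return cues


def merge_srt(*texts):
    """Interleave cues from several already-synced SRT strings by start time."""
    cues = sorted(c for t in texts for c in parse_srt(t))
    blocks = [
        f"{i}\n{_to_stamp(a)} --> {_to_stamp(b)}\n{content}\n"
        for i, (a, b, content) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Cues from different tracks that share a time range stay as separate, overlapping entries, which most players render together on screen.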
-
-
pyAudioAnalysis
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
-
Project mention: Show HN: Shoggoth Mini – A weird tentacle robot powered by GPT-4o and RL | news.ycombinator.com | 2025-07-15
> also, "GPT-4o continuously listens to speech through the audio stream," is going to be problematic
It seems like openWakeWord or Porcupine could solve this by adding a wake-word detection layer before the prompt is sent off.
I wonder if latency would be any better with a local model cached in a 16GB or 24GB graphics card. It would have to be a quantized/distilled model, but maybe performance would still be acceptable.
https://github.com/dscripka/openWakeWord
https://github.com/Picovoice/porcupine
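The gating idea in that comment can be sketched independently of any particular detector: audio frames are dropped until a wake-word score crosses a threshold, and only the frames that follow are forwarded. In the sketch below, `gate_on_wake_word`, `score_frame`, `threshold`, and `frames_after_wake` are all hypothetical names; `score_frame` stands in for whatever call a real engine such as openWakeWord or Porcupine exposes, and is not taken from either library.

```python
from typing import Callable, Iterable, Iterator, Sequence

Frame = Sequence[int]  # one chunk of PCM samples


def gate_on_wake_word(
    frames: Iterable[Frame],
    score_frame: Callable[[Frame], float],
    threshold: float = 0.5,
    frames_after_wake: int = 50,
) -> Iterator[Frame]:
    """Yield only the frames that follow a detected wake word.

    score_frame is a placeholder for a real detector and is assumed to
    return a confidence in [0, 1] for the given frame.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Gate is open: pass the frame on to the speech model.
            remaining -= 1
            yield frame
        elif score_frame(frame) >= threshold:
            # Wake word heard: open the gate for the next frames.
            # The triggering frame itself is not forwarded.
            remaining = frames_after_wake
```

With this shape, nothing reaches the remote model until the local detector fires, so the continuous listening happens entirely on the device.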
-
-
picard
A cross-platform music tagger powered by the MusicBrainz database. Picard organizes your music collection by updating your tags, renaming your files, and sorting them into a folder structure, exactly the way you want it.
If you have the files downloaded, picard is also useful - https://picard.musicbrainz.org/
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
-
Project mention: Benn Jordan's AI poison pill and the weird world of adversarial noise | news.ycombinator.com | 2025-04-15
https://github.com/riffusion/riffusion-hobby
I believe the more advanced music generators out now take more of a 'stems' approach, with a larger processing pipeline to increase fidelity and add vocal-tracking capability, but the underlying idea is the same.
An adversarial attack that hides information in the spectrogram to fool the model into categorizing the track as something it is not is no different from the image adversarial attacks, for which mitigations have already been found.
Various forms of filtering for inaudible spectral information coupled with methods that destroy and re-synthesize/randomize phase information would likely break this poisoning attack.
-
Project mention: Show HN: Background noise removal in multimedia with a single command | news.ycombinator.com | 2025-10-06
-
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
-
Python Audio related posts
-
Ask HN: What Are You Working On? (Nov 2025)
-
Beets: The music geek's media organizer
-
5 must know open-source repositories to build cool AI apps
-
Creating a realtime voice agent using OpenAI's new gpt-realtime speech-to-speech model
-
Show HN: Background noise removal in multimedia with a single command
-
Why I Ditched Spotify, and How I Set Up My Own Music Stack
-
Can't pay, won't pay: streaming services are driving viewers back to piracy
-
Index
What are some of the best open-source Audio projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | ultimatevocalremovergui | 22,421 |
| 2 | beets | 14,260 |
| 3 | speechbrain | 10,758 |
| 4 | AudioGPT | 10,200 |
| 5 | pydub | 9,637 |
| 6 | SpeechRecognition | 8,898 |
| 7 | jukebox | 8,014 |
| 8 | librosa | 7,998 |
| 9 | ffsubsync | 7,413 |
| 10 | dejavu | 6,661 |
| 11 | pyAudioAnalysis | 6,166 |
| 12 | Porcupine | 4,487 |
| 13 | basic-pitch | 4,419 |
| 14 | picard | 4,356 |
| 15 | distil-whisper | 3,979 |
| 16 | riffusion-hobby | 3,807 |
| 17 | DeepFilterNet | 3,407 |
| 18 | aeneas | 2,742 |
| 19 | mkchromecast | 2,303 |
| 20 | matchering | 2,297 |
| 21 | Tauon | 2,287 |
| 22 | m3u8 | 2,216 |
| 23 | vocal-remover | 1,717 |