Top 23 Python Speech Projects
-
Project mention: 2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1) | dev.to | 2025-09-20
XTTS-v2 — Zero-shot voice cloning, 17 languages, streaming support
-
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
-
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Hugging Face Datasets -- the library that lets you download and manage datasets from the Hugging Face Hub, as well as being a convenient vendor-neutral interface for your own datasets.
-
2.3X speed improvement over WhisperX and a 3X speed boost compared to HuggingFace Pipeline with FlashAttention 2 (Insanely Fast Whisper)
-
Project mention: 2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1) | dev.to | 2025-09-20
Silero VAD is the gold standard and pipecat has built-in support, so I have chosen that.
-
Project mention: I Open-Sourced My AI Toy Company That Runs on ESP32 and OpenAI Realtime API | news.ycombinator.com | 2025-04-22
This looks like so much fun! I have recently gotten into working with electronics, so it seems like a nice little project to undertake.
I noticed that it depends on OpenAI's Realtime API, so it got me wondering what open alternatives there are.
I could only find ultravox (https://github.com/fixie-ai/ultravox) that seems to really work in real time. It appears to be a model that wires up Llama and Whisper somehow, rather than treating them as separate steps, which is common in other projects.
What other options are available for this kind of real-time behaviour?
-
Project mention: Show HN: Background noise removal in multimedia with a single command | news.ycombinator.com | 2025-10-06
-
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
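As a rough sketch of what word-level timestamps with confidence make possible, the snippet below groups timed words into SRT subtitle cues and marks low-confidence words. The `words` list is hand-written sample data, and its keys (`text`, `start`, `end`, `confidence`) are an assumption about the shape of the transcription output. The helpers `to_srt_time` and `words_to_srt` are hypothetical and not part of whisper-timestamped.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=3, min_confidence=0.5):
    """Group timed words into SRT cues; wrap low-confidence words as [?word]."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(
            w["text"] if w["confidence"] >= min_confidence else f"[?{w['text']}]"
            for w in chunk
        )
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cues.append(
            f"{len(cues) + 1}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n"
        )
    return "\n".join(cues)


# Hand-written sample data (assumed shape, not real model output).
words = [
    {"text": "Hello", "start": 0.00, "end": 0.42, "confidence": 0.98},
    {"text": "speech", "start": 0.50, "end": 0.95, "confidence": 0.91},
    {"text": "wrld", "start": 1.00, "end": 1.30, "confidence": 0.31},
    {"text": "again", "start": 1.40, "end": 1.85, "confidence": 0.88},
]

print(words_to_srt(words))
```

Flagging low-confidence words instead of dropping them keeps the cue timing intact while showing a reviewer where the transcript may need correction.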
-
openai-edge-tts
Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs
Project mention: Open source TTS by Resemble (claiming they are sota) | news.ycombinator.com | 2025-06-11
It can definitely run on CPU, but I'm not sure if it can run on a machine without a GPU _entirely_.
To be honest, it uses a decently large amount of resources. If you had a GPU, you could expect about 4-5 GB memory usage. And given the optimizations for tensors on GPUs, I'm not sure how well things would work "CPU only".
If you try it, let me know. There are some "CPU" Docker builds in the repo you could look at for guidance.
If you want free TTS without using local resources, you could try edge-tts https://github.com/travisvn/openai-edge-tts
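Because openai-edge-tts is described as a replacement for OpenAI's text-to-speech endpoint, a client call might look roughly like the sketch below. It assumes a server is already running and accepts OpenAI-style `POST /v1/audio/speech` requests. The base URL, port, API key, voice, and model names are placeholders, and `build_speech_request` and `synthesize` are hypothetical helpers; check the repository's README for the actual settings.

```python
import json
import urllib.request

# Assumed values: adjust to match your own deployment.
BASE_URL = "http://localhost:5050"   # hypothetical local server address
API_KEY = "your_api_key_here"        # placeholder key


def build_speech_request(text, voice="alloy", model="tts-1"):
    """Build an OpenAI-style text-to-speech request without sending it."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(build_speech_request(text), timeout=30) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path


# Inspect the request locally; calling synthesize() needs a running server.
req = build_speech_request("Hello from a local TTS endpoint.")
print(req.get_method(), req.full_url)
```

Separating request construction from sending makes it possible to inspect the payload without a server, and any existing OpenAI-compatible client should work the same way once pointed at the alternative base URL.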
-
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Python Speech related posts
-
Making AI Models Faster, Cheaper, and Greener — Here’s How
-
2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1)
-
Ask HN: What Speaker Diarization tools should I look into?
-
Training with Big Data on Any Cloud
-
Show HN: Mikey – No bot meeting notetaker for Windows
-
Ask HN: Is Whisper Still Relevant?
-
Show HN: Using YOLO to Detect Office Chairs in 40M Hotel Photos
-
A note from our sponsor - SaaSHub
www.saashub.com | 15 Nov 2025
Index
What are some of the best open-source Speech projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | TTS | 43,441 |
| 2 | MockingBird | 36,745 |
| 3 | datasets | 20,844 |
| 4 | whisperX | 18,709 |
| 5 | AudioGPT | 10,200 |
| 6 | modelscope | 8,452 |
| 7 | EmotiVoice | 8,367 |
| 8 | silero-vad | 7,348 |
| 9 | ultravox | 4,258 |
| 10 | speech-to-speech | 4,230 |
| 11 | metavoice-src | 4,191 |
| 12 | DeepFilterNet | 3,407 |
| 13 | whisper-asr-webservice | 3,007 |
| 14 | lingvo | 2,854 |
| 15 | aeneas | 2,742 |
| 16 | whisper-timestamped | 2,657 |
| 17 | gTTS | 2,549 |
| 18 | IMS-Toucan | 1,654 |
| 19 | openai-edge-tts | 1,377 |
| 20 | SALMONN | 1,352 |
| 21 | voicefixer | 1,232 |
| 22 | StreamSpeech | 1,192 |
| 23 | dc_tts | 1,160 |