Project Status : INACTIVE
Code Status : STABLE
Development Status : CONCLUDED
This project is a collection of local/offline STT (Speech-to-Text) demos used to benchmark and explore different open-source speech recognition engines. Designed for robotics and voice interface applications, each demo includes either a real-time or batch processing interface for fast testing and integration.
Key Outcome: The primary conclusion of this exploration is that pywhispercpp is the recommended engine for future integration due to its superior performance and native macOS support.
Goals include:
- Evaluate transcription speed and accuracy
- Compare real-time vs batch models
- Support macOS (Apple Silicon) with MPS where applicable
- Build a foundation for full-duplex speech interaction
- Integrate with TTS_Demos in future agents
- Add transcript benchmarking + WER tools
- Measure latency, duplication, and streaming fidelity
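The WER goal above can be sketched in a few lines. This is a minimal illustration, not a tool shipped in this repo: the function name `word_error_rate` is hypothetical, and the normalization (lowercasing and whitespace splitting, with no punctuation handling) is an assumption you would likely want to tighten for real benchmarking.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn the robot left", "turn robot right"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so treat it as an error ratio and not a percentage capped at 100.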
Like our work? Consider supporting Jenkins Robotics!
Subscribe ➔ https://www.youtube.com/@Jenkins_Robotics
Patreon ➔ https://www.patreon.com/JenkinsRobotics
Venmo ➔ https://venmo.com/u/JenkinsRobotics
STT Engines Included
Installation Instructions
CLI + Real-Time App Summaries
Next Steps
Licenses and Credits
| Engine | Interface | Offline? | Notes |
|---|---|---|---|
| Vosk | Real-time | ✅ Yes | Fast, lightweight, low-memory CPU STT |
| FasterWhisper | Real-time | ✅ Yes | CTranslate2-backed Whisper. High accuracy, CPU-only on Mac |
| Whisper.cpp | CLI + GUI | ✅ Yes | Metal/ANE-accelerated C++ engine for macOS |
| pywhispercpp | Python API | ✅ Yes | Metal-accelerated Python bindings for Whisper.cpp |
| Whisper MLX | File | ✅ Yes | GPU-accelerated MLX backend for macOS |
| RealTimeSTT | Real-time | ✅ Yes | Lightweight real-time demo |
| SpeechRecSTT | Real-time | ✅ Yes | Uses Python’s SpeechRecognition/pocketsphinx |
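To compare the engines in the table on speed, one option is to wrap each engine behind a plain callable and time it. The sketch below is an assumption about how such a harness could look, not code from this repo: `time_transcription` and its parameters are hypothetical names, and the real-time factor (processing time divided by audio duration) is used as the speed metric.

```python
import time
from typing import Callable, Tuple


def time_transcription(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
) -> Tuple[str, float, float]:
    """Run one transcription and return (text, elapsed_seconds, real_time_factor)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    # A real-time factor below 1.0 means the engine runs faster than real time.
    return text, elapsed, elapsed / audio_seconds


if __name__ == "__main__":
    # Stand-in engine for illustration; swap in a real engine's callable.
    def fake_engine(path: str) -> str:
        return f"transcript of {path}"

    text, elapsed, rtf = time_transcription(fake_engine, "clip.wav", 5.0)
    print(f"{text!r} took {elapsed:.4f}s (RTF {rtf:.4f})")
```

Timing the same clip several times and discarding the first run helps separate model load time from steady-state inference speed.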
Clone this repo and install dependencies for each STT demo as needed. For macOS (Apple Silicon recommended):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
To run a demo:
python whisper_stt.py # FasterWhisper (offline)
python vosk_stt.py # Vosk-based
python real_time_stt.py # Stream + print live transcript
python speechrec_stt.py # SpeechRecognition (pocketsphinx)
To run the Whisper.cpp CLI-based GUI:
python whisper_gui_app.py # Runs rolling 10s inference using whisper.cpp
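The rolling 10 s inference mentioned for the GUI app can be pictured as a fixed-size buffer that always holds the most recent audio. The sketch below is an illustration under assumptions, not the app's actual code: the class name `RollingAudioBuffer` is hypothetical, and the 16 kHz default reflects the sample rate Whisper models expect.

```python
from collections import deque
from typing import Iterable, List


class RollingAudioBuffer:
    """Keeps only the most recent `window_seconds` of audio samples."""

    def __init__(self, sample_rate: int = 16000, window_seconds: float = 10.0) -> None:
        self.sample_rate = sample_rate
        # A deque with maxlen silently drops the oldest samples when full.
        self._samples = deque(maxlen=int(sample_rate * window_seconds))

    def push(self, chunk: Iterable[float]) -> None:
        """Append a chunk of samples from the microphone callback."""
        self._samples.extend(chunk)

    def snapshot(self) -> List[float]:
        """Copy the current window so it can be handed to the STT engine."""
        return list(self._samples)

    @property
    def seconds_buffered(self) -> float:
        return len(self._samples) / self.sample_rate


if __name__ == "__main__":
    buf = RollingAudioBuffer()
    buf.push([0.0] * 16000 * 12)  # 12 s of silence; only the last 10 s is kept
    print(buf.seconds_buffered)
```

Taking a snapshot copy means the microphone thread can keep pushing samples while the previous window is being transcribed.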
- whisper_gui_app.py: Uses Whisper.cpp via the CLI and transcribes rolling 10 s microphone buffers. Shows the final, clean transcript and saves it to a .txt file.
- whisper_stt.py: Runs FasterWhisper (CTranslate2) on CPU. GUI with a volume meter and a chunked partial/final transcript view.
- vosk_stt.py: Lightweight Kaldi-based transcription. Fast and accurate. CPU only.
- pywhispercpp_demo.py: GPU-accelerated via Metal. Uses the pywhispercpp binding and a simple file-based API.
- mlx_whisper_stt.py: Apple MLX version of Whisper. Fast file-based inference with the whisper-medium model.
- real_time_stt.py: Basic microphone streaming demo. Updates in real time.
- speechrec_stt.py: Fully offline. Uses pocketsphinx via SpeechRecognition for basic commands.
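Since pywhispercpp is the recommended engine, here is a minimal sketch of file-based transcription with it. It is not the contents of pywhispercpp_demo.py: the helper names `join_segments` and `transcribe_file` are hypothetical, and the `base.en` model name and the audio path are assumptions. It relies on pywhispercpp's `Model` class and its `transcribe` method, which returns segments carrying a `text` attribute.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Join the non-empty `text` fields of transcription segments."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(part for part in parts if part)


def transcribe_file(audio_path: str, model_name: str = "base.en") -> str:
    """Transcribe one audio file with pywhispercpp (model name is an assumption)."""
    # Imported lazily so join_segments can be used without pywhispercpp installed.
    from pywhispercpp.model import Model

    model = Model(model_name)
    return join_segments(model.transcribe(audio_path))


if __name__ == "__main__":
    import sys

    # Usage (path is a placeholder): python this_sketch.py recording.wav
    if len(sys.argv) > 1:
        print(transcribe_file(sys.argv[1]))
```

Loading the model once and reusing it across files avoids paying the load cost on every call, which matters when measuring latency.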
This project is no longer in active development. The explored engines and findings—particularly the selection of pywhispercpp—will inform future agents and integrations.
FOLLOW US ►
Discord ➔ https://discord.gg/sAnE5pRVyT
Patreon ➔ https://www.patreon.com/JenkinsRobotics
Twitter ➔ https://twitter.com/jenkinsrobotics
Instagram ➔ https://www.instagram.com/jenkinsrobotics/
Facebook ➔ https://www.facebook.com/jenkinsrobotics/
GitHub ➔ https://jenkinsrobotics.github.io
All third-party models and libraries retain their original licenses. This repo is intended for R&D, robotics, and AI voice assistant prototyping.
© Jenkins Robotics 2025