JenkinsRobotics/STT_Demos

Jenkins Robotics

STT_Demos – Local Speech Recognition Showcase

Project Information

Project Status :   INACTIVE  
Code Status :   STABLE  
Development Status :   CONCLUDED  

 

General Information

This project is a collection of local/offline STT (Speech-to-Text) demos used to benchmark and explore different open-source speech recognition engines. Designed for robotics and voice interface applications, each demo includes either a real-time or batch processing interface for fast testing and integration.

Key Outcome: The primary conclusion of this exploration is that pywhispercpp is the recommended engine for future integration due to its superior performance and native macOS support.

Goals include:

  • Evaluate transcription speed and accuracy
  • Compare real-time vs batch models
  • Support macOS (Apple Silicon) with MPS where applicable
  • Build a foundation for full-duplex speech interaction
  • Integrate with TTS_Demos in future agents
  • Add transcript benchmarking + WER tools
  • Measure latency, duplication, and streaming fidelity
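
The WER goal above can be sketched with a word-level edit distance. This is a minimal illustration, not code from this repo: the function name `word_error_rate` is hypothetical, and normalization is assumed to be nothing more than lowercasing and whitespace splitting (punctuation is left as-is).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is dropped, so WER is 0.25.
    print(word_error_rate("turn on the light", "turn on light"))
```

Because insertions count as errors, WER can exceed 1.0 when an engine hallucinates or duplicates text, which makes it a rough signal for the duplication goal as well.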

 

Support

Like our work? Consider supporting Jenkins Robotics!

Subscribe ➔ https://www.youtube.com/@Jenkins_Robotics

Patreon ➔ https://www.patreon.com/JenkinsRobotics

Venmo ➔ https://venmo.com/u/JenkinsRobotics

 

Table of Contents

STT Engines Included

Installation Instructions

CLI + Real-Time App Summaries

Next Steps

Licenses and Credits

 

STT Engines Included

| Engine | Interface | Offline? | Notes |
|---|---|---|---|
| Vosk | Real-time | ✅ Yes | Fast, lightweight, low-memory CPU STT |
| FasterWhisper | Real-time | ✅ Yes | CTranslate2-backed Whisper. High accuracy, CPU-only on Mac |
| Whisper.cpp | CLI + GUI | ✅ Yes | Metal/ANE-accelerated C++ engine for macOS |
| pywhispercpp | Python API | ✅ Yes | Metal-accelerated Python bindings for Whisper.cpp |
| Whisper MLX | File | ✅ Yes | GPU-accelerated MLX backend for macOS |
| RealTimeSTT | Real-time | ✅ Yes | Lightweight real-time demo |
| SpeechRecSTT | Real-time | ✅ Yes | Uses Python's SpeechRecognition/pocketsphinx |
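
To compare the engines above on speed, one option is to wrap each in a plain callable and time it on the same audio file. The sketch below is an assumption about how such a harness could look, not the benchmarking code used in this repo: `benchmark`, the `engines` mapping, and the stand-in engine are all hypothetical, and each real engine would need its own small wrapper that takes a file path and returns text.

```python
import time
from typing import Callable, Dict


def benchmark(
    engines: Dict[str, Callable[[str], str]],
    audio_path: str,
    audio_seconds: float,
) -> Dict[str, dict]:
    """Time each engine on one file and report its real-time factor (RTF)."""
    results = {}
    for name, transcribe in engines.items():
        start = time.perf_counter()
        text = transcribe(audio_path)
        elapsed = time.perf_counter() - start
        results[name] = {
            "text": text,
            "seconds": elapsed,
            # RTF below 1.0 means the engine runs faster than real time.
            "rtf": elapsed / audio_seconds,
        }
    return results


if __name__ == "__main__":
    # Stand-in engine so the sketch runs without any STT library installed.
    fake_engines = {"fake": lambda path: "hello world"}
    for name, row in benchmark(fake_engines, "sample.wav", audio_seconds=5.0).items():
        print(f"{name}: {row['seconds']:.4f}s, RTF={row['rtf']:.4f}")
```

Model load time is deliberately left outside the timed region here, on the assumption that each wrapper loads its model once before the call; include it if cold-start latency matters for the target robot.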

 

Installation Instructions

Clone this repo and install dependencies for each STT demo as needed. For macOS (Apple Silicon recommended):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To run a demo:

python whisper_stt.py         # FasterWhisper (CTranslate2), offline
python vosk_stt.py            # Vosk-based
python real_time_stt.py       # Stream + print live transcript
python speechrec_stt.py       # SpeechRecognition (pocketsphinx)

To run the Whisper.cpp CLI-based GUI:

python whisper_gui_app.py     # Runs rolling 10s inference using whisper.cpp

 

CLI + Real-Time App Summaries

  • whisper_gui_app.py
    Uses Whisper.cpp via CLI, transcribes 10s rolling mic buffers. Shows final, clean transcript and saves to .txt.

  • whisper_stt.py
    Runs FasterWhisper (CTranslate2) on CPU. GUI with volume meter and chunked partial/final transcript view.

  • vosk_stt.py
    Lightweight Kaldi-based transcription. Fast and accurate. CPU only.

  • pywhispercpp_demo.py
    GPU-accelerated via Metal. Uses pywhispercpp binding and simple file-based API.

  • mlx_whisper_stt.py
    Apple MLX version of Whisper. Fast file-based inference with whisper-medium model.

  • real_time_stt.py
    Basic microphone streaming demo. Updates in real time.

  • speechrec_stt.py
    Fully offline. Uses pocketsphinx via SpeechRecognition for basic commands.
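
The rolling 10s mic buffer described for whisper_gui_app.py can be illustrated with a fixed-size deque. This is a sketch under stated assumptions and not the app's actual implementation: the class name `RollingAudioBuffer` is hypothetical, audio is assumed to arrive as chunks of mono samples, and the 16 kHz default is assumed because Whisper models expect 16 kHz input.

```python
from collections import deque
from typing import Iterable, List


class RollingAudioBuffer:
    """Keeps only the most recent `window_seconds` of audio samples."""

    def __init__(self, sample_rate: int = 16000, window_seconds: float = 10.0):
        self.sample_rate = sample_rate
        self.max_samples = int(sample_rate * window_seconds)
        # A deque with maxlen silently drops the oldest samples when full.
        self._samples = deque(maxlen=self.max_samples)

    def append(self, chunk: Iterable[float]) -> None:
        """Add a chunk of samples from the microphone callback."""
        self._samples.extend(chunk)

    def is_full(self) -> bool:
        return len(self._samples) == self.max_samples

    def duration_seconds(self) -> float:
        return len(self._samples) / self.sample_rate

    def snapshot(self) -> List[float]:
        """Copy the current window so it can be written out and transcribed."""
        return list(self._samples)


if __name__ == "__main__":
    buf = RollingAudioBuffer(sample_rate=4, window_seconds=2.0)  # tiny window for demo
    buf.append(range(6))
    buf.append(range(6, 12))
    print(buf.snapshot())  # only the newest 8 samples remain
```

Transcribing overlapping windows tends to repeat words across consecutive results, so a caller would still need some merging or de-duplication step before showing a final transcript.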

 

Next Steps

This project is no longer in active development. The explored engines and findings, particularly the selection of pywhispercpp, will inform future agents and integrations.

 

Links

FOLLOW US ►

Discord ➔ https://discord.gg/sAnE5pRVyT
Patreon ➔ https://www.patreon.com/JenkinsRobotics
Twitter ➔ https://twitter.com/jenkinsrobotics
Instagram ➔ https://www.instagram.com/jenkinsrobotics/
Facebook ➔ https://www.facebook.com/jenkinsrobotics/
GitHub ➔ https://jenkinsrobotics.github.io

 

Licenses and Credits

All third-party models and libraries retain their original licenses. This repo is intended for R&D, robotics, and AI voice assistant prototyping.

© Jenkins Robotics 2025
