A locally run speech-to-text voice typer with hotkey control and a floating interface for effortless voice typing, powered by the publicly available OpenAI Whisper model.
I created this because I couldn't find a free tool that runs a Whisper model (or any speech-to-text model) locally and is activated by a hotkey. Even if one exists, this was a cool project to work on. A hotkey, like holding Alt+X to record and releasing to transcribe, is an efficient way to control a speech-to-text model! VoiceTyper types into whatever input has focus, so if your cursor can reach a field, you can use it there. With all the typing we do for LLMs, sometimes we want to explain things in detail, but typing it all out can feel long and tedious. Using voice is faster and more interactive. Now, the next few projects will be built even faster. Let's gooo!
P.S. I've been using it a lot, maybe too much. Hopefully I still remember how to type with a keyboard.
- Floating, draggable, minimal mic/close app interface
- Speech-to-text transcription using OpenAI's Whisper model
- Automatic output of transcribed text to the active window
- Global hold-to-record hotkey (Alt+X): hold to record, release to stop and transcribe
- Press the "X" button on the floating window to quit
- Microphone
- Windows 10 or later
- At least 4GB of VRAM (for GPU acceleration)
- Note: Transcription speed depends on your hardware.
- I've only tested it on Windows 11. Feel free to try it on other systems.
- Because the exe file is larger than 2GB, GitHub's per-file release limit, it's split into 2 parts:
- (1) VoiceTyper.7z.001, and (2) VoiceTyper.7z.002
- Go to the Releases page and download all files.
- Install 7-Zip from www.7-zip.org if you haven't already.
- Right-click on VoiceTyper.7z.001 > Select 7-Zip > Extract Here.
- Run the VoiceTyper.exe file
- Wait for startup and the model download; notifications will appear for each
  - Startup (~5-30s)
  - Model download, first run only (~1-5 min)
  - Waiting times vary with hardware and internet connection
- In any text field, hold Alt+X to record and release to stop and transcribe
- Press the "X" button on the floating window to quit
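
The hold-to-record behaviour described above can be sketched as a tiny state machine. This is a minimal illustration, not VoiceTyper's actual code: the `HoldToRecord` class and its `on_start` / `on_stop` callbacks are hypothetical names. One detail worth handling is that holding a key makes the operating system send repeated key-down events, so only the first one should start a recording.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HoldToRecord:
    """Hypothetical controller: start on the first key-down, stop on key-up."""

    on_start: Callable[[], None]  # assumed hook, e.g. begin capturing audio
    on_stop: Callable[[], None]   # assumed hook, e.g. transcribe and type
    recording: bool = False

    def key_down(self) -> None:
        # Auto-repeat sends many key-down events while the key is held;
        # ignore all but the first.
        if self.recording:
            return
        self.recording = True
        self.on_start()

    def key_up(self) -> None:
        # Ignore a release that has no matching press.
        if not self.recording:
            return
        self.recording = False
        self.on_stop()


def demo() -> List[str]:
    """Simulate one hold with auto-repeat and a stray release."""
    events: List[str] = []
    ctl = HoldToRecord(
        on_start=lambda: events.append("start"),
        on_stop=lambda: events.append("stop"),
    )
    ctl.key_down()
    ctl.key_down()  # auto-repeat while held
    ctl.key_up()
    ctl.key_up()    # stray release
    return events
```

In a real app, `key_down` and `key_up` would be called from whatever global hotkey listener is in use; that wiring is left out here because it depends on the library.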
- Clone this repository
- Create a virtual environment: `python -m venv env`
- Activate the virtual environment:
  - Windows: `.\env\Scripts\activate`
  - Unix/MacOS: `source env/bin/activate`
- Install dependencies: `pip install -r requirements.txt`
- Use `.\run.sh` to run the application in development mode
- Ensure you have PyInstaller installed: `pip install pyinstaller`
- Run the build script: `.\build.sh`
- The executable will be created in the `dist` folder
- `main.py`: Entry point of the application
- `ui.py`: User interface implementation
- `recorder.py`: Audio recording functionality
- `transcriber.py`: Speech-to-text transcription using Whisper
- `typer.py`: Handles text output to the active window
- `utils.py`: Utility functions
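
To give a feel for how the recording, transcription, and typing modules might fit together, here is a rough sketch of a "record, transcribe, type" step. The function names (`make_pipeline`, `record`, `transcribe`, `type_text`) and the `Audio` type are assumptions for illustration only; they are not the actual interfaces of `recorder.py`, `transcriber.py`, or `typer.py`.

```python
from typing import Callable, List

Audio = bytes  # placeholder type; the real audio format is not specified here


def make_pipeline(
    record: Callable[[], Audio],
    transcribe: Callable[[Audio], str],
    type_text: Callable[[str], None],
) -> Callable[[], str]:
    """Wire three stages into a single 'record, transcribe, type' step."""

    def run_once() -> str:
        audio = record()
        text = transcribe(audio).strip()
        if text:  # skip typing when nothing was recognised
            type_text(text)
        return text

    return run_once


# Stand-in stages so the sketch runs without a microphone or a model.
typed: List[str] = []
run_once = make_pipeline(
    record=lambda: b"\x00\x01",
    transcribe=lambda audio: "  hello world ",
    type_text=typed.append,
)
result = run_once()
```

Passing the stages in as plain callables keeps each one easy to swap out or test in isolation, for example replacing the transcriber with a stub as done here.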
- The hotkey (Alt+X), which is the ideal way to use the app, works. However, clicking the record button takes focus away from the input your cursor was in, so the transcribed text is typed into nowhere. For now, the button only serves as a recording indicator.
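
One possible direction for the focus problem on Windows is the `WS_EX_NOACTIVATE` extended window style, which tells Windows not to activate a window when the user clicks it. The sketch below is untested against this app and rests on assumptions: that the floating window exposes a native handle (`hwnd`), and that the UI toolkit does not reset the style afterwards. The helper names are hypothetical.

```python
import ctypes
import sys

GWL_EXSTYLE = -20              # index of the extended window style
WS_EX_NOACTIVATE = 0x08000000  # clicking the window does not activate it


def with_no_activate(ex_style: int) -> int:
    """Return the extended style bits with WS_EX_NOACTIVATE set."""
    return ex_style | WS_EX_NOACTIVATE


def make_non_activating(hwnd: int) -> bool:
    """Apply the flag to a native window handle (assumed to be valid).

    Returns False on non-Windows platforms, where this does nothing.
    """
    if sys.platform != "win32":
        return False
    user32 = ctypes.windll.user32
    user32.GetWindowLongW.argtypes = [ctypes.c_void_p, ctypes.c_int]
    user32.GetWindowLongW.restype = ctypes.c_long
    user32.SetWindowLongW.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_long]
    user32.SetWindowLongW.restype = ctypes.c_long
    current = user32.GetWindowLongW(hwnd, GWL_EXSTYLE)
    user32.SetWindowLongW(hwnd, GWL_EXSTYLE, with_no_activate(current))
    return True
```

How to obtain `hwnd` depends on the UI toolkit, so that part is left out. Whether this fully fixes the issue here would need to be checked on a real build.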
Made with ❤️ by @faqihxdev