WinSTT

An application for desktop STT using OpenAI-Whisper

Type in any application using your voice. WinSTT is a desktop application that uses OpenAI's Whisper speech-to-text (STT) model for voice typing. It transcribes speech into text, supports over 99 languages, and can run locally without an internet connection.

Why

The existing Windows speech-to-text is slow, inaccurate, and unintuitive. This app provides customizable hotkey activation and fast, accurate transcription for rapid typing. It is especially useful for people who write articles, blog posts, or even conversational messages.

Setup

Precompiled Binary (Recommended for Windows Users)

Download the .exe file from the latest release in the Releases section.

Development Setup

This project uses uv for fast and reliable dependency management.

Prerequisites

Install uv:

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

Install Dependencies

First, clone the repo:

git clone https://github.com/dahshury/WinSTT

Navigate to the cloned directory:
```
cd WinSTT
```
Choose your installation method:

CPU-Only Installation (Recommended for most users): a lighter installation without GPU acceleration.
```
uv sync --extra cpu
```
GPU-Accelerated Installation: for systems with NVIDIA GPUs and CUDA support.
```
uv sync --extra gpu
```
Development Installation: for contributors and developers.
```
# Basic development setup with CPU
uv sync --extra dev --extra cpu

# Development with GPU acceleration  
uv sync --extra dev --extra gpu
```
Linux users only: additional setup for PyAudio

For Linux, you need to install PortAudio, which PyAudio depends on. Use the following commands to install PortAudio on common Linux distributions:
- Debian/Ubuntu:
```
sudo apt update
sudo apt install portaudio19-dev libxcb1 libxcb-cursor0 libxcb-keysyms1 libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-xfixes0 libxcb-icccm4 libxcb-image0 libxcb-sync1 libxcb-xinerama0 libxcb-randr0 libxcb-util1 libx11-xcb1 libxrender1 libxkbcommon-x11-0
```

Available Dependency Groups

Core: Always installed - basic application framework
cpu: CPU-only inference using faster-whisper and ONNX
gpu: GPU-accelerated inference with CUDA support
dev: Development tools (linting, type checking, pre-commit)
build: Distribution and packaging tools

Start The App

# Recommended: Run with async loading screen
uv run python src/main_async.py

# Or run the standard version  
uv run python src/main.py

# Or activate the environment and run
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
python src/main_async.py

Recommended: Use main_async.py for a better startup experience with a loading screen.

Building for Distribution

# Install build dependencies
uv sync --extra build

# Build executable
uv run pyinstaller --onefile src/main.py

Updating Dependencies

# Update all dependencies
uv lock --upgrade

# Sync updated lockfile
uv sync

Usage

Hold the Alt+Ctrl+A key combination to start recording and release it to stop. There may be a very slight delay between pressing the keys and the app starting to listen to your microphone, so start speaking only after you hear the audio cue.

Releasing the keys transcribes the recorded audio and pastes it at the text cursor in whichever application is focused. Processing speed depends on the chosen model and your computer's capabilities.
The app has a "record key" button that lets you change the key combination you hold to record. Click "record key", press and hold the keys you want to use, then click stop to save the new recording key.
This tool is powered by ASR models from Hugging Face, primarily OpenAI's Whisper. Larger models are more accurate but slower. Choose the model that best suits your hardware and needs.

Notes

When you load the app for the first time, wait for the model files to download (about 1 GB for the CPU version, 3 GB for the GPU version); how long this takes depends on your internet connection. Once the model is downloaded, no internet connection is needed unless you change the model. The first recording after that may be pasted slightly slower than subsequent ones.
The app automatically detects whether the recording contains speech. If it does not, or if an error occurs, the app shows a message in its window and writes it to the logs folder.
The application only records while the record key is held down.
You can run this app on a CPU; it uses quantized Whisper-Turbo by default. If you have a CUDA GPU, however, the app runs the full version, which increases both speed and accuracy and is highly recommended.
The application does not transcribe audio shorter than 0.5 seconds. If your sentence is short, keep holding the key until at least 0.5 seconds have passed.
Some antivirus programs may flag the .exe files in the current releases, which are generated by PyInstaller, as suspicious. This is a known issue. Rest assured, the binaries are clean and safe. The app passes most of VirusTotal's checks, and the remaining detections are false positives.
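
To make the 0.5-second rule in the notes above concrete, here is a minimal sketch of a duration check. The function names are hypothetical, and the 16 kHz default sample rate is an assumption chosen because Whisper models consume 16 kHz audio; the rate WinSTT actually records at is not stated here.

```python
MIN_DURATION_S = 0.5  # minimum clip length stated in the notes above


def clip_duration_s(num_samples: int, sample_rate_hz: int) -> float:
    """Return the clip length in seconds for mono audio."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_samples / sample_rate_hz


def should_transcribe(num_samples: int, sample_rate_hz: int = 16_000) -> bool:
    """Hypothetical check: skip clips shorter than MIN_DURATION_S.

    The 16 kHz default is an assumption, not a confirmed WinSTT setting.
    """
    return clip_duration_s(num_samples, sample_rate_hz) >= MIN_DURATION_S


print(should_transcribe(4_000))    # 0.25 s of audio at 16 kHz
print(should_transcribe(16_000))   # 1.0 s of audio at 16 kHz
```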

Troubleshooting

White Screen Issue

If the application starts but shows only a white/blank window:

Use the async version (recommended):
```
uv run python src/main_async.py
```
This version shows a loading screen while workers initialize, fixing the white screen issue.
Try the minimal test version:
```
uv run python src/main_minimal.py
```
This tests just the UI without worker initialization to isolate the issue.

Check dependencies:

# Ensure you have the correct dependencies installed
uv sync --extra cpu  # or --extra gpu

Common Issues

Resource loading errors: Usually fixed by running the minimal test version above (src/main_minimal.py)
Import errors: Ensure all dependencies are installed with uv sync
GPU version issues: Try the CPU version first with uv sync --extra cpu
Python version: Requires Python 3.11 or higher
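
If you are unsure whether your interpreter meets the version requirement listed above, a quick check along these lines can help. The function name `python_is_supported` is hypothetical and not part of WinSTT; only `sys.version_info` is standard.

```python
import sys

REQUIRED = (3, 11)  # minimum version listed under Common Issues


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version meets the requirement."""
    return tuple(version_info[:2]) >= REQUIRED


if python_is_supported():
    print("Python version OK")
else:
    print("Python 3.11 or higher is required")
```

Run it with the same interpreter that uv uses for the project, for example `uv run python` followed by the script path, so the check reflects the project environment rather than a system-wide Python.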

Getting Help

If issues persist:

Run the test script and include the output when reporting issues
Check the log/ directory for error messages
Try the legacy version as a fallback

Acknowledgments

Silero's Voice Activity Detection (VAD) is used to prevent hallucinated output when a recording starts with silence and to avoid processing empty audio.
