dahshury/WinSTT

WinSTT

An application for desktop STT using OpenAI-Whisper

Type in any application using your voice. WinSTT uses OpenAI's Whisper STT model to provide efficient voice typing. This desktop tool transcribes speech into text, supports over 99 languages, and can run locally without an internet connection.

Why

Existing Windows speech-to-text is slow, inaccurate, and unintuitive. This app provides customizable hotkey activation and fast, accurate transcription for rapid typing, which is especially useful when writing articles, blog posts, and even chat messages.

Setup

Precompiled Binary (Recommended for Windows Users)

  • Download the .exe file from the latest release in the Releases section.

Development Setup

This project uses uv for fast and reliable dependency management.

Prerequisites

  1. Install uv:
    # Windows (PowerShell)
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
    # Or via pip
    pip install uv

Install Dependencies

  • First, clone the repo:

    git clone https://github.com/dahshury/WinSTT
  • Navigate to the cloned directory:

    cd WinSTT
  • Choose your installation method:

    CPU-only installation (recommended for most users), a lighter installation without GPU acceleration:

    uv sync --extra cpu

    GPU-accelerated installation, for systems with NVIDIA GPUs and CUDA support:

    uv sync --extra gpu

    Development installation, for contributors and developers:

    # Basic development setup with CPU
    uv sync --extra dev --extra cpu
    
    # Development with GPU acceleration
    uv sync --extra dev --extra gpu

    Linux users only: additional setup for PyAudio

    For Linux, you need to install PortAudio, which PyAudio depends on. Use the following commands to install PortAudio on common Linux distributions:

    • Debian/Ubuntu:
      sudo apt update
      sudo apt install portaudio19-dev libxcb1 libxcb-cursor0 libxcb-keysyms1 libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-xfixes0 libxcb-icccm4 libxcb-image0 libxcb-sync1 libxcb-xinerama0 libxcb-randr0 libxcb-util1 libx11-xcb1 libxrender1 libxkbcommon-x11-0
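After installing PortAudio, one way to confirm that PyAudio can load it and see a microphone is to list the input devices. This is a minimal sketch, not part of WinSTT: the `check_pyaudio` helper is hypothetical, and it relies only on PyAudio's `PyAudio`, `get_device_count`, `get_device_info_by_index`, and `terminate` calls.

```python
def check_pyaudio() -> tuple[bool, list[str]]:
    """Return (pyaudio_loaded, names_of_input_devices)."""
    try:
        import pyaudio  # fails if PyAudio or the PortAudio shared library is missing
    except (ImportError, OSError):
        return False, []
    pa = pyaudio.PyAudio()
    try:
        names = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            # Only devices with input channels can act as a microphone.
            if info.get("maxInputChannels", 0) > 0:
                names.append(str(info.get("name")))
        return True, names
    finally:
        pa.terminate()  # release PortAudio resources


if __name__ == "__main__":
    loaded, devices = check_pyaudio()
    if not loaded:
        print("PyAudio could not be imported; check the PortAudio packages above.")
    elif not devices:
        print("PyAudio loaded, but no input devices were found.")
    else:
        print("Input devices:", *devices, sep="\n  ")
```

An import failure points at the system packages, while an empty device list points at microphone or permission setup instead.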

Available Dependency Groups

  • Core: Always installed - basic application framework
  • cpu: CPU-only inference using faster-whisper and ONNX
  • gpu: GPU-accelerated inference with CUDA support
  • dev: Development tools (linting, type checking, pre-commit)
  • build: Distribution and packaging tools
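To see which inference path an installed extra is likely to give you, a check along these lines can help. It is a hedged sketch, not WinSTT's code: it assumes the `ctranslate2` package (the engine faster-whisper is built on) is importable and uses its `get_cuda_device_count()` to choose between GPU and CPU. The `pick_compute_settings` helper and its mapping (int8 on CPU, float16 on GPU) are hypothetical, chosen to loosely mirror the Notes below about a quantized model on CPU and the full model on GPU.

```python
def detect_device() -> str:
    """Return "cuda" if CTranslate2 reports a CUDA device, otherwise "cpu"."""
    try:
        # CTranslate2 is the inference engine underneath faster-whisper.
        import ctranslate2
    except ImportError:
        return "cpu"  # backend not installed: fall back to CPU
    try:
        return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    except Exception:
        # e.g. a CPU-only build or missing CUDA libraries
        return "cpu"


def pick_compute_settings(device: str) -> dict[str, str]:
    """Hypothetical mapping: int8 quantization on CPU, float16 on GPU."""
    if device == "cuda":
        return {"device": "cuda", "compute_type": "float16"}
    return {"device": "cpu", "compute_type": "int8"}


if __name__ == "__main__":
    settings = pick_compute_settings(detect_device())
    print(settings)
    # With faster-whisper installed, these keyword arguments could be passed as
    # faster_whisper.WhisperModel(model_name, **settings).
```

Falling back to CPU on any error keeps the check safe to run on machines without the GPU extra.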

Start The App

# Recommended: Run with async loading screen
uv run python src/main_async.py

# Or run the standard version  
uv run python src/main.py

# Or activate the environment and run
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
python src/main_async.py

Recommended: use main_async.py for a better startup experience, since it shows a loading screen.

Building for Distribution

# Install build dependencies
uv sync --extra build

# Build executable
uv run pyinstaller --onefile src/main.py

Updating Dependencies

# Update all dependencies
uv lock --upgrade

# Sync updated lockfile
uv sync

Usage

Hold the Alt+Ctrl+A key combination to start recording and release it to stop. There may be a slight delay between pressing the keys and the app starting to listen to your microphone, so start speaking only after you hear the audio cue.

  • Releasing the keys transcribes the recorded audio and pastes the text wherever your text cursor is, in any application. Processing speed depends on the chosen model and your computer's capabilities.

  • The app has a "record key" button for changing the key combination you hold to record. Click record key, press and hold the keys you want to use, then click stop to save the new combination.

  • This tool is powered by automatic speech recognition (ASR) models from Hugging Face, primarily OpenAI's Whisper. Larger models are more accurate but slower, so try the model that best suits your hardware and needs.
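The hold-to-record behavior described above can be sketched as a small state machine: recording starts once every key in the configured combination is held and stops as soon as any of them is released. This is an illustrative sketch, not WinSTT's implementation; `parse_combo` and `PushToTalk` are hypothetical names, and in a real app the key events would come from a keyboard-hook library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_combo(combo: str) -> frozenset[str]:
    """Normalize a combination such as "Alt+Ctrl+A" into lowercase key names."""
    return frozenset(part.strip().lower() for part in combo.split("+") if part.strip())


@dataclass
class PushToTalk:
    combo: frozenset[str]                        # keys that must all be held
    held: set[str] = field(default_factory=set)  # keys currently pressed
    recording: bool = False

    def key_down(self, key: str) -> str | None:
        self.held.add(key.lower())
        # Start only when the full combination is held, and only once.
        if not self.recording and self.combo <= self.held:
            self.recording = True
            return "start"
        return None

    def key_up(self, key: str) -> str | None:
        self.held.discard(key.lower())
        # Releasing any key of the combination ends the recording.
        if self.recording and not self.combo <= self.held:
            self.recording = False
            return "stop"
        return None


if __name__ == "__main__":
    ptt = PushToTalk(parse_combo("Alt+Ctrl+A"))
    events = [ptt.key_down("Alt"), ptt.key_down("Ctrl"), ptt.key_down("A"), ptt.key_up("A")]
    print(events)  # [None, None, 'start', 'stop']
```

In this sketch, rebinding the key (what the "record key" button does) amounts to replacing `combo`, for example `ptt.combo = parse_combo("Ctrl+Shift+Space")`.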

Notes

  • When you launch the app for the first time, wait for the model files to download (about 1 GB for the CPU version, 3 GB for the GPU version); how long this takes depends on your internet connection. Once the model is downloaded, no internet connection is needed unless you change the model. The first recording after that may be pasted a little more slowly than subsequent ones.
  • The app automatically detects whether the recording contains speech. If it does not, or if an error occurs, the app shows a message in its window and writes it to the logs folder.
  • The application records only while the record key is held down.
  • You can run this app on a CPU, in which case it uses a quantized Whisper Turbo model by default. If you have a CUDA-capable GPU, the app runs the full model instead, which is faster and more accurate; this is highly recommended.
  • The application does not transcribe recordings shorter than 0.5 seconds. If your sentence is short, keep holding the keys until at least 0.5 s has passed.
  • Some antivirus programs may flag PyInstaller-generated .exe files, such as the current releases, as suspicious. This is a known issue. Rest assured, the binaries are clean and safe. The app passes most of VirusTotal's checks, which you can check out here; the remaining detections are false positives.
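The two checks described in the Notes above (skip recordings under 0.5 seconds, skip recordings without speech) can be sketched as one gate that runs before transcription. This is a simplified illustration: the 0.5 s minimum comes from the Notes, but the root-mean-square energy test and its `RMS_THRESHOLD` value are assumptions standing in for whatever speech detection WinSTT actually uses, and samples are assumed to be floats in the range [-1, 1].

```python
import math
from collections.abc import Sequence

MIN_SECONDS = 0.5     # minimum length from the Notes above
RMS_THRESHOLD = 0.01  # hypothetical silence threshold for float samples in [-1, 1]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of the signal (0.0 for an empty recording)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(
    samples: Sequence[float],
    sample_rate: int,
    min_seconds: float = MIN_SECONDS,
    rms_threshold: float = RMS_THRESHOLD,
) -> tuple[bool, str]:
    """Return (ok, reason); only recordings that pass both checks are transcribed."""
    duration = len(samples) / sample_rate
    if duration < min_seconds:
        return False, f"recording too short ({duration:.2f} s)"
    if rms(samples) < rms_threshold:
        return False, "no speech detected"
    return True, "ok"


if __name__ == "__main__":
    sr = 16_000
    tone = [0.1 * math.sin(2 * math.pi * 440 * i / sr) for i in range(sr)]  # 1 s test tone
    print(should_transcribe(tone[: sr // 4], sr))  # rejected: too short
    print(should_transcribe([0.0] * sr, sr))        # rejected: silent
    print(should_transcribe(tone, sr))              # accepted
```

A plain energy threshold is easy to reason about but can be fooled by background noise; a dedicated voice activity detector is the more robust choice in practice.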

Troubleshooting

White Screen Issue

If the application starts but shows only a white/blank window:

  1. Use the async version (recommended):

    uv run python src/main_async.py

    This version shows a loading screen while workers initialize, fixing the white screen issue.

  2. Try the minimal test version:

    uv run python src/main_minimal.py

    This tests just the UI without worker initialization to isolate the issue.

  3. Check dependencies:

    # Ensure you have the correct dependencies installed
    uv sync --extra cpu  # or --extra gpu
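A blank window is a common symptom of slow start-up work blocking the thread that draws the UI. The sketch below illustrates the idea behind the async version in plain `asyncio`: the blocking load runs in a worker thread via `asyncio.to_thread` while a stand-in "loading screen" coroutine keeps running. It is not WinSTT's code; `load_model_blocking` and `show_loading_screen` are hypothetical placeholders for the real worker initialization and GUI.

```python
import asyncio
import time


def load_model_blocking(name: str) -> str:
    """Hypothetical stand-in for slow start-up work (model load, audio setup)."""
    time.sleep(0.2)
    return f"model:{name}"


async def show_loading_screen(done: asyncio.Event) -> int:
    """Stand-in for a UI that keeps redrawing until start-up finishes."""
    frames = 0
    while not done.is_set():
        frames += 1  # a real UI would repaint a spinner or progress bar here
        await asyncio.sleep(0.05)
    return frames


async def start_app(model_name: str) -> tuple[str, int]:
    done = asyncio.Event()
    screen = asyncio.create_task(show_loading_screen(done))
    # The blocking call runs in a worker thread, so the event loop stays free
    # to run the loading screen instead of freezing on a blank window.
    model = await asyncio.to_thread(load_model_blocking, model_name)
    done.set()
    frames = await screen
    return model, frames


if __name__ == "__main__":
    model, frames = asyncio.run(start_app("demo"))
    print(f"{model} ready after {frames} loading-screen frames")
```

Calling `load_model_blocking` directly instead of through `asyncio.to_thread` would leave the loading screen unable to draw a single frame until the load finished, which is the same failure mode as a white window.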

Common Issues

  • Resource loading errors: usually fixed by running the minimal test version above (src/main_minimal.py)
  • Import errors: Ensure all dependencies are installed with uv sync
  • GPU version issues: Try the CPU version first with uv sync --extra cpu
  • Python version: Requires Python 3.11 or higher
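When narrowing down the issues above, a small diagnostic can check the Python version requirement and whether key packages are importable. This is a minimal sketch, not part of WinSTT: `diagnose` is a hypothetical helper, and the module list is an assumption (`faster_whisper` and `pyaudio` are the import names of faster-whisper and PyAudio, but the app may need more).

```python
import importlib.util
import platform
import sys

REQUIRED_PYTHON = (3, 11)                # from "Requires Python 3.11 or higher"
MODULES = ("faster_whisper", "pyaudio")  # assumed list of modules worth checking


def diagnose(
    modules: tuple[str, ...] = MODULES,
    required: tuple[int, int] = REQUIRED_PYTHON,
) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if sys.version_info < required:
        problems.append(
            f"Python {required[0]}.{required[1]}+ required, found {platform.python_version()}"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module {name!r}: try `uv sync --extra cpu` (or --extra gpu)")
    return problems


if __name__ == "__main__":
    for line in diagnose() or ["No problems found."]:
        print(line)
```

Running it with `uv run python` from the project directory checks the project's own virtual environment rather than a system-wide Python.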

Getting Help

If issues persist:

  1. Run the minimal test version (src/main_minimal.py) and include its output when reporting issues
  2. Check the log/ directory for error messages
  3. Try the legacy version as a fallback

Acknowledgments

About

A Windows app for typing by voice with a customizable hotkey, using OpenAI's Whisper and a nice GUI
