WhisperAttack - OpenAI Whisper for VoiceAttack

This repository provides a single-server approach for using OpenAI Whisper locally with VoiceAttack, replacing Windows Speech Recognition with a fully offline, GPU-accelerated, fast, and accurate AI speech recognition engine.

This is a fork for further integration of KneeboardWhisper by the amazing creator @BojoteX. A special thank you goes to @hradec, whose original script used Google Voice Recognition; @SeaTechNerd83, for helping combine the two approaches and creating a VA plugin; and finally @sleighzy, for the VAICOM implementation and a list of bug fixes and enhancements long enough to fill this page.

In short, SeaTechNerd83 and I combined the two scripts to run voice commands through Whisper using BojoteX's code and then push the result into VoiceAttack using hradec's code. To speed this up, I unified the codebase into one file and made it run a server that sends commands to VoiceAttack. WhisperAttack will run on any Nvidia GPU with 6 GB or more of VRAM and will run alongside DCS (performance tuning may be required for lower-VRAM cards). The absolute minimum GPU spec has not yet been confirmed, but the RTX 2060 6 GB and GTX 1070 8 GB have been confirmed working stutter-free alongside DCS in VR.

Features

OpenAI Whisper models:
- Loads the Whisper model once on GPU or CPU.
- Records mic audio on demand (via socket commands).
- Transcribes the .wav file using Whisper.
- Sends recognized text into VoiceAttack.
- Pushes transcribed text to the clipboard (perfect for voice-to-text DCS chat).
VoiceAttack Command Plugin:
- Sends "start", "stop", or "shutdown" commands to the server directly through VoiceAttack.
Advantages:
- No repeated model loads (faster, especially with larger Whisper models).
- Push-to-Talk style workflow with VoiceAttack press & release.
- Extremely accurate voice recognition (No more VoiceAttack misunderstanding you!)
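
The push-to-talk flow described above can be sketched as a small command handler. This is only an illustration and not WhisperAttack's actual code: the class name `PushToTalkSession`, the `feed_audio` method, and the injected `transcribe` callable are hypothetical. Only the command names "start", "stop", and "shutdown" come from the feature list.

```python
from typing import Callable, List, Optional


class PushToTalkSession:
    """Hypothetical handler for the "start", "stop" and "shutdown" commands."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        # `transcribe` stands in for a Whisper model that was loaded once up front.
        self._transcribe = transcribe
        self._recording = False
        self._chunks: List[bytes] = []
        self.running = True

    def feed_audio(self, chunk: bytes) -> None:
        # Audio is only buffered while a recording is in progress.
        if self._recording:
            self._chunks.append(chunk)

    def handle(self, command: str) -> Optional[str]:
        """Process one command; return the transcribed text after "stop"."""
        command = command.strip().lower()
        if command == "start":
            self._chunks.clear()
            self._recording = True
        elif command == "stop" and self._recording:
            self._recording = False
            return self._transcribe(b"".join(self._chunks))
        elif command == "shutdown":
            self._recording = False
            self.running = False
        return None


if __name__ == "__main__":
    # A fake transcriber so the sketch runs without a model or microphone.
    session = PushToTalkSession(lambda audio: f"{len(audio)} bytes heard")
    session.handle("start")
    session.feed_audio(b"\x00" * 4)
    print(session.handle("stop"))
```

Because the model (here, the `transcribe` callable) is created once and reused for every "stop", there is no per-command load cost, which is the advantage the list above refers to.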

VAICOM integration

Instructions for integrating with VAICOM can be found in the VAICOM INTEGRATION documentation.

Requirements

VoiceAttack
- voiceattack.com
- Plugins Enabled
GPU (Optional, but Recommended)
- Whisper runs faster on an NVIDIA GPU with CUDA.
- If the GPU is selected but CUDA is not available, an error is logged and WhisperAttack falls back to the CPU.
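
The GPU-to-CPU fallback can be pictured roughly as follows. This is a sketch under assumptions: the function name `resolve_device`, the logger name, and the returned strings "cuda" and "cpu" are illustrative and are not taken from the WhisperAttack source. The CUDA check is passed in as a callable so the example runs without any GPU libraries installed.

```python
import logging
from typing import Callable

logger = logging.getLogger("whisper_attack_sketch")


def resolve_device(requested: str, cuda_available: Callable[[], bool]) -> str:
    """Return "cuda" or "cpu" for a requested device of "GPU" or "CPU"."""
    if requested.strip().upper() == "GPU":
        if cuda_available():
            return "cuda"
        # Mirrors the documented behaviour: log an error, then continue on CPU.
        logger.error("CUDA is not available, falling back to CPU")
    return "cpu"


if __name__ == "__main__":
    # Simulate a machine without CUDA.
    print(resolve_device("GPU", lambda: False))
```

With PyTorch installed, `torch.cuda.is_available` could be passed as the `cuda_available` argument.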

Installation

Download the latest release file, WhisperAttack v1.2.2.zip, from the Google Drive and extract it anywhere on your computer, e.g. C:\Program Files\WhisperAttack.
A shortcut can be created to the WhisperAttack.exe application.

Configuration

The default configuration files are stored beside the WhisperAttack application. Custom configuration can be kept in files of the same name in the C:\Users\username\AppData\Local\WhisperAttack directory. These custom files can be created if they do not exist, and their contents override the default configuration (or, for word mappings, add to it).

Keeping custom configuration at that location means it will not be overwritten when installing later versions of WhisperAttack.
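
The layering of default and custom files can be sketched like this. It is a minimal illustration, not the application's loader: the helper names `read_key_values` and `load_layered` are hypothetical, and the `key=value` line format and the skipping of `#` lines are assumptions based on the example entries shown elsewhere in this README.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_key_values(path: Path) -> Dict[str, str]:
    """Parse `key=value` lines; a missing file yields an empty dict."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Comment handling is an assumption of this sketch.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_layered(default_dir: Path, custom_dir: Path, name: str) -> Dict[str, str]:
    """Start from the default file, then let the custom file override or add keys."""
    merged = read_key_values(default_dir / name)
    merged.update(read_key_values(custom_dir / name))
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_dir, user_dir = Path(tmp, "app"), Path(tmp, "user")
        app_dir.mkdir()
        user_dir.mkdir()
        (app_dir / "settings.cfg").write_text(
            "whisper_model=small.en\ntheme=default\n", encoding="utf-8"
        )
        (user_dir / "settings.cfg").write_text("theme=dark\n", encoding="utf-8")
        print(load_layered(app_dir, user_dir, "settings.cfg"))
```

On a real installation, `custom_dir` would correspond to the AppData\Local\WhisperAttack directory mentioned above.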

See below for the list of configuration files.

settings.cfg

The settings.cfg file contains configuration for WhisperAttack.

The default values should cover most cases but can be changed:

whisper_model - The Whisper model to use, small.en by default. See the table at the bottom of the README file for options.
- A smaller size can be specified for reducing the amount of VRAM used, e.g. base.en or tiny.en
whisper_device - Which device to run the Whisper transcription process on, GPU (default) or CPU
theme - To display the WhisperAttack UI in light or dark mode. Valid values:
- default - this will use the current theme you have set for Windows
- dark - dark mode
- light - light mode
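
As a rough illustration of how these options fit together, the sketch below overlays user values on the defaults and rejects unknown values. The function name `resolve_settings` is hypothetical, the default of `default` for `theme` is an assumption (the list above does not state it), and exact, case-sensitive matching is also an assumption.

```python
from typing import Dict

# Model and device defaults come from the list above; the theme default is assumed.
DEFAULTS: Dict[str, str] = {
    "whisper_model": "small.en",
    "whisper_device": "GPU",
    "theme": "default",
}
ALLOWED = {
    "whisper_device": {"GPU", "CPU"},
    "theme": {"default", "dark", "light"},
}


def resolve_settings(overrides: Dict[str, str]) -> Dict[str, str]:
    """Overlay user values on the defaults and reject unknown options."""
    settings = {**DEFAULTS, **overrides}
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r} is not one of {sorted(allowed)}")
    return settings


if __name__ == "__main__":
    print(resolve_settings({"whisper_model": "base.en", "theme": "dark"}))
```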

word_mappings.txt

The word_mappings.txt file contains keys and values that can be used to replace a spoken word with another word. For example, if the transcription often outputs "Inter" when you are saying "Enter" then this can be added as a word replacement.

The word replacement configuration also supports specifying multiple words to be replaced with a single word; these are separated by a semicolon (;). In the example below, saying either "gulf" or "gold" would be replaced with "Golf", and "inter" would be replaced with "Enter".

gulf;gold=Golf
inter=Enter

WhisperAttack needs to be restarted after editing this file by hand. New word mappings added via the configuration screen do not require a restart; they are saved to your custom configuration file, C:\Users\username\AppData\Local\WhisperAttack\word_mappings.txt.
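
The mapping format can be illustrated with a short parser and replacer. This is a sketch, not the project's implementation: the function names are hypothetical, and whole-word, case-insensitive matching is an assumption about how replacements are applied.

```python
import re
from typing import Dict, Iterable


def parse_word_mappings(lines: Iterable[str]) -> Dict[str, str]:
    """Turn `alias1;alias2=Replacement` lines into {alias: replacement}."""
    mappings: Dict[str, str] = {}
    for line in lines:
        aliases, sep, replacement = line.strip().partition("=")
        if not sep:
            continue  # Skip blank or malformed lines.
        for alias in aliases.split(";"):
            alias = alias.strip().lower()
            if alias:
                mappings[alias] = replacement.strip()
    return mappings


def apply_word_mappings(text: str, mappings: Dict[str, str]) -> str:
    """Replace whole words, ignoring case (both are assumptions of this sketch)."""
    if not mappings:
        return text
    # Longer aliases first so they win over shorter ones sharing a prefix.
    ordered = sorted(mappings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(alias) for alias in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mappings[m.group(0).lower()], text)


if __name__ == "__main__":
    mappings = parse_word_mappings(["gulf;gold=Golf", "inter=Enter"])
    print(apply_word_mappings("Inter gold two", mappings))  # Enter Golf two
```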

Running the Whisper Server

Double click the WhisperAttack.exe file or shortcut. This will open an application window and start the server.

The application window will display startup logging information, the raw text transcribed from the speech, and the final cleaned-up command or text that was sent to VoiceAttack or DCS. The window can be closed, and then shown again from the menu in the WhisperAttack icon in the Windows system tray. WhisperAttack will continue running even when the window is closed.

WhisperAttack will have completed loading once the "Server started and listening" message is displayed.

Loading Whisper model (small.en), device=GPU ...
Server started and listening on 127.0.0.1:65432...
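
For testing outside VoiceAttack, a command could in principle be sent to that address from a short script. The sketch below assumes the server accepts a command as plain UTF-8 text over a TCP connection; the README does not document the wire format, so treat the protocol, and the helper name `send_command`, as assumptions. Only the address and port come from the log line above.

```python
import socket

HOST, PORT = "127.0.0.1", 65432  # Address taken from the startup log above.


def send_command(command: str, host: str = HOST, port: int = PORT,
                 timeout: float = 2.0) -> None:
    """Open a TCP connection, send one command as UTF-8 text, and close.

    The plain-text protocol is an assumption of this sketch.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("utf-8"))
```

Calling `send_command("start")` and later `send_command("stop")` would then mimic a push-to-talk press and release, provided the assumed protocol matches the server.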

A WhisperAttack icon will be placed in your Windows system tray. Right-clicking this will give options to show the WhisperAttack window, or to exit the application.

Closing VoiceAttack will also stop and close WhisperAttack.

NOTE: Startup may be slow the first time while the Whisper model downloads. This only needs to happen once (unless you change the Whisper model to be used).

The Whisper server will output logs to the C:\Users\username\AppData\Local\WhisperAttack\WhisperAttack.log file.

Configuring VoiceAttack

A pre-configured VoiceAttack profile is included in the release for your convenience. It is still recommended to read through the steps below to understand how Whisper injection actually works.

1. Disable all speech recognition within VoiceAttack

2. Enable Plugin support in VoiceAttack

Go to Options → General → Enable Plugin Support.

3. Place Plugin in VoiceAttack Apps folder

After extracting the .zip file, locate the WhisperAttackServerCommand folder and copy the entire folder.

Locate the VoiceAttack Apps Folder

Paste the entire WhisperAttackServerCommand folder into the Apps folder

If the plugin is enabled and active and everything is set up correctly, VoiceAttack should give these messages on startup:

4. Create Recording commands

In VoiceAttack, go to Edit Profile.

New Command for "Start Whisper Recording":

When this command executes:
- Go to Other → Advanced → Execute an External Plugin Function.
- Plugin: Point it to 'WASC V0.1beta'
- Plugin Context:

Start Whisper Recording

Assign a joystick or key press to this command (e.g., "Joystick Button 14 (pressed)").

Another Command for "Stop Whisper Recording":

Same steps, except the Plugin Context is:

Stop Whisper Recording

Assign the same joystick button but check "Shortcut is invoked only when released."

Adding new word mappings

Word mappings can be added to WhisperAttack so that when these words are found within transcribed sentences they will be replaced with the replacement word you provide. This can aid with replacing words that are consistently transcribed incorrectly into the word you actually want.

Click the Add word mapping button to open this configuration screen. Multiple aliases can be entered, separated by semicolons, for a single replacement.

Clipboard & DCS Kneeboard Integration - Optional

This script preserves BojoteX's original vision for the code and copies the commands to the clipboard for use with the Kneeboard. The original repo can be found here: https://github.com/BojoteX/KneeboardWhisper

Do the following to enable DCS Kneeboard to transcribe what you say: Once completed, you must say "Note" followed by what you would like to transcribe to kneeboard/clipboard
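
The "Note" trigger can be pictured as a simple prefix check on the transcribed text. This is a guess at the idea rather than the project's logic: the function name `route_transcript`, the destination labels, and the tolerance for punctuation after "Note" are all assumptions.

```python
import re
from typing import Tuple

# Matches a leading "note" as a whole word, plus any punctuation that follows it.
NOTE_PREFIX = re.compile(r"^\s*note\b[\s,.:]*", re.IGNORECASE)


def route_transcript(text: str) -> Tuple[str, str]:
    """Return ("clipboard", body) for dictated notes, else ("voiceattack", text)."""
    match = NOTE_PREFIX.match(text)
    if match:
        return "clipboard", text[match.end():].strip()
    return "voiceattack", text.strip()


if __name__ == "__main__":
    print(route_transcript("Note, bullseye two seven zero"))
    print(route_transcript("Request startup"))
```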

Troubleshooting

Library cublas64_12.dll is not found

If the error below is displayed in the logs then ensure that CUDA 12 is available, e.g. by installing the CUDA Toolkit 12.

ERROR - Failed to transcribe audio: Library cublas64_12.dll is not found or cannot be loaded

ValueError: Requested int8_float16 compute type

For some GPUs which do not support certain compute types, i.e. do not have tensor cores, the below message will be output to the logs:

WARNING - GPU does not have tensor cores, major=6, minor=1

WhisperAttack can detect this and will fall back to values supported by CUDA cores.

If, however, the error message below is displayed, then the settings.cfg file needs to be updated by hand.

ValueError: Requested int8_float16 compute type, but the target device or backend do not support efficient int8_float16 computation.

The settings.cfg file can be updated to add the below entry:

whisper_core_type=standard
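
The detection step and the manual override can be sketched as follows. Everything here is an approximation: the function names are hypothetical, the `tensor` default for `core_type` is invented for illustration (only `standard` appears in this README), and `int8` is only a guess at a fallback compute type. The heuristic treats compute capability 7.0 and above as having tensor cores, which is not true of every card (some GTX 16-series GPUs report 7.5 without them), and that gap is one reason a manual override is useful.

```python
def has_tensor_cores(major: int, minor: int) -> bool:
    """Rough heuristic: tensor cores arrived with compute capability 7.0."""
    return (major, minor) >= (7, 0)


def pick_compute_type(major: int, minor: int, core_type: str = "tensor") -> str:
    """Choose a compute type from the GPU capability and an optional override.

    `core_type="standard"` mirrors the whisper_core_type=standard entry above;
    the returned "int8" fallback is an assumption of this sketch.
    """
    if core_type == "standard" or not has_tensor_cores(major, minor):
        return "int8"
    return "int8_float16"


if __name__ == "__main__":
    print(pick_compute_type(6, 1))              # GTX 10-series style capability
    print(pick_compute_type(8, 6))              # a capability with tensor cores
    print(pick_compute_type(7, 5, "standard"))  # manual override
```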

Performance (AI Model)

If WhisperAttack is causing significant stutters, it is likely that the current model is overloading your VRAM. If this is the case, stutters can be alleviated by choosing a smaller model (extra information on the models is available in the table below) in the settings.cfg file as follows:

whisper_model=base.en

Using smaller models will reduce VRAM and compute costs. See below for a full speed breakdown.
The first activation with a new AI model will trigger a download of that model, which may take an extended amount of time depending on internet speed.

Available models and languages

There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model. The relative speeds below are measured by transcribing English speech on an A100, and the real-world speed may vary significantly depending on many factors including the language, the speaking speed, and the available hardware.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~10x
base	74 M	`base.en`	`base`	~1 GB	~7x
small	244 M	`small.en`	`small`	~2 GB	~4x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x
turbo	809 M	N/A	`turbo`	~6 GB	~8x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models. Additionally, the turbo model is an optimized version of large-v3 that offers faster transcription speed with a minimal degradation in accuracy.
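
To make the table easier to act on, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The VRAM figures are copied from the table; the function name `largest_model_for` and the idea of a "budget" (the VRAM left over after DCS and other applications) are illustrative assumptions.

```python
from typing import Optional

# (size, approximate VRAM in GB, has an English-only variant), ordered by VRAM.
MODELS = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("turbo", 6, False),
    ("large", 10, False),
]


def largest_model_for(vram_budget_gb: float, english_only: bool = True) -> Optional[str]:
    """Return the biggest model name that fits the budget, or None if none fits."""
    best = None
    for size, vram_gb, has_english in MODELS:
        if vram_gb > vram_budget_gb:
            continue
        if english_only and not has_english:
            continue
        best = f"{size}.en" if english_only else size
    return best


if __name__ == "__main__":
    # Example: roughly 3 GB of VRAM to spare while the game is running.
    print(largest_model_for(3))
```

The returned name is what would go into the whisper_model entry in settings.cfg.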

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

Enjoy your local (offline) speech recognition with OpenAI Whisper + VoiceAttack! If you run into issues, open an issue or check the logs for clues.

