OpenWhispr

An open source desktop dictation application that converts speech to text using OpenAI Whisper. Features both local and cloud processing options for maximum flexibility and privacy.

License

This project is licensed under the MIT License - see the LICENSE file for details. This means you can freely use, modify, and distribute this software for personal or commercial purposes.

Features

  • 🎤 Global Hotkey: Customizable hotkey to start/stop dictation from anywhere (default: backtick `)
  • 🤖 Multi-Provider AI Processing: Choose between OpenAI, Anthropic Claude, Google Gemini, or local models
  • 🎯 Agent Naming: Personalize your AI assistant with a custom name for natural interactions
  • 🧠 Latest AI Models (September 2025):
    • OpenAI: GPT-5 Series, GPT-4.1 Series, o-series reasoning models (o3/o4-mini)
    • Anthropic: Claude Opus 4.1, Claude Sonnet 4, Claude 3.5 Sonnet/Haiku
    • Google: Gemini 2.5 Pro/Flash/Flash-Lite with thinking capability, Gemini 2.0 Flash
    • Local: Qwen, LLaMA, Mistral models via llama.cpp
  • 🔒 Privacy-First: Local processing keeps your voice data completely private
  • 🎨 Modern UI: Built with React 19, TypeScript, and Tailwind CSS v4
  • 🚀 Fast: Optimized with Vite and modern tooling
  • 📱 Control Panel: Manage settings, view history, and configure API keys
  • 🗄️ Transcription History: SQLite database stores all your transcriptions locally
  • 🔧 Model Management: Download and manage local Whisper models (tiny, base, small, medium, large, turbo)
  • 🧹 Model Cleanup: One-click removal of cached Whisper models with uninstall hooks to keep disks tidy
  • 🌐 Cross-Platform: Works on macOS, Windows, and Linux
  • 📋 Automatic Pasting: Transcribed text automatically pastes at your cursor location
  • 🖱️ Draggable Interface: Move the dictation panel anywhere on your screen
  • 🔄 Responses API: Uses OpenAI's latest Responses API for improved performance
  • 🌐 Globe Key Toggle (macOS): Optional Fn/Globe key listener for a hardware-level dictation trigger

Prerequisites

  • Node.js 18+ and npm (Download from nodejs.org)
  • macOS 10.15+, Windows 10+, or Linux
  • On macOS, Globe key support requires the Xcode Command Line Tools (xcode-select --install) so the bundled Swift helper can run
  • Python 3.7+ (Optional - the app can install it automatically for local Whisper processing)

Quick Start

For Personal Use (Recommended)

  1. Clone the repository:

    git clone https://github.com/HeroTools/open-whispr.git
    cd open-whispr
  2. Install dependencies:

    npm install
  3. Optional: Set up API keys (only needed for cloud processing):

    Method A - Environment file:

    cp env.example .env
    # Edit .env and add your API keys:
    # OPENAI_API_KEY=your_openai_key
    # ANTHROPIC_API_KEY=your_anthropic_key  
    # GEMINI_API_KEY=your_gemini_key

    Method B - In-app configuration:

    • Run the app and configure API keys through the Control Panel
    • Keys are automatically saved and persist across app restarts
  4. Run the application:

    npm run dev  # Development mode with hot reload
    # OR
    npm start    # Production mode

Building for Personal Use (Optional)

If you want to build a standalone app for personal use:

# Build without code signing (no certificates required)
npm run pack

# The unsigned app will be in: dist/mac-arm64/OpenWhispr.app (macOS)
# or dist/win-unpacked/OpenWhispr.exe (Windows)
# or dist/linux-unpacked/open-whispr (Linux)

Note: On macOS, you may see a security warning when first opening the unsigned app. Right-click and select "Open" to bypass this.

Building for Distribution

For maintainers who need to distribute signed builds:

# Requires code signing certificates and notarization setup
npm run build:mac    # macOS (requires Apple Developer account)
npm run build:win    # Windows (requires code signing cert)
npm run build:linux  # Linux

First Time Setup

  1. Choose Processing Method:

    • Local Processing: Download Whisper models for completely private transcription
    • Cloud Processing: Use OpenAI's API for faster transcription (requires API key)
  2. Grant Permissions:

    • Microphone Access: Required for voice recording
    • Accessibility Permissions: Required for automatic text pasting (macOS)
  3. Name Your Agent: Give your AI assistant a personal name (e.g., "Assistant", "Jarvis", "Alex")

    • Makes interactions feel more natural and conversational
    • Helps distinguish between giving commands and regular dictation
    • Can be changed anytime in settings
  4. Configure Global Hotkey: Default is backtick (`) but can be customized

Usage

Basic Dictation

  1. Start the app - A small draggable panel appears on your screen
  2. Press your hotkey (default: backtick `) - Start dictating (panel shows recording animation)
  3. Press your hotkey again - Stop dictation and begin transcription (panel shows processing animation)
  4. Text appears - Transcribed text is automatically pasted at your cursor location
  5. Drag the panel - Click and drag to move the dictation panel anywhere on your screen

Control Panel

  • Access: Right-click the tray icon (macOS) or open it from the system menu
  • Configure: Choose between local and cloud processing
  • History: View, copy, and delete past transcriptions
  • Models: Download and manage local Whisper models
  • Storage Cleanup: Remove downloaded Whisper models from cache to reclaim space
  • Settings: Configure API keys, customize hotkeys, and manage permissions

Uninstall & Cache Cleanup

  • In-App: Use Settings → Speech to Text Processing → Local Model Storage → Remove Downloaded Models to clear ~/.cache/openwhispr/models (or %USERPROFILE%\.cache\openwhispr\models on Windows).
  • Windows Uninstall: The NSIS uninstaller automatically deletes the same cache directory.
  • Linux Packages: deb/rpm post-uninstall scripts also remove cached models.
  • macOS: If you uninstall manually, also delete the model cache at ~/.cache/openwhispr/models (or the app's folder under ~/Library/Caches) if desired.

Agent Naming & AI Processing

Once you've named your agent during setup, you can interact with it using multiple AI providers:

🎯 Agent Commands (for AI assistance):

  • "Hey [AgentName], make this more professional"
  • "Hey [AgentName], format this as a list"
  • "Hey [AgentName], write a thank you email"
  • "Hey [AgentName], convert this to bullet points"

🤖 AI Provider Options:

  • OpenAI:
    • GPT-5 Series (Nano/Mini/Full) - Latest generation with deep reasoning
    • GPT-4.1 Series - Enhanced coding with 1M token context
    • o3/o4 Series - Advanced reasoning models that spend more time thinking before responding
  • Anthropic: Claude Opus 4.1, Sonnet 4 - Frontier intelligence models
  • Google: Gemini 2.5 Pro/Flash - Advanced multi-modal capabilities
  • Local: Community models for complete privacy

📝 Regular Dictation (for normal text):

  • "This is just normal text I want transcribed"
  • "Meeting notes: John mentioned the quarterly report"
  • "Dear Sarah, thank you for your help"

The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output.
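
A minimal sketch of how that detection can work (the regex and function below are illustrative assumptions, not the app's actual implementation):

// Sketch: detect "Hey <AgentName>, ..." prefixes and strip the name before
// handing the text to the AI provider. Assumes a plain alphabetic agent name.
function parseDictation(text: string, agentName: string) {
  // Match an optional greeting followed by the agent's name, e.g. "Hey Jarvis,"
  const pattern = new RegExp(`^\\s*(?:hey |hi |okay |ok )?${agentName}\\b[,:]?\\s*`, "i");
  const isCommand = pattern.test(text);
  // For commands, remove the name reference so it never reaches the output
  const cleaned = isCommand ? text.replace(pattern, "") : text;
  return { isCommand, cleaned };
}

// parseDictation("Hey Jarvis, make this more professional", "Jarvis")
//   -> { isCommand: true, cleaned: "make this more professional" }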

Processing Options

  • Local Processing:
    • Install Whisper automatically through the Control Panel
    • Download models: tiny (fastest), base (recommended), small, medium, large (best quality)
    • Complete privacy - audio never leaves your device
  • Cloud Processing:
    • Requires OpenAI API key
    • Faster processing
    • Uses OpenAI's Whisper API

Project Structure

open-whispr/
├── main.js              # Electron main process & IPC handlers
├── preload.js           # Electron preload script & API bridge
├── whisper_bridge.py    # Python script for local Whisper processing
├── setup.js             # First-time setup script
├── package.json         # Dependencies and scripts
├── env.example          # Environment variables template
├── CHANGELOG.md         # Project changelog
├── src/
│   ├── App.jsx          # Main dictation interface
│   ├── main.jsx         # React entry point
│   ├── index.html       # Vite HTML template
│   ├── index.css        # Tailwind CSS v4 configuration
│   ├── vite.config.js   # Vite configuration
│   ├── components/
│   │   ├── ControlPanel.tsx     # Settings and history UI
│   │   ├── OnboardingFlow.tsx   # First-time setup wizard
│   │   ├── SettingsPage.tsx     # Settings interface
│   │   ├── ui/                  # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   ├── input.tsx
│   │   │   ├── LoadingDots.tsx
│   │   │   ├── DotFlashing.tsx
│   │   │   ├── Toast.tsx
│   │   │   ├── toggle.tsx
│   │   │   └── tooltip.tsx
│   │   └── lib/
│   │       └── utils.ts         # Utility functions
│   ├── services/
│   │   └── ReasoningService.ts  # Multi-provider AI processing (OpenAI/Anthropic/Gemini; sketch below)
│   ├── utils/
│   │   └── agentName.ts         # Agent name management utility
│   └── components.json          # shadcn/ui configuration
└── assets/                      # App icons and resources
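
src/services/ReasoningService.ts is where the multi-provider requests converge. A minimal sketch of that dispatch pattern (the provider set matches this README, but the types and stubs are illustrative assumptions, not the actual implementation):

// Sketch of a provider-dispatch reasoning service. Each stub stands in for
// the real SDK or HTTP call (Responses API, Messages API, generateContent,
// or a local llama.cpp server).
type Provider = "openai" | "anthropic" | "gemini" | "local";
type ProviderCall = (model: string, prompt: string) => Promise<string>;

const providers: Record<Provider, ProviderCall> = {
  openai:    async (model, prompt) => `[openai:${model}] ${prompt}`,
  anthropic: async (model, prompt) => `[anthropic:${model}] ${prompt}`,
  gemini:    async (model, prompt) => `[gemini:${model}] ${prompt}`,
  local:     async (model, prompt) => `[local:${model}] ${prompt}`,
};

// The dictated command arrives here with the agent name already stripped
async function processText(provider: Provider, model: string, prompt: string) {
  return providers[provider](model, prompt);
}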

Technology Stack

  • Frontend: React 19, TypeScript, Tailwind CSS v4
  • Build Tool: Vite with optimized Tailwind plugin
  • Desktop: Electron 36 with context isolation
  • UI Components: shadcn/ui with Radix primitives
  • Database: better-sqlite3 for local transcription storage (see the sketch after this list)
  • Speech-to-Text: OpenAI Whisper (local models + API)
  • Local Processing: Python with OpenAI Whisper package
  • Icons: Lucide React for consistent iconography
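
The better-sqlite3 layer behind transcription history is synchronous and file-based; a sketch of the kind of schema and queries involved (table and column names are assumptions, not the app's actual schema):

import Database from "better-sqlite3";

// Open (or create) a history database; the app keeps its copy in the
// user data directory.
const db = new Database("transcriptions.db");

// Hypothetical schema for illustration
db.exec(`CREATE TABLE IF NOT EXISTS transcriptions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  text TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`);

// Prepared statements are the idiomatic better-sqlite3 pattern
const insert = db.prepare("INSERT INTO transcriptions (text) VALUES (?)");
const recent = db.prepare("SELECT * FROM transcriptions ORDER BY id DESC LIMIT ?");

insert.run("Meeting notes: John mentioned the quarterly report");
console.log(recent.all(10));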

Development

Scripts

  • npm run dev - Start development with hot reload
  • npm run start - Start production build
  • npm run setup - First-time setup (creates .env file)
  • npm run build:renderer - Build the React app only
  • npm run build - Full build with signing (requires certificates)
  • npm run build:mac - macOS build with signing
  • npm run build:win - Windows build with signing
  • npm run build:linux - Linux build
  • npm run pack - Build without signing (for personal use)
  • npm run dist - Build and package with signing
  • npm run lint - Run ESLint
  • npm run preview - Preview production build

Architecture

The app consists of two main windows:

  1. Main Window: Minimal overlay for dictation controls
  2. Control Panel: Full settings and history interface

Both use the same React codebase but render different components based on URL parameters.
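
A sketch of that pattern (the query parameter name and entry-point shape are assumptions, not the actual code):

// Entry-point sketch: mount a different root component depending on how the
// window was opened. The "panel=control" parameter name is hypothetical.
import ReactDOM from "react-dom/client";
import App from "./App";                        // minimal dictation overlay
import ControlPanel from "./components/ControlPanel";

const params = new URLSearchParams(window.location.search);
const isControlPanel = params.get("panel") === "control";

ReactDOM.createRoot(document.getElementById("root")!).render(
  isControlPanel ? <ControlPanel /> : <App />
);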

Key Components

  • main.js: Electron main process, IPC handlers, database operations
  • preload.js: Secure bridge between main and renderer processes (see the sketch after this list)
  • App.jsx: Main dictation interface with recording controls
  • ControlPanel.tsx: Settings, history, and model management
  • whisper_bridge.py: Python bridge for local Whisper processing
  • better-sqlite3: Local database for transcription history
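
For orientation, a context-isolated preload bridge typically looks like the sketch below (the exposed namespace and channel names are illustrative, not OpenWhispr's actual API):

// Expose a narrow, whitelisted API to the renderer via IPC
import { contextBridge, ipcRenderer } from "electron";

contextBridge.exposeInMainWorld("openwhispr", {
  // Ask the main process to paste transcribed text at the cursor
  pasteText: (text: string) => ipcRenderer.invoke("paste-text", text),
  // Fetch transcription history from the local database
  getHistory: () => ipcRenderer.invoke("get-history"),
});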

Tailwind CSS v4 Setup

This project uses the latest Tailwind CSS v4 with:

  • CSS-first configuration using @theme directive
  • Vite plugin for optimal performance
  • Custom design tokens for consistent theming
  • Dark mode support with @variant

Building

The build process creates a single executable for your platform:

# Development build
npm run pack

# Production builds
npm run dist           # Current platform
npm run build:mac      # macOS DMG + ZIP
npm run build:win      # Windows NSIS + Portable
npm run build:linux    # AppImage + DEB

Configuration

Environment Variables

Create a .env file in the root directory (or use npm run setup):

# OpenAI API Configuration (optional - only needed for cloud processing)
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Customize the Whisper model
WHISPER_MODEL=whisper-1

# Optional: Set language for better transcription accuracy
LANGUAGE=

# Optional: Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Optional: Google Gemini API Configuration  
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Debug mode
DEBUG=false
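
Once loaded, these are ordinary process environment variables; a minimal sketch of reading them (assuming the standard dotenv package picks up the .env file):

import "dotenv/config"; // loads .env into process.env

// Only needed for cloud processing; local Whisper works without any key
const openaiKey = process.env.OPENAI_API_KEY;
const whisperModel = process.env.WHISPER_MODEL ?? "whisper-1";
const language = process.env.LANGUAGE || undefined; // empty means auto-detect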

Local Whisper Setup

For local processing, OpenWhispr offers automated setup:

  1. Automatic Python Installation (if needed):

    • The app will detect if Python is missing
    • Offers to install Python 3.11 automatically
    • macOS: Uses Homebrew if available, otherwise official installer
    • Windows: Downloads and installs official Python
    • Linux: Uses system package manager (apt, yum, or pacman)
  2. Automatic Whisper Setup:

    • Installs OpenAI Whisper package via pip
    • Downloads your chosen model on first use
    • Handles all transcription locally

Requirements:

  • Sufficient disk space for models (39MB - 1.5GB depending on model)
  • Admin/sudo access may be required for Python installation
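
Under the hood, local transcription means the Electron main process invoking whisper_bridge.py; a sketch of that handoff (the bridge's arguments and output format are assumptions, not its documented interface):

import { execFile } from "node:child_process";

// Hypothetical invocation: pass the recorded audio file and model name,
// then read the transcript back from stdout.
function transcribeLocally(audioPath: string, model = "base"): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile("python3", ["whisper_bridge.py", audioPath, "--model", model],
      (error, stdout) => (error ? reject(error) : resolve(stdout.trim())));
  });
}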

Customization

  • Hotkey: Change in the Control Panel (default: backtick `) - fully customizable
  • Panel Position: Drag the dictation panel to any location on your screen
  • Processing Method: Choose local or cloud in Control Panel
  • Whisper Model: Select quality vs speed in Control Panel
  • UI Theme: Edit CSS variables in src/index.css
  • Window Size: Adjust dimensions in main.js
  • Database: Transcriptions stored in user data directory

Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Run npm run lint before committing
  • Follow the existing code style
  • Update documentation as needed
  • Test on your target platform before submitting

Security

OpenWhispr is designed with privacy and security in mind:

  • Local Processing Option: Keep your voice data completely private
  • No Analytics: We don't collect any usage data or telemetry
  • Open Source: All code is available for review
  • Secure Storage: API keys are stored securely in your system's keychain/credential manager (see the sketch after this list)
  • Minimal Permissions: Only requests necessary permissions (microphone, accessibility)
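
Electron's safeStorage API is the standard way to back this kind of storage with the OS keychain or credential manager; the sketch below shows the pattern (how OpenWhispr persists the resulting bytes is not shown):

import { safeStorage } from "electron";

// Encrypt an API key with the OS credential store before writing it to disk
function protectKey(apiKey: string): Buffer {
  if (!safeStorage.isEncryptionAvailable()) {
    throw new Error("OS-level encryption unavailable");
  }
  return safeStorage.encryptString(apiKey); // persist this Buffer, not the key
}

function revealKey(encrypted: Buffer): string {
  return safeStorage.decryptString(encrypted);
}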

Troubleshooting

Common Issues

  1. Microphone permissions: Grant permissions in System Preferences/Settings
  2. Accessibility permissions (macOS): Required for automatic text pasting
    • Go to System Settings → Privacy & Security → Accessibility
    • Add OpenWhispr and enable the checkbox
    • Use "Fix Permission Issues" in Control Panel if needed
  3. API key errors (cloud processing only): Ensure your OpenAI API key is valid and has credits
    • Set key through Control Panel or .env file
    • Check logs for "OpenAI API Key present: Yes/No"
  4. Local Whisper installation:
    • Ensure Python 3.7+ is installed
    • Use Control Panel to install Whisper automatically
    • Check available disk space for models
  5. Global hotkey conflicts: Change the hotkey in the Control Panel - any key can be used
  6. Text not pasting: Check accessibility permissions and try manual paste with Cmd+V
  7. Panel position: If the panel appears off-screen, restart the app to reset position

Getting Help

  • Check the Issues page
  • Review the console logs for debugging information
  • For local processing: Ensure Python and pip are working
  • For cloud processing: Verify your OpenAI API key and billing status
  • Check the Control Panel for system status and diagnostics

Performance Tips

  • Local Processing: Use "base" model for best balance of speed and accuracy
  • Cloud Processing: Generally faster but requires internet connection
  • Model Selection: tiny (fastest) → base (recommended) → small → medium → large (best quality)
  • Permissions: Ensure all required permissions are granted for smooth operation

FAQ

Q: Is OpenWhispr really free? A: Yes! OpenWhispr is open source and free to use. You only pay for OpenAI API usage if you choose cloud processing.

Q: Which processing method should I use? A: Use local processing for privacy and offline use. Use cloud processing for speed and convenience.

Q: Can I use this commercially? A: Yes! The MIT license allows commercial use.

Q: How do I change the hotkey? A: Open the Control Panel (right-click tray icon) and go to Settings. You can set any key as your hotkey.

Q: Is my data secure? A: With local processing, your audio never leaves your device. With cloud processing, audio is sent to OpenAI's servers (see their privacy policy).

Q: What languages are supported? A: OpenWhispr supports 58 languages including English, Spanish, French, German, Chinese, Japanese, and more. Set your preferred language in the .env file or use auto-detect.

Project Status

OpenWhispr is actively maintained and ready for production use. Current version: 1.0.4

  • ✅ Core functionality complete
  • ✅ Cross-platform support
  • ✅ Local and cloud processing
  • ✅ Automatic Python/Whisper installation
  • ✅ Agent naming system
  • ✅ Draggable interface
  • 🚧 Continuous improvements and bug fixes