An open source desktop dictation application that converts speech to text using OpenAI Whisper. Features both local and cloud processing options for maximum flexibility and privacy.
This project is licensed under the MIT License - see the LICENSE file for details. This means you can freely use, modify, and distribute this software for personal or commercial purposes.
- 🎤 Global Hotkey: Customizable hotkey to start/stop dictation from anywhere (default: backtick `)
- 🤖 Multi-Provider AI Processing: Choose between OpenAI, Anthropic Claude, Google Gemini, or local models
- 🎯 Agent Naming: Personalize your AI assistant with a custom name for natural interactions
- 🧠 Latest AI Models (September 2025):
  - OpenAI: GPT-5 Series, GPT-4.1 Series, o-series reasoning models (o3/o4-mini)
  - Anthropic: Claude Opus 4.1, Claude Sonnet 4, Claude 3.5 Sonnet/Haiku
  - Google: Gemini 2.5 Pro/Flash/Flash-Lite with thinking capability, Gemini 2.0 Flash
  - Local: Qwen, LLaMA, Mistral models via llama.cpp
- 🔒 Privacy-First: Local processing keeps your voice data completely private
- 🎨 Modern UI: Built with React 19, TypeScript, and Tailwind CSS v4
- 🚀 Fast: Optimized with Vite and modern tooling
- 📱 Control Panel: Manage settings, view history, and configure API keys
- 🗄️ Transcription History: SQLite database stores all your transcriptions locally
- 🔧 Model Management: Download and manage local Whisper models (tiny, base, small, medium, large, turbo)
- 🧹 Model Cleanup: One-click removal of cached Whisper models with uninstall hooks to keep disks tidy
- 🌐 Cross-Platform: Works on macOS, Windows, and Linux
- ⚡ Automatic Pasting: Transcribed text automatically pastes at your cursor location
- 🖱️ Draggable Interface: Move the dictation panel anywhere on your screen
- 🔄 OpenAI Responses API: Uses the latest Responses API for improved performance
- 🌐 Globe Key Toggle (macOS): Optional Fn/Globe key listener for a hardware-level dictation trigger
- Node.js 18+ and npm (Download from nodejs.org)
- macOS 10.15+, Windows 10+, or Linux
- On macOS, Globe key support requires the Xcode Command Line Tools (`xcode-select --install`) so the bundled Swift helper can run
- Python 3.7+ (optional - the app can install it automatically for local Whisper processing)
1. Clone the repository:

   ```bash
   git clone https://github.com/HeroTools/open-whispr.git
   cd open-whispr
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Optional: Set up API keys (only needed for cloud processing):

   Method A - Environment file:

   ```bash
   cp env.example .env
   # Edit .env and add your API keys:
   # OPENAI_API_KEY=your_openai_key
   # ANTHROPIC_API_KEY=your_anthropic_key
   # GEMINI_API_KEY=your_gemini_key
   ```

   Method B - In-app configuration:

   - Run the app and configure API keys through the Control Panel
   - Keys are automatically saved and persist across app restarts

4. Run the application:

   ```bash
   npm run dev   # Development mode with hot reload
   # OR
   npm start     # Production mode
   ```
If you want to build a standalone app for personal use:

```bash
# Build without code signing (no certificates required)
npm run pack

# The unsigned app will be in:
#   dist/mac-arm64/OpenWhispr.app (macOS)
#   dist/win-unpacked/OpenWhispr.exe (Windows)
#   dist/linux-unpacked/open-whispr (Linux)
```

Note: On macOS, you may see a security warning when first opening the unsigned app. Right-click and select "Open" to bypass this.
For maintainers who need to distribute signed builds:

```bash
# Requires code signing certificates and notarization setup
npm run build:mac    # macOS (requires Apple Developer account)
npm run build:win    # Windows (requires code signing cert)
npm run build:linux  # Linux
```

After installation, complete the first-run setup:
1. Choose Processing Method:
   - Local Processing: Download Whisper models for completely private transcription
   - Cloud Processing: Use OpenAI's API for faster transcription (requires API key)
2. Grant Permissions:
   - Microphone Access: Required for voice recording
   - Accessibility Permissions: Required for automatic text pasting (macOS)
3. Name Your Agent: Give your AI assistant a personal name (e.g., "Assistant", "Jarvis", "Alex")
   - Makes interactions feel more natural and conversational
   - Helps distinguish between giving commands and regular dictation
   - Can be changed anytime in settings
4. Configure Global Hotkey: Default is backtick (`) but can be customized (see the registration sketch below)
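Under the hood, a global hotkey like this is typically registered in the Electron main process with the `globalShortcut` module. A minimal sketch of that pattern, assuming a hypothetical `toggle-dictation` IPC channel (not the app's actual channel name):

```typescript
import { app, globalShortcut, BrowserWindow } from "electron";

app.whenReady().then(() => {
  // Register the default backtick hotkey; each press toggles dictation.
  const registered = globalShortcut.register("`", () => {
    // Hypothetical channel: tell the overlay window to start/stop recording.
    BrowserWindow.getAllWindows()[0]?.webContents.send("toggle-dictation");
  });
  if (!registered) {
    console.warn("Hotkey already taken - choose another in the Control Panel");
  }
});

// Always release shortcuts on exit so other apps can reclaim the key.
app.on("will-quit", () => globalShortcut.unregisterAll());
```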
- Start the app - A small draggable panel appears on your screen
- Press your hotkey (default: backtick `) - Start dictating (panel shows recording animation)
- Press your hotkey again - Stop dictation and begin transcription (panel shows processing animation)
- Text appears - Transcribed text is automatically pasted at your cursor location
- Drag the panel - Click and drag to move the dictation panel anywhere on your screen
- Access: Right-click the tray icon (macOS) or through the system menu
- Configure: Choose between local and cloud processing
- History: View, copy, and delete past transcriptions
- Models: Download and manage local Whisper models
- Storage Cleanup: Remove downloaded Whisper models from cache to reclaim space
- Settings: Configure API keys, customize hotkeys, and manage permissions
- In-App: Use Settings → Speech to Text Processing → Local Model Storage → Remove Downloaded Models to clear `~/.cache/openwhispr/models` (or `%USERPROFILE%\.cache\openwhispr\models` on Windows).
- Windows Uninstall: The NSIS uninstaller automatically deletes the same cache directory.
- Linux Packages: `deb`/`rpm` post-uninstall scripts also remove cached models.
- macOS: If you uninstall manually, remove `~/Library/Caches` or `~/.cache/openwhispr/models` if desired.
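For scripted cleanup, the same directory can be removed programmatically; a minimal sketch using the cache path listed above:

```typescript
import { rmSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Delete the cached Whisper models (the same directory the uninstallers clear).
const modelCache = join(homedir(), ".cache", "openwhispr", "models");
rmSync(modelCache, { recursive: true, force: true });
console.log(`Removed ${modelCache}`);
```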
Once you've named your agent during setup, you can interact with it using multiple AI providers:
🎯 Agent Commands (for AI assistance):
- "Hey [AgentName], make this more professional"
- "Hey [AgentName], format this as a list"
- "Hey [AgentName], write a thank you email"
- "Hey [AgentName], convert this to bullet points"
🤖 AI Provider Options:
- OpenAI:
  - GPT-5 Series (Nano/Mini/Full) - Latest generation with deep reasoning
  - GPT-4.1 Series - Enhanced coding with 1M token context
  - o3/o4 Series - Advanced reasoning models with longer thinking
- Anthropic: Claude Opus 4.1, Sonnet 4 - Frontier intelligence models
- Google: Gemini 2.5 Pro/Flash - Advanced multi-modal capabilities
- Local: Community models for complete privacy
📝 Regular Dictation (for normal text):
- "This is just normal text I want transcribed"
- "Meeting notes: John mentioned the quarterly report"
- "Dear Sarah, thank you for your help"
The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output.
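A minimal sketch of how that detection could work (the real logic lives in `src/utils/agentName.ts` and `src/services/ReasoningService.ts`; the helper below is a simplified assumption, not the app's actual code):

```typescript
// Hypothetical helper: decide whether dictated text addresses the agent
// and strip the agent-name prefix before routing it to an AI provider.
function parseDictation(text: string, agentName: string) {
  // Escape the name so characters like "." can't break the pattern.
  const escaped = agentName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match openings like "Hey Jarvis," or "Jarvis:" (case-insensitive).
  const pattern = new RegExp(`^\\s*(?:hey\\s+)?${escaped}\\s*[,:]?\\s+`, "i");
  return {
    isCommand: pattern.test(text),      // true => treat as an agent command
    cleaned: text.replace(pattern, ""), // final output with the name removed
  };
}

// "Hey Jarvis, make this more professional"
//   -> { isCommand: true, cleaned: "make this more professional" }
console.log(parseDictation("Hey Jarvis, make this more professional", "Jarvis"));
```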
- Local Processing:
  - Install Whisper automatically through the Control Panel
  - Download models: tiny (fastest), base (recommended), small, medium, large (best quality)
  - Complete privacy - audio never leaves your device
- Cloud Processing:
  - Requires OpenAI API key
  - Faster processing
  - Uses OpenAI's Whisper API
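For illustration, cloud transcription amounts to one request to OpenAI's audio transcription endpoint; a minimal sketch (not necessarily how the app structures this internally):

```typescript
import { readFileSync } from "node:fs";

// Minimal sketch: send a recorded audio file to OpenAI's Whisper API.
// Assumes OPENAI_API_KEY is set and audioPath points to a WAV recording.
async function transcribeViaCloud(audioPath: string): Promise<string> {
  const form = new FormData();
  form.append("model", "whisper-1");
  form.append("file", new Blob([readFileSync(audioPath)]), "audio.wav");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
  const data = (await res.json()) as { text: string };
  return data.text;
}
```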
```
open-whispr/
├── main.js                 # Electron main process & IPC handlers
├── preload.js              # Electron preload script & API bridge
├── whisper_bridge.py       # Python script for local Whisper processing
├── setup.js                # First-time setup script
├── package.json            # Dependencies and scripts
├── env.example             # Environment variables template
├── CHANGELOG.md            # Project changelog
├── src/
│   ├── App.jsx             # Main dictation interface
│   ├── main.jsx            # React entry point
│   ├── index.html          # Vite HTML template
│   ├── index.css           # Tailwind CSS v4 configuration
│   ├── vite.config.js      # Vite configuration
│   ├── components/
│   │   ├── ControlPanel.tsx    # Settings and history UI
│   │   ├── OnboardingFlow.tsx  # First-time setup wizard
│   │   ├── SettingsPage.tsx    # Settings interface
│   │   ├── ui/                 # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   ├── input.tsx
│   │   │   ├── LoadingDots.tsx
│   │   │   ├── DotFlashing.tsx
│   │   │   ├── Toast.tsx
│   │   │   ├── toggle.tsx
│   │   │   └── tooltip.tsx
│   │   └── lib/
│   │       └── utils.ts        # Utility functions
│   ├── services/
│   │   └── ReasoningService.ts # Multi-provider AI processing (OpenAI/Anthropic/Gemini)
│   ├── utils/
│   │   └── agentName.ts        # Agent name management utility
│   └── components.json         # shadcn/ui configuration
└── assets/                     # App icons and resources
```
- Frontend: React 19, TypeScript, Tailwind CSS v4
- Build Tool: Vite with optimized Tailwind plugin
- Desktop: Electron 36 with context isolation
- UI Components: shadcn/ui with Radix primitives
- Database: better-sqlite3 for local transcription storage
- Speech-to-Text: OpenAI Whisper (local models + API)
- Local Processing: Python with OpenAI Whisper package
- Icons: Lucide React for consistent iconography
- `npm run dev` - Start development with hot reload
- `npm run start` - Start production build
- `npm run setup` - First-time setup (creates .env file)
- `npm run build:renderer` - Build the React app only
- `npm run build` - Full build with signing (requires certificates)
- `npm run build:mac` - macOS build with signing
- `npm run build:win` - Windows build with signing
- `npm run build:linux` - Linux build
- `npm run pack` - Build without signing (for personal use)
- `npm run dist` - Build and package with signing
- `npm run lint` - Run ESLint
- `npm run preview` - Preview production build
The app consists of two main windows:
- Main Window: Minimal overlay for dictation controls
- Control Panel: Full settings and history interface
Both use the same React codebase but render different components based on URL parameters.
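A minimal sketch of that pattern (the query parameter name and the `root` element id below are assumptions for illustration):

```typescript
import React from "react";
import { createRoot } from "react-dom/client";
import App from "./App";
import ControlPanel from "./components/ControlPanel";

// Both Electron windows load the same bundle; a query parameter set when
// each BrowserWindow loads the page decides which UI tree to mount.
const params = new URLSearchParams(window.location.search);
const isControlPanel = params.get("window") === "control-panel"; // assumed name

createRoot(document.getElementById("root")!).render(
  isControlPanel ? React.createElement(ControlPanel) : React.createElement(App)
);
```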
- main.js: Electron main process, IPC handlers, database operations
- preload.js: Secure bridge between main and renderer processes
- App.jsx: Main dictation interface with recording controls
- ControlPanel.tsx: Settings, history, and model management
- whisper_bridge.py: Python bridge for local Whisper processing
- better-sqlite3: Local database for transcription history
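A minimal sketch of the better-sqlite3 pattern for local history (the table name and columns are assumptions, not the app's actual schema):

```typescript
import Database from "better-sqlite3";

// Open (or create) the local history database file.
const db = new Database("transcriptions.db");

// Assumed schema for illustration; the app's real columns may differ.
db.exec(`CREATE TABLE IF NOT EXISTS transcriptions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  text TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`);

// Insert a finished transcription, then read back recent history.
db.prepare("INSERT INTO transcriptions (text) VALUES (?)").run("Meeting notes: ...");
const recent = db
  .prepare("SELECT id, text, created_at FROM transcriptions ORDER BY id DESC LIMIT 10")
  .all();
console.log(recent);
```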
This project uses the latest Tailwind CSS v4 with:
- CSS-first configuration using the `@theme` directive
- Vite plugin for optimal performance
- Custom design tokens for consistent theming
- Dark mode support with `@variant`
The build process creates a single executable for your platform:

```bash
# Development build
npm run pack

# Production builds
npm run dist         # Current platform
npm run build:mac    # macOS DMG + ZIP
npm run build:win    # Windows NSIS + Portable
npm run build:linux  # AppImage + DEB
```

Create a `.env` file in the root directory (or use `npm run setup`):
```bash
# OpenAI API Configuration (optional - only needed for cloud processing)
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Customize the Whisper model
WHISPER_MODEL=whisper-1

# Optional: Set language for better transcription accuracy
LANGUAGE=

# Optional: Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Optional: Google Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Debug mode
DEBUG=false
```

For local processing, OpenWhispr offers automated setup:
1. Automatic Python Installation (if needed):
   - The app will detect if Python is missing
   - Offers to install Python 3.11 automatically
   - macOS: Uses Homebrew if available, otherwise the official installer
   - Windows: Downloads and installs the official Python
   - Linux: Uses the system package manager (apt, yum, or pacman)
2. Automatic Whisper Setup:
   - Installs the OpenAI Whisper package via pip
   - Downloads your chosen model on first use
   - Handles all transcription locally (see the sketch after the requirements below)
Requirements:
- Sufficient disk space for models (39MB - 1.5GB depending on model)
- Admin/sudo access may be required for Python installation
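Conceptually, local transcription means the Electron main process invokes `whisper_bridge.py` and reads its output; a sketch of that handoff, with hypothetical command-line flags (the bridge script's real arguments may differ):

```typescript
import { execFile } from "node:child_process";

// Illustrative only: flag names and output format are assumptions.
execFile(
  "python3",
  ["whisper_bridge.py", "--model", "base", "recording.wav"],
  (error, stdout, stderr) => {
    if (error) {
      console.error("Local transcription failed:", stderr);
      return;
    }
    console.log("Transcription:", stdout.trim());
  }
);
```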
- Hotkey: Change in the Control Panel (default: backtick `) - fully customizable
- Panel Position: Drag the dictation panel to any location on your screen
- Processing Method: Choose local or cloud in Control Panel
- Whisper Model: Select quality vs speed in Control Panel
- UI Theme: Edit CSS variables in `src/index.css`
- Window Size: Adjust dimensions in `main.js` (see the sketch below)
- Database: Transcriptions stored in user data directory
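For orientation, the overlay window's dimensions come from where `main.js` creates its `BrowserWindow`; a sketch with assumed values (the real sizes and options in the project may differ):

```typescript
import { app, BrowserWindow } from "electron";
import path from "node:path";

app.whenReady().then(() => {
  const panel = new BrowserWindow({
    width: 320,        // assumed default width
    height: 80,        // assumed default height
    frame: false,      // minimal, draggable overlay
    alwaysOnTop: true,
    webPreferences: {
      contextIsolation: true, // matches the project's Electron setup
      preload: path.join(__dirname, "preload.js"),
    },
  });
  panel.loadFile("src/index.html"); // entry file from the project structure above
});
```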
We welcome contributions! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

- Run `npm run lint` before committing
- Follow the existing code style
- Update documentation as needed
- Test on your target platform before submitting
OpenWhispr is designed with privacy and security in mind:
- Local Processing Option: Keep your voice data completely private
- No Analytics: We don't collect any usage data or telemetry
- Open Source: All code is available for review
- Secure Storage: API keys are stored securely in your system's keychain/credential manager
- Minimal Permissions: Only requests necessary permissions (microphone, accessibility)
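For example, Electron's `safeStorage` module provides this kind of OS-backed key protection (whether OpenWhispr uses exactly this API is an assumption):

```typescript
import { safeStorage } from "electron";

// Encrypt an API key with the OS-backed cipher before persisting it
// (Keychain on macOS, DPAPI on Windows, libsecret on Linux).
function protectKey(apiKey: string): Buffer {
  if (!safeStorage.isEncryptionAvailable()) {
    throw new Error("OS-level encryption is unavailable");
  }
  return safeStorage.encryptString(apiKey);
}

// Decrypt only when needed, keeping the plaintext in memory briefly.
function revealKey(encrypted: Buffer): string {
  return safeStorage.decryptString(encrypted);
}
```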
- Microphone permissions: Grant permissions in System Preferences/Settings
- Accessibility permissions (macOS): Required for automatic text pasting
  - Go to System Settings → Privacy & Security → Accessibility
  - Add OpenWhispr and enable the checkbox
  - Use "Fix Permission Issues" in Control Panel if needed
- API key errors (cloud processing only): Ensure your OpenAI API key is valid and has credits
  - Set the key through the Control Panel or .env file
  - Check logs for "OpenAI API Key present: Yes/No"
- Local Whisper installation:
  - Ensure Python 3.7+ is installed
  - Use the Control Panel to install Whisper automatically
  - Check available disk space for models
- Global hotkey conflicts: Change the hotkey in the Control Panel - any key can be used
- Text not pasting: Check accessibility permissions and try manual paste with Cmd+V (a sketch of the pasting mechanism follows this list)
- Panel position: If the panel appears off-screen, restart the app to reset position
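Automatic pasting on macOS commonly works by writing to the clipboard and synthesizing Cmd+V, which is why Accessibility permission matters. A sketch of that approach (an assumption about the mechanism, not confirmed from the code):

```typescript
import { clipboard } from "electron";
import { execFile } from "node:child_process";

// Put the transcription on the clipboard, then ask System Events to
// press Cmd+V in the frontmost app. Requires Accessibility permission.
export function pasteText(text: string): void {
  clipboard.writeText(text);
  execFile("osascript", [
    "-e",
    'tell application "System Events" to keystroke "v" using command down',
  ]);
}
```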
- Check the Issues page
- Review the console logs for debugging information
- For local processing: Ensure Python and pip are working
- For cloud processing: Verify your OpenAI API key and billing status
- Check the Control Panel for system status and diagnostics
- Local Processing: Use "base" model for best balance of speed and accuracy
- Cloud Processing: Generally faster but requires internet connection
- Model Selection: tiny (fastest) → base (recommended) → small → medium → large (best quality)
- Permissions: Ensure all required permissions are granted for smooth operation
Q: Is OpenWhispr really free? A: Yes! OpenWhispr is open source and free to use. You only pay for OpenAI API usage if you choose cloud processing.
Q: Which processing method should I use? A: Use local processing for privacy and offline use. Use cloud processing for speed and convenience.
Q: Can I use this commercially? A: Yes! The MIT license allows commercial use.
Q: How do I change the hotkey? A: Open the Control Panel (right-click tray icon) and go to Settings. You can set any key as your hotkey.
Q: Is my data secure? A: With local processing, your audio never leaves your device. With cloud processing, audio is sent to OpenAI's servers (see their privacy policy).
Q: What languages are supported? A: OpenWhispr supports 58 languages including English, Spanish, French, German, Chinese, Japanese, and more. Set your preferred language in the .env file or use auto-detect.
OpenWhispr is actively maintained and ready for production use. Current version: 1.0.4
- ✅ Core functionality complete
- ✅ Cross-platform support
- ✅ Local and cloud processing
- ✅ Automatic Python/Whisper installation
- ✅ Agent naming system
- ✅ Draggable interface
- 🚧 Continuous improvements and bug fixes