🎤 AI-powered voice assistant with MCP integration - A fork of Whispo that transforms your voice into intelligent actions with advanced speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.
Click here to see the v1 launch video on YouTube
Platform Support: macOS (Apple Silicon & Intel) with full MCP agent functionality.
⚠️ Windows/Linux: MCP tools not currently supported — see v0.2.2 for dictation-only builds.
Voice Recording:
- Hold `Ctrl` (macOS/Linux) or `Ctrl+/` (Windows) to start recording
- Release to stop recording and transcribe
- Text is automatically inserted into your active application

MCP Agent Mode (macOS only):
- Hold `Ctrl+Alt` to start recording for agent mode
- Release `Ctrl+Alt` to process with MCP tools
- Watch real-time progress as the agent executes tools
- Results are automatically inserted or displayed

Text Input:
- `Ctrl+T` (macOS/Linux) or `Ctrl+Shift+T` (Windows) for direct typing
| Category | Capabilities |
|---|---|
| 🎤 Voice | Hold-to-record, 30+ languages, Fn toggle mode, auto-insert to any app |
| 🔊 TTS | 50+ AI voices via OpenAI, Groq, and Gemini with auto-play |
| 🤖 MCP Agent | Tool execution, OAuth 2.1 auth, real-time progress, conversation context |
| 📊 Observability | Langfuse integration for LLM tracing, token usage, and debugging |
| 🛠️ Platform | macOS/Windows/Linux, rate limit handling, multi-provider AI |
| 🎨 UX | Dark/light themes, resizable panels, kill switch, conversation history |
```bash
git clone https://github.com/aj47/SpeakMCP.git && cd SpeakMCP
pnpm install && pnpm build-rs && pnpm dev
```

See DEVELOPMENT.md for full setup, build commands, troubleshooting, and architecture details.
AI Providers — Configure in settings:
- OpenAI, Groq, or Google Gemini API keys
- Model selection per provider
- Custom base URLs (optional)
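
As a rough illustration of how per-provider keys, models, and optional base URLs could fit together, here is a minimal TypeScript sketch. The `ProviderSettings` shape, the `resolveProvider` helper, and the default URLs are assumptions made for this example, not SpeakMCP's actual settings schema.

```typescript
// Hypothetical settings shape; SpeakMCP's real schema may differ.
type ProviderId = "openai" | "groq" | "gemini";

interface ProviderSettings {
  apiKey: string;
  model: string;
  baseUrl?: string; // optional custom base URL
}

// Assumed default endpoints, used only when no custom base URL is set.
const DEFAULT_BASE_URLS: Record<ProviderId, string> = {
  openai: "https://api.openai.com/v1",
  groq: "https://api.groq.com/openai/v1",
  gemini: "https://generativelanguage.googleapis.com/v1beta",
};

// Returns the settings for one provider with the base URL filled in.
function resolveProvider(
  id: ProviderId,
  settings: Partial<Record<ProviderId, ProviderSettings>>
): Required<ProviderSettings> {
  const s = settings[id];
  if (!s || s.apiKey.trim() === "") {
    throw new Error(`No API key configured for ${id}`);
  }
  // Fall back to the default when the override is missing or blank,
  // and strip trailing slashes so paths can be appended safely.
  const chosen = s.baseUrl?.trim() || DEFAULT_BASE_URLS[id];
  const baseUrl = chosen.replace(/\/+$/, "");
  return { apiKey: s.apiKey, model: s.model, baseUrl };
}
```

Keeping the base URL optional means a local or proxy endpoint can be swapped in for one provider without touching the others.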
MCP Servers — Add tools in mcpServers JSON format:
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
    }
  }
}
```

Keyboard Shortcuts:
| Shortcut | Action |
|---|---|
| Hold `Ctrl` / `Ctrl+/` (Win) | Voice recording |
| `Fn` | Toggle dictation on/off |
| Hold `Ctrl+Alt` | MCP agent mode (macOS) |
| `Ctrl+T` / `Ctrl+Shift+T` (Win) | Text input |
| `Ctrl+Shift+Escape` | Kill switch |
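
Returning to the `mcpServers` JSON format shown earlier: a config pasted by hand is easy to get wrong, so it can help to check its shape before use. The sketch below is one possible validator. The `parseMcpConfig` function and its types are hypothetical names for this example, and it only covers the `command` and `args` fields that appear in the sample config.

```typescript
// Hypothetical types covering only the fields shown in the sample config.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Parses a JSON string and checks that it matches the mcpServers shape.
function parseMcpConfig(json: string): McpConfig {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Config must be a JSON object");
  }
  const servers = (data as { mcpServers?: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null || Array.isArray(servers)) {
    throw new Error('Config must contain an "mcpServers" object');
  }

  const result: Record<string, McpServerEntry> = {};
  for (const [name, entry] of Object.entries(servers as Record<string, unknown>)) {
    const e = entry as { command?: unknown; args?: unknown } | null;
    const command = e?.command;
    if (typeof command !== "string" || command === "") {
      throw new Error(`Server "${name}" needs a non-empty "command" string`);
    }
    // Treat a missing "args" as an empty list (an assumption of this sketch).
    const args = e?.args ?? [];
    if (!Array.isArray(args) || !args.every((a) => typeof a === "string")) {
      throw new Error(`Server "${name}" has invalid "args"; expected an array of strings`);
    }
    result[name] = { command, args: args as string[] };
  }
  return { mcpServers: result };
}
```

Failing early with the server name in the message makes it easier to spot which entry is malformed before any process is launched.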
We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.
💬 Get help on Discord | 🌐 More info at techfren.net
This project is licensed under the AGPL-3.0 License.
Built on Whispo • Powered by OpenAI, Anthropic, Groq, Google • MCP • Electron • React • Rust
Made with ❤️ by the SpeakMCP team