SpeakMCP

🎤 AI-powered voice assistant with MCP integration - A fork of Whispo that transforms your voice into intelligent actions with advanced speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.

License: AGPL-3.0 • Electron • TypeScript • React

🎬 Preview

Click here to see the v1 launch video on YouTube.

🚀 Quick Start

Download

📥 Download Latest Release

Platform Support: macOS (Apple Silicon and Intel) has full MCP agent functionality. ⚠️ Windows/Linux: MCP tools are not currently supported; see v0.2.2 for dictation-only builds.

Basic Usage

Voice Recording:

  1. Hold Ctrl (macOS/Linux) or Ctrl+/ (Windows) to start recording
  2. Release to stop recording and transcribe
  3. Text is automatically inserted into your active application

MCP Agent Mode (macOS only):

  1. Hold Ctrl+Alt to start recording for agent mode
  2. Release Ctrl+Alt to process with MCP tools
  3. Watch real-time progress as the agent executes tools
  4. Results are automatically inserted or displayed
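The hold-to-record flow above can be pictured as a small state machine driven by which modifier keys are currently held. The sketch below is illustrative only: the names `step`, `State`, `Held`, and `Action` are hypothetical and not taken from the SpeakMCP codebase, and the rule that pressing Alt while already holding Ctrl upgrades a dictation recording to agent mode is an assumption, not documented behavior.

```typescript
// Hypothetical sketch; not SpeakMCP's actual implementation.
type Mode = "dictation" | "agent";
type State = { kind: "idle" } | { kind: "recording"; mode: Mode };
// Snapshot of the modifier keys held after a key event.
type Held = { ctrl: boolean; alt: boolean };
type Action =
  | { type: "start"; mode: Mode }
  | { type: "stop"; mode: Mode }
  | { type: "none" };

// Pure transition function: returns the next state and the action to perform.
function step(state: State, held: Held): [State, Action] {
  if (state.kind === "idle") {
    if (!held.ctrl) return [state, { type: "none" }];
    // Ctrl alone starts dictation; Ctrl+Alt starts agent mode.
    const mode: Mode = held.alt ? "agent" : "dictation";
    return [{ kind: "recording", mode }, { type: "start", mode }];
  }
  // Recording continues only while the keys for the current mode stay held.
  const stillHeld = state.mode === "agent" ? held.ctrl && held.alt : held.ctrl;
  if (!stillHeld) return [{ kind: "idle" }, { type: "stop", mode: state.mode }];
  // Assumption: adding Alt mid-recording switches dictation to agent mode.
  if (state.mode === "dictation" && held.alt) {
    return [{ kind: "recording", mode: "agent" }, { type: "none" }];
  }
  return [state, { type: "none" }];
}
```

On a `stop` action, a caller would transcribe the audio and either insert the text (dictation) or hand it to the MCP agent (agent mode). A real implementation would likely also need debouncing so that a key-repeat event with Ctrl still held does not immediately start a new recording.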

Text Input:

  • Ctrl+T (macOS/Linux) or Ctrl+Shift+T (Windows) for direct typing

✨ Features

| Category | Capabilities |
| --- | --- |
| 🎤 Voice | Hold-to-record, 30+ languages, Fn toggle mode, auto-insert to any app |
| 🔊 TTS | 50+ AI voices via OpenAI, Groq, and Gemini with auto-play |
| 🤖 MCP Agent | Tool execution, OAuth 2.1 auth, real-time progress, conversation context |
| 📊 Observability | Langfuse integration for LLM tracing, token usage, and debugging |
| 🛠️ Platform | macOS/Windows/Linux, rate limit handling, multi-provider AI |
| 🎨 UX | Dark/light themes, resizable panels, kill switch, conversation history |

🛠️ Development

git clone https://github.com/aj47/SpeakMCP.git && cd SpeakMCP
pnpm install && pnpm build-rs && pnpm dev

See DEVELOPMENT.md for full setup, build commands, troubleshooting, and architecture details.

⚙️ Configuration

AI Providers — Configure in settings:

  • OpenAI, Groq, or Google Gemini API keys
  • Model selection per provider
  • Custom base URLs (optional)

MCP Servers — Add tools in mcpServers JSON format:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
    }
  }
}
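As a rough illustration of the shape this JSON takes, the following sketch parses and checks an `mcpServers` document before use. The `McpServerEntry` and `McpConfig` types and the `parseMcpConfig` function are hypothetical names introduced here; they cover only the `command` and `args` fields shown in the example above, and SpeakMCP's own validation may differ or accept more fields.

```typescript
// Hypothetical types covering only the fields shown in the example config.
interface McpServerEntry {
  command: string;
  args: string[];
}
interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Parse a JSON string and verify it has the mcpServers shape.
function parseMcpConfig(json: string): McpConfig {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null || !("mcpServers" in data)) {
    throw new Error('config must be an object with an "mcpServers" key');
  }
  const servers = (data as { mcpServers: unknown }).mcpServers;
  if (typeof servers !== "object" || servers === null || Array.isArray(servers)) {
    throw new Error('"mcpServers" must be an object keyed by server name');
  }
  const result: Record<string, McpServerEntry> = {};
  for (const [name, entry] of Object.entries(servers as Record<string, unknown>)) {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`server "${name}" must be an object`);
    }
    const { command, args } = entry as { command?: unknown; args?: unknown };
    if (typeof command !== "string" || command.length === 0) {
      throw new Error(`server "${name}" needs a non-empty "command"`);
    }
    if (args !== undefined && !(Array.isArray(args) && args.every((a) => typeof a === "string"))) {
      throw new Error(`server "${name}": "args" must be an array of strings`);
    }
    // A missing "args" is treated as an empty argument list.
    result[name] = { command, args: (args as string[] | undefined) ?? [] };
  }
  return { mcpServers: result };
}
```

Validating up front means a typo in the settings surfaces as a readable error message rather than as a failed process launch later.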

Keyboard Shortcuts:

| Shortcut | Action |
| --- | --- |
| Hold Ctrl / Ctrl+/ (Win) | Voice recording |
| Fn | Toggle dictation on/off |
| Hold Ctrl+Alt | MCP agent mode (macOS) |
| Ctrl+T / Ctrl+Shift+T (Win) | Text input |
| Ctrl+Shift+Escape | Kill switch |
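The per-platform differences in the shortcuts above can be summarized as a small lookup. This is a sketch for illustration: `shortcutFor`, `Platform`, and `ActionName` are hypothetical names, the platform strings follow Node's `process.platform` values, and treating Fn and the kill switch as identical on every platform is an assumption, since the table lists no variants for them.

```typescript
// Hypothetical lookup mirroring the shortcut table; not SpeakMCP's own code.
type Platform = "darwin" | "win32" | "linux";
type ActionName =
  | "voiceRecording"
  | "toggleDictation"
  | "agentMode"
  | "textInput"
  | "killSwitch";

// Returns the shortcut label, or null when the action is unavailable.
function shortcutFor(action: ActionName, platform: Platform): string | null {
  const isWindows = platform === "win32";
  switch (action) {
    case "voiceRecording":
      return isWindows ? "Ctrl+/" : "Hold Ctrl";
    case "toggleDictation":
      return "Fn"; // assumed to be the same on all platforms
    case "agentMode":
      // MCP agent mode is listed as macOS only.
      return platform === "darwin" ? "Hold Ctrl+Alt" : null;
    case "textInput":
      return isWindows ? "Ctrl+Shift+T" : "Ctrl+T";
    case "killSwitch":
      return "Ctrl+Shift+Escape"; // assumed to be the same on all platforms
  }
}
```

For example, `shortcutFor("agentMode", "linux")` returns `null`, reflecting that MCP tools are not available on Linux builds.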

🤝 Contributing

We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.

💬 Get help on Discord | 🌐 More info at techfren.net

📄 License

This project is licensed under the AGPL-3.0 License.

🙏 Acknowledgments

Built on Whispo • Powered by OpenAI, Anthropic, Groq, Google • MCP • Electron • React • Rust


Made with ❤️ by the SpeakMCP team
