Talk to Claude Code. Have it talk back. Natural voice interaction for your AI coding assistant.
VoiceMode brings bidirectional voice capabilities to Claude Code - speak your requests, hear the responses. No API keys required.
- Hands-free coding - Dictate code changes while reviewing on another screen
- Accessibility - Voice interface for developers who prefer or need it
- Natural interaction - More conversational than typing
- Zero cost - Uses Microsoft Edge TTS (free, no API keys)
- Privacy - Whisper STT runs locally by default, so your recorded audio is not sent to external services
git clone https://github.com/marc-shade/voicemode.git
cd voicemode
pip install -r requirements.txt
# Linux: Install audio tools
sudo apt install mpg123 ffmpeg alsa-utils # Debian/Ubuntu
sudo dnf install mpg123 ffmpeg alsa-utils # Fedora

Add to ~/.claude.json:
{
"mcpServers": {
"voice-mode": {
"command": "python",
"args": ["/path/to/voicemode/server.py"]
}
}
}

You: "Hey Claude, speak this message back to me"
Claude: *uses speak tool*
"Hey! I can talk now. This is pretty cool, right?"
You: "Listen to me for 5 seconds"
Claude: *uses listen tool, records your voice*
Transcribed: "Add a function called get user preferences"
| Feature | Status | Notes |
|---|---|---|
| Text-to-Speech | Ready | Edge TTS, 50+ voices |
| Speech-to-Text | Ready | Whisper (local), GPU optional |
| Continuous listening | Ready | Caps Lock toggle |
| Wayland support | Ready | MCP tool toggle |
| Multiple languages | Ready | EN, ES, FR, DE, etc. |
speak("Hello from Claude!", voice="en-US-AriaNeural", rate="+10%")

result = listen(duration=5, model="base")
# Returns: {"text": "what you said", "confidence": 0.95}

toggle_stt(enable=True) # Start listening
toggle_stt(enable=False) # Stop listening
toggle_stt() # Toggle current state

history = get_transcriptions(limit=10)
# Returns timestamped list of everything you've said

| Voice | Accent | Character |
|---|---|---|
| en-IE-EmilyNeural | Irish | Warm, friendly (default) |
| en-US-AriaNeural | American | Natural, professional |
| en-GB-SoniaNeural | British | Clear, articulate |
| en-AU-NatashaNeural | Australian | Casual, approachable |
- Spanish: es-ES-AlvaroNeural
- French: fr-FR-DeniseNeural
- German: de-DE-KatjaNeural
- And 50+ more via Edge TTS
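
Voice names follow a `language-REGION-Name` pattern, so filtering by language can be sketched with plain string handling. The helper names below (`parse_voice`, `voices_for_language`) are hypothetical and only illustrate the idea. They are not VoiceMode's actual `list_voices` implementation, and the voice list is just the names shown above.

```python
from typing import Dict, List

# Voice short names copied from the tables and list above.
KNOWN_VOICES: List[str] = [
    "en-IE-EmilyNeural",
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",
    "en-AU-NatashaNeural",
    "es-ES-AlvaroNeural",
    "fr-FR-DeniseNeural",
    "de-DE-KatjaNeural",
]


def parse_voice(short_name: str) -> Dict[str, str]:
    """Split 'en-US-AriaNeural' into language, region, and voice name."""
    language, region, name = short_name.split("-", 2)
    return {"language": language, "region": region, "name": name}


def voices_for_language(language: str, voices: List[str] = KNOWN_VOICES) -> List[str]:
    """Return the voices whose language code matches, e.g. 'en' or 'es'."""
    wanted = language.lower()
    return [v for v in voices if parse_voice(v)["language"].lower() == wanted]


if __name__ == "__main__":
    print(voices_for_language("en"))
```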
List all available voices:
list_voices(language="en") # English voices
list_voices(language="es") # Spanish voices

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 75MB | Fastest | Good | Quick commands |
| base | 140MB | Fast | Better | General use (default) |
| small | 460MB | Medium | High | Detailed dictation |
| medium | 1.5GB | Slow | Very High | Noisy environments |
| large | 2.9GB | Slowest | Best | Maximum accuracy |
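
To pick a model for a given disk budget, something like the following could work. It is a minimal sketch: `largest_model_within` is a hypothetical helper, the sizes are the approximate figures from the table, and `tiny` is treated as the fallback when nothing larger fits.

```python
from typing import List, Tuple

# Approximate sizes in MB for the larger models, as listed in the table above,
# ordered from smallest to largest. "tiny" is the fallback, so its size is not
# needed here.
MODEL_SIZES_MB: List[Tuple[str, int]] = [
    ("base", 140),
    ("small", 460),
    ("medium", 1500),
    ("large", 2900),
]


def largest_model_within(budget_mb: float) -> str:
    """Return the largest model whose size fits the budget, or 'tiny' if none does."""
    choice = "tiny"
    for name, size_mb in MODEL_SIZES_MB:
        if size_mb <= budget_mb:
            choice = name  # later entries are larger, so keep the last one that fits
    return choice


if __name__ == "__main__":
    print(largest_model_within(500))
```

The returned name can then be passed as the `model` argument of `listen`.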
Toggle with Caps Lock or the toggle_stt tool:
[Press Caps Lock]
*beep* "Voice mode ON"
"Add a function to validate email addresses"
Transcribed: "Add a function to validate email addresses"
"Make it return a boolean"
Transcribed: "Make it return a boolean"
[Press Caps Lock]
*beep* "Voice mode OFF"
Audio feedback:
- ON: Higher pitch beep (800Hz)
- OFF: Lower pitch beep (400Hz)
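
For reference, beeps like these can be generated with the Python standard library alone. This is a sketch, not necessarily how VoiceMode produces its tones: `make_beep`, the 0.15 s duration, and the volume are assumptions. Only the 800Hz and 400Hz pitches come from the list above.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed)


def make_beep(frequency_hz: float, duration_s: float = 0.15, volume: float = 0.5) -> bytes:
    """Return a mono 16-bit WAV file, as bytes, containing a sine-wave beep."""
    n_samples = round(SAMPLE_RATE * duration_s)
    frames = bytearray()
    for i in range(n_samples):
        sample = volume * math.sin(2 * math.pi * frequency_hz * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(sample * 32767))  # little-endian signed 16-bit

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


ON_BEEP = make_beep(800)   # higher pitch: voice mode ON
OFF_BEEP = make_beep(400)  # lower pitch: voice mode OFF
```

The resulting bytes can be written to a `.wav` file and played with `aplay`.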
For faster STT, point to a remote Whisper API:
export GPU_STT_ENDPOINT="http://your-gpu-server:8080/transcribe"
export GPU_STT_ENABLED=true

This is optional - local CPU transcription works fine for most use cases.
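
One plausible way a server could interpret these two variables is sketched below. The function name and the set of accepted truthy values are assumptions. VoiceMode's real parsing may differ.

```python
import os
from typing import Mapping, Optional, Tuple


def choose_stt_backend(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return ("remote", url) when the GPU endpoint is enabled and set, else ("local", None)."""
    # Assumed truthy spellings; the real server may accept a different set.
    enabled = env.get("GPU_STT_ENABLED", "").strip().lower() in {"1", "true", "yes"}
    endpoint = env.get("GPU_STT_ENDPOINT", "").strip()
    if enabled and endpoint:
        return ("remote", endpoint)
    return ("local", None)  # fall back to local CPU transcription


if __name__ == "__main__":
    print(choose_stt_backend())
```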
┌─────────────────┐     ┌─────────────────┐
│   Claude Code   │◄───►│  VoiceMode MCP  │
└─────────────────┘     └────────┬────────┘
                                 │
                     ┌───────────┴───────────┐
                     ▼                       ▼
             ┌───────────────┐       ┌───────────────┐
             │   Edge TTS    │       │    Whisper    │
             │    (speak)    │       │   (listen)    │
             └───────┬───────┘       └───────┬───────┘
                     │                       │
                     ▼                       ▼
             ┌───────────────┐       ┌───────────────┐
             │    mpg123     │       │    arecord    │
             │  (playback)   │       │  (recording)  │
             └───────────────┘       └───────────────┘
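
The diagram boils down to a few external commands. The sketch below only builds the argument lists and does not run anything. The helper names and `/tmp` paths are assumptions, while the `edge-tts`, `mpg123`, and `arecord` flags are standard options of those tools.

```python
import shlex
from typing import List


def tts_commands(text: str, out_path: str = "/tmp/voicemode.mp3",
                 voice: str = "en-IE-EmilyNeural") -> List[List[str]]:
    """Commands to synthesize text to an MP3 with edge-tts, then play it with mpg123."""
    return [
        ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path],
        ["mpg123", out_path],
    ]


def record_command(seconds: int, out_path: str = "/tmp/voicemode.wav") -> List[str]:
    """Command to record from the default microphone with arecord in CD format."""
    return ["arecord", "-d", str(seconds), "-f", "cd", out_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without audio hardware.
    for cmd in tts_commands("Hello from Claude!"):
        print(shlex.join(cmd))
    print(shlex.join(record_command(5)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute the step. The recorded WAV would then be handed to Whisper for transcription.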
# Test TTS directly
edge-tts --text "Hello" --write-media /tmp/test.mp3 && mpg123 /tmp/test.mp3

# Check Whisper
pip install pywhispercpp

# Test recording
arecord -d 3 -f cd /tmp/test.wav && aplay /tmp/test.wav

# Add to input group
sudo usermod -aG input $USER
# Then logout/login

Core (TTS - works immediately):
- Python 3.8+
- edge-tts
- mpg123 or ffplay
Optional (STT):
- pywhispercpp
- alsa-utils (Linux)
- Microphone
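
A quick way to see which of these requirements are present is sketched below. `check_requirements` is a hypothetical helper, not part of VoiceMode. It assumes the Python packages import as `edge_tts` and `pywhispercpp`.

```python
import importlib.util
import shutil
from typing import Callable, Dict, Optional


def _module_available(name: str) -> bool:
    """True if a top-level Python module can be imported."""
    return importlib.util.find_spec(name) is not None


def check_requirements(
    which: Callable[[str], Optional[str]] = shutil.which,
    has_module: Callable[[str], bool] = _module_available,
) -> Dict[str, bool]:
    """Report which core (TTS) and optional (STT) pieces are installed."""
    return {
        "tts_engine": has_module("edge_tts"),
        "tts_player": bool(which("mpg123") or which("ffplay")),
        "stt_engine": has_module("pywhispercpp"),
        "stt_recorder": bool(which("arecord")),  # provided by alsa-utils on Linux
    }


if __name__ == "__main__":
    for item, present in check_requirements().items():
        print(f"{item}: {'ok' if present else 'missing'}")
```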
- enhanced-memory-mcp - Persistent memory for Claude
- claude-flow - Multi-agent orchestration
- agent-runtime-mcp - Task queues and goals
MIT License - Use freely in personal and commercial projects.
Voice-first AI coding. If this helps your workflow, consider giving it a star!