
VoiceMode MCP brings natural conversations to Claude Code


marc-shade/voicemode


VoiceMode - Voice Interface for Claude Code


Talk to Claude Code. Have it talk back. Natural voice interaction for your AI coding assistant.

VoiceMode brings bidirectional voice capabilities to Claude Code - speak your requests, hear the responses. No API keys required.

Why VoiceMode?

  • Hands-free coding - Dictate code changes while reviewing on another screen
  • Accessibility - Voice interface for developers who prefer or need it
  • Natural interaction - More conversational than typing
  • Zero cost - Uses Microsoft Edge TTS (free, no API keys)
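  • Privacy - Speech-to-text runs locally with Whisper by default, so recorded audio stays on your machine. Text passed to speak is sent to Microsoft's Edge TTS service for synthesis.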

Quick Start

1. Install

git clone https://github.com/marc-shade/voicemode.git
cd voicemode
pip install -r requirements.txt

# Linux: Install audio tools
sudo apt install mpg123 ffmpeg alsa-utils  # Debian/Ubuntu
sudo dnf install mpg123 ffmpeg alsa-utils  # Fedora

2. Configure Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "voice-mode": {
      "command": "python",
      "args": ["/path/to/voicemode/server.py"]
    }
  }
}
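
If ~/.claude.json already holds other settings, merging the entry by hand can be error-prone. The following is a minimal sketch of doing the merge in Python; the helper names are hypothetical, and it assumes the file is plain JSON with an optional top-level "mcpServers" object, as in the snippet above.

```python
import json
from pathlib import Path


def add_voice_mode_server(config: dict, server_path: str) -> dict:
    """Return a copy of config with a "voice-mode" entry under "mcpServers".

    Existing servers and unrelated top-level keys are preserved.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["voice-mode"] = {"command": "python", "args": [server_path]}
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: Path, server_path: str) -> None:
    """Read the JSON config if it exists, merge the entry, and write it back."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = add_voice_mode_server(existing, server_path)
    config_path.write_text(json.dumps(merged, indent=2) + "\n")
```

Calling update_config_file(Path.home() / ".claude.json", "/path/to/voicemode/server.py") would apply it; back up the file first.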

3. Use It

You: "Hey Claude, speak this message back to me"

Claude: *uses speak tool*
🔊 "Hey! I can talk now. This is pretty cool, right?"

You: "Listen to me for 5 seconds"

Claude: *uses listen tool, records your voice*
📝 Transcribed: "Add a function called get user preferences"

Features

Feature               Status    Notes
Text-to-Speech        ✅ Ready  Edge TTS, 50+ voices
Speech-to-Text        ✅ Ready  Whisper (local), GPU optional
Continuous listening  ✅ Ready  Caps Lock toggle
Wayland support       ✅ Ready  MCP tool toggle
Multiple languages    ✅ Ready  EN, ES, FR, DE, etc.

MCP Tools

speak - Text to Speech

speak("Hello from Claude!", voice="en-US-AriaNeural", rate="+10%")
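
How the tool maps these arguments onto Edge TTS is not shown here. One plausible route, sketched below, is to shell out to the edge-tts command-line tool (the same one used in Troubleshooting) and then play the file with mpg123. The function names and the default output path are hypothetical; the default voice is the one listed under Voice Options.

```python
import subprocess


def build_tts_command(text, out_path, voice="en-IE-EmilyNeural", rate="+0%"):
    """Build the argv for the edge-tts CLI.

    The rate is passed as --rate=VALUE so that negative values such as
    "-20%" are not mistaken for a separate option.
    """
    return [
        "edge-tts",
        "--text", text,
        "--voice", voice,
        f"--rate={rate}",
        "--write-media", out_path,
    ]


def speak_via_cli(text, out_path="/tmp/voicemode_tts.mp3", **options):
    """Synthesize text to an MP3 file, then play it (assumes mpg123 is installed)."""
    subprocess.run(build_tts_command(text, out_path, **options), check=True)
    subprocess.run(["mpg123", "-q", out_path], check=True)
```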

listen - Speech to Text (one-shot)

result = listen(duration=5, model="base")
# Returns: {"text": "what you said", "confidence": 0.95}

toggle_stt - Toggle continuous listening

toggle_stt(enable=True)  # Start listening
toggle_stt(enable=False) # Stop listening
toggle_stt()             # Toggle current state
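
The three call forms suggest an optional boolean argument: an explicit value sets the state, and omitting it flips the current one. A minimal sketch of that behaviour follows; the SttToggle class is hypothetical and stands in for whatever state the server actually keeps.

```python
from typing import Optional


class SttToggle:
    """Tracks whether continuous listening is on."""

    def __init__(self) -> None:
        self.enabled = False

    def toggle(self, enable: Optional[bool] = None) -> bool:
        """Set the state explicitly, or flip it when enable is None."""
        self.enabled = (not self.enabled) if enable is None else bool(enable)
        return self.enabled
```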

get_transcriptions - Get recent speech

history = get_transcriptions(limit=10)
# Returns timestamped list of everything you've said
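
The return format is not specified beyond "timestamped list". As an illustration only, a bounded in-memory log like the one below would support a limit parameter; the class name, the entry keys, and the 100-entry cap are assumptions.

```python
from collections import deque
from datetime import datetime, timezone


class TranscriptionLog:
    """Keeps the most recent transcriptions, oldest first."""

    def __init__(self, max_entries: int = 100) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._entries = deque(maxlen=max_entries)

    def add(self, text: str, when: datetime = None) -> None:
        """Store a transcription with a UTC ISO 8601 timestamp."""
        stamp = (when or datetime.now(timezone.utc)).isoformat()
        self._entries.append({"timestamp": stamp, "text": text})

    def recent(self, limit: int = 10) -> list:
        """Return up to limit of the newest entries, oldest first."""
        if limit <= 0:
            return []
        return list(self._entries)[-limit:]
```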

Voice Options

English Voices (Samples)

Voice                Accent      Character
en-IE-EmilyNeural    Irish       Warm, friendly (default)
en-US-AriaNeural     American    Natural, professional
en-GB-SoniaNeural    British     Clear, articulate
en-AU-NatashaNeural  Australian  Casual, approachable

Other Languages

  • Spanish: es-ES-AlvaroNeural
  • French: fr-FR-DeniseNeural
  • German: de-DE-KatjaNeural
  • And 50+ more via Edge TTS

List all available voices:

list_voices(language="en")  # English voices
list_voices(language="es")  # Spanish voices
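
Voice names start with a locale code (en-US, es-ES, and so on), so filtering by language can be a simple prefix match. The sketch below shows that idea on a plain list of names; filter_voices is a hypothetical helper, not the tool's actual implementation.

```python
def filter_voices(short_names, language):
    """Return the voice names whose locale starts with the given language code."""
    prefix = language.lower() + "-"
    return sorted(name for name in short_names if name.lower().startswith(prefix))


voices = [
    "en-IE-EmilyNeural",
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",
    "en-AU-NatashaNeural",
    "es-ES-AlvaroNeural",
    "fr-FR-DeniseNeural",
    "de-DE-KatjaNeural",
]
english = filter_voices(voices, "en")
```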

Whisper Models

Model   Size   Speed    Accuracy   Best For
tiny    75MB   Fastest  Good       Quick commands
base    140MB  Fast     Better     General use (default)
small   460MB  Medium   High       Detailed dictation
medium  1.5GB  Slow     Very High  Noisy environments
large   2.9GB  Slowest  Best       Maximum accuracy

Continuous Listening Mode

Toggle with Caps Lock or the toggle_stt tool:

[Press Caps Lock]
🔊 *beep* "Voice mode ON"

"Add a function to validate email addresses"
📝 Transcribed: "Add a function to validate email addresses"

"Make it return a boolean"
📝 Transcribed: "Make it return a boolean"

[Press Caps Lock]
🔊 *beep* "Voice mode OFF"

Audio feedback:

  • ON: Higher pitch beep (800Hz)
  • OFF: Lower pitch beep (400Hz)
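
For reference, tones like these can be generated with the Python standard library alone. The sketch below builds a short sine wave at a given frequency and writes it as a 16-bit mono WAV file; the 150 ms duration, the half-scale volume, and the function names are assumptions, and the project may produce its beeps differently.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second


def beep_samples(frequency_hz, duration_ms=150, volume=0.5):
    """Return 16-bit PCM samples for a sine tone."""
    count = SAMPLE_RATE * duration_ms // 1000
    amplitude = int(32767 * volume)
    step = 2 * math.pi * frequency_hz / SAMPLE_RATE
    return [int(amplitude * math.sin(step * i)) for i in range(count)]


def write_beep(path, frequency_hz, duration_ms=150):
    """Write the tone to a mono 16-bit WAV file."""
    samples = beep_samples(frequency_hz, duration_ms)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

write_beep("/tmp/on.wav", 800) and write_beep("/tmp/off.wav", 400) would produce the two cues, playable with aplay.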

GPU Acceleration (Optional)

For faster STT, point to a remote Whisper API:

export GPU_STT_ENDPOINT="http://your-gpu-server:8080/transcribe"
export GPU_STT_ENABLED=true

This is optional - local CPU transcription works fine for most use cases.
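
How the server interprets these two variables is not documented here. A reasonable reading is that the remote endpoint is used only when both are set, with local Whisper as the fallback. The sketch below encodes that assumption; resolve_stt_backend and the accepted truthy strings are hypothetical.

```python
import os


def resolve_stt_backend(env=None):
    """Pick the transcription backend from environment variables.

    Returns ("remote", endpoint) when GPU_STT_ENABLED is truthy and
    GPU_STT_ENDPOINT is non-empty, otherwise ("local", None).
    """
    env = os.environ if env is None else env
    endpoint = env.get("GPU_STT_ENDPOINT", "").strip()
    flag = env.get("GPU_STT_ENABLED", "").strip().lower()
    enabled = flag in {"1", "true", "yes", "on"}
    if enabled and endpoint:
        return ("remote", endpoint)
    return ("local", None)
```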

Architecture

┌─────────────────┐    ┌─────────────────┐
│   Claude Code   │◄──►│  VoiceMode MCP  │
└─────────────────┘    └────────┬────────┘
                                │
                    ┌───────────┴───────────┐
                    ▼                       ▼
            ┌───────────────┐       ┌───────────────┐
            │   Edge TTS    │       │    Whisper    │
            │   (speak)     │       │   (listen)    │
            └───────┬───────┘       └───────┬───────┘
                    │                       │
                    ▼                       ▼
            ┌───────────────┐       ┌───────────────┐
            │    mpg123     │       │    arecord    │
            │   (playback)  │       │  (recording)  │
            └───────────────┘       └───────────────┘
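
The recording leg of the diagram could look like the following sketch. It assumes the server shells out to arecord and requests 16 kHz mono 16-bit audio, the input format Whisper models work with. The function names and the default path are hypothetical, and the actual server.py may record differently.

```python
import subprocess


def build_record_command(duration_s, out_path):
    """Build the arecord argv for a fixed-length 16 kHz mono recording."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [
        "arecord",
        "-q",                     # suppress progress output
        "-d", str(int(duration_s)),
        "-f", "S16_LE",           # signed 16-bit little-endian samples
        "-r", "16000",            # 16 kHz sample rate
        "-c", "1",                # mono
        out_path,
    ]


def record(duration_s, out_path="/tmp/voicemode_input.wav"):
    """Record from the default ALSA capture device and return the file path."""
    subprocess.run(build_record_command(duration_s, out_path), check=True)
    return out_path
```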

Troubleshooting

No audio output?

# Test TTS directly
edge-tts --text "Hello" --write-media /tmp/test.mp3 && mpg123 /tmp/test.mp3

STT not working?

# Make sure the Whisper bindings are installed
pip install pywhispercpp

# Test recording
arecord -d 3 -f cd /tmp/test.wav && aplay /tmp/test.wav

Caps Lock not detected? (Linux)

# Add to input group
sudo usermod -aG input $USER
# Then log out and back in so the new group membership takes effect

Requirements

Core (TTS - works immediately):

  • Python 3.8+
  • edge-tts
  • mpg123 or ffplay

Optional (STT):

  • pywhispercpp
  • alsa-utils (Linux)
  • Microphone
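
Since either mpg123 or ffplay satisfies the playback requirement, a launcher can probe for whichever is installed. This is a minimal sketch with a hypothetical helper name; the order of preference is an assumption.

```python
import shutil


def playback_command(path, which=shutil.which):
    """Return an argv that plays an audio file, or None if no player is found.

    The which parameter exists so the lookup can be swapped out in tests.
    """
    if which("mpg123"):
        return ["mpg123", "-q", path]
    if which("ffplay"):
        # -nodisp: no video window; -autoexit: quit when playback ends.
        return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", path]
    return None
```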

Related Projects

License

MIT License - Use freely in personal and commercial projects.


Voice-first AI coding. If this helps your workflow, consider giving it a star!
