VoiceMode - Voice Interface for Claude Code

Talk to Claude Code. Have it talk back. Natural voice interaction for your AI coding assistant.

VoiceMode brings bidirectional voice capabilities to Claude Code - speak your requests, hear the responses. No API keys required.

Why VoiceMode?

Hands-free coding - Dictate code changes while reviewing on another screen
Accessibility - Voice interface for developers who prefer or need it
Natural interaction - More conversational than typing
Zero cost - Uses Microsoft Edge TTS (free, no API keys)
Privacy - Local Whisper STT keeps your recorded audio on your machine (text to be spoken is sent to Microsoft's Edge TTS service)

Quick Start

1. Install

git clone https://github.com/marc-shade/voicemode.git
cd voicemode
pip install -r requirements.txt

# Linux: install audio tools (run the line for your distribution)
sudo apt install mpg123 ffmpeg alsa-utils  # Debian/Ubuntu
sudo dnf install mpg123 ffmpeg alsa-utils  # Fedora

2. Configure Claude Code

Add to ~/.claude.json:

{
  "mcpServers": {
    "voice-mode": {
      "command": "python",
      "args": ["/path/to/voicemode/server.py"]
    }
  }
}
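If you would rather script this step than edit the file by hand, the merge can be sketched as below. This is a minimal sketch and not part of VoiceMode: `add_voice_mode_server` and `update_config_file` are hypothetical helpers, and the code assumes ~/.claude.json is a plain JSON object that may already contain other `mcpServers` entries.

```python
import json
from pathlib import Path


def add_voice_mode_server(config: dict, server_path: str) -> dict:
    """Return a copy of `config` with a "voice-mode" MCP server entry added.

    Hypothetical helper: other servers and top-level keys are kept,
    and an existing "voice-mode" entry is overwritten.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["voice-mode"] = {"command": "python", "args": [server_path]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_file: Path, server_path: str) -> None:
    """Read the JSON config (if present), merge the entry, and write it back."""
    existing = json.loads(config_file.read_text()) if config_file.exists() else {}
    updated = add_voice_mode_server(existing, server_path)
    config_file.write_text(json.dumps(updated, indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".claude.json", "/path/to/voicemode/server.py")` would apply it; back up the file first, since the sketch rewrites it in place.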

3. Use It

You: "Hey Claude, speak this message back to me"

Claude: *uses speak tool*
🔊 "Hey! I can talk now. This is pretty cool, right?"

You: "Listen to me for 5 seconds"

Claude: *uses listen tool, records your voice*
📝 Transcribed: "Add a function called get user preferences"

Features

Feature	Status	Notes
Text-to-Speech	✅ Ready	Edge TTS, 50+ voices
Speech-to-Text	✅ Ready	Whisper (local), GPU optional
Continuous listening	✅ Ready	Caps Lock toggle
Wayland support	✅ Ready	MCP tool toggle
Multiple languages	✅ Ready	EN, ES, FR, DE, etc.

MCP Tools

`speak` - Text to Speech

speak("Hello from Claude!", voice="en-US-AriaNeural", rate="+10%")
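The `rate` argument in the call above is a signed percentage string. If your speed adjustment starts out as a number, a small formatter avoids hand-building the string. This is a sketch under the assumption that the expected format is always sign, digits, percent sign, as in the "+10%" example; `format_rate` is a hypothetical helper, not a VoiceMode function.

```python
def format_rate(percent: int) -> str:
    """Format a speed change as a signed percentage string, e.g. 10 -> "+10%"."""
    # The "+d" format spec always emits a sign, including for zero.
    return f"{percent:+d}%"
```

For example, `format_rate(-25)` gives "-25%" for slower speech and `format_rate(0)` gives "+0%" for the default speed.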

`listen` - Speech to Text (one-shot)

result = listen(duration=5, model="base")
# Returns: {"text": "what you said", "confidence": 0.95}
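Because the result carries a confidence score, a caller may want to discard empty or low-confidence transcriptions before acting on them. The sketch below assumes the result has the shape shown in the comment above; `accept_transcription` is a hypothetical helper and the 0.6 threshold is an arbitrary starting point, not a VoiceMode default.

```python
from typing import Optional


def accept_transcription(result: dict, min_confidence: float = 0.6) -> Optional[str]:
    """Return the stripped text if it is non-empty and confident enough, else None."""
    text = str(result.get("text", "")).strip()
    confidence = float(result.get("confidence", 0.0))
    if not text or confidence < min_confidence:
        return None
    return text
```

A `None` return is a natural cue to ask the user to repeat the request rather than guess at what was said.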

`toggle_stt` - Toggle continuous listening

toggle_stt(enable=True)  # Start listening
toggle_stt(enable=False) # Stop listening
toggle_stt()             # Toggle current state
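The three call forms above amount to an optional boolean: an explicit value sets the state, and omitting it flips the current state. The following sketch illustrates those semantics only; `ListeningState` is a hypothetical class, and the assumption that listening starts off is mine, not something the server documents.

```python
from typing import Optional


class ListeningState:
    """Tracks whether continuous listening is on (illustrative only)."""

    def __init__(self) -> None:
        self.enabled = False  # assumed initial state

    def toggle(self, enable: Optional[bool] = None) -> bool:
        """Set the state explicitly, or flip it when `enable` is None."""
        self.enabled = (not self.enabled) if enable is None else bool(enable)
        return self.enabled
```

Note that passing `True` twice leaves listening on, so explicit calls are safe to repeat, while the bare toggle is not.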

`get_transcriptions` - Get recent speech

history = get_transcriptions(limit=10)
# Returns timestamped list of everything you've said
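A timestamped history with a `limit` parameter maps naturally onto a bounded queue. The sketch below shows one way such a store could work; `TranscriptionHistory`, its field names, and the 100-entry cap are all assumptions for illustration and may not match the server's actual storage.

```python
from collections import deque
from datetime import datetime, timezone
from typing import Deque, Dict, List, Optional


class TranscriptionHistory:
    """Keeps the most recent transcriptions with ISO 8601 timestamps."""

    def __init__(self, max_entries: int = 100) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries: Deque[Dict[str, str]] = deque(maxlen=max_entries)

    def add(self, text: str, when: Optional[datetime] = None) -> None:
        stamp = (when or datetime.now(timezone.utc)).isoformat()
        self._entries.append({"timestamp": stamp, "text": text})

    def recent(self, limit: int = 10) -> List[Dict[str, str]]:
        """Return up to `limit` entries, oldest first."""
        if limit <= 0:
            return []
        return list(self._entries)[-limit:]
```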

Voice Options

English Voices (Samples)

Voice	Accent	Character
`en-IE-EmilyNeural`	Irish	Warm, friendly (default)
`en-US-AriaNeural`	American	Natural, professional
`en-GB-SoniaNeural`	British	Clear, articulate
`en-AU-NatashaNeural`	Australian	Casual, approachable

Other Languages

Spanish: es-ES-AlvaroNeural
French: fr-FR-DeniseNeural
German: de-DE-KatjaNeural
And 50+ more via Edge TTS

List all available voices:

list_voices(language="en")  # English voices
list_voices(language="es")  # Spanish voices
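The voice names listed above follow a language-REGION-Name pattern, so filtering by language comes down to a prefix match. The sketch below works on the sample names from this README; `filter_voices` is a hypothetical helper and is not how `list_voices` is necessarily implemented.

```python
from typing import Iterable, List

# Sample voice names taken from the tables above.
SAMPLE_VOICES = [
    "en-IE-EmilyNeural",
    "en-US-AriaNeural",
    "en-GB-SoniaNeural",
    "en-AU-NatashaNeural",
    "es-ES-AlvaroNeural",
    "fr-FR-DeniseNeural",
    "de-DE-KatjaNeural",
]


def filter_voices(voices: Iterable[str], language: str) -> List[str]:
    """Return the voices whose name starts with the given language code."""
    prefix = language.lower() + "-"
    return [voice for voice in voices if voice.lower().startswith(prefix)]
```

Matching on the code plus the hyphen keeps a short code from matching a longer one that merely begins with the same letters.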

Whisper Models

Model	Size	Speed	Accuracy	Best For
tiny	75MB	Fastest	Good	Quick commands
base	140MB	Fast	Better	General use (default)
small	460MB	Medium	High	Detailed dictation
medium	1.5GB	Slow	Very High	Noisy environments
large	2.9GB	Slowest	Best	Maximum accuracy
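The table orders the models from fastest to most accurate. One way to use that ordering is to start with a small model and step up only when a transcription looks unreliable. This is a sketch of that idea, not built-in behavior; `next_larger_model` is a hypothetical helper.

```python
# Model names from the table above, ordered fastest to most accurate.
MODELS = ["tiny", "base", "small", "medium", "large"]


def next_larger_model(model: str) -> str:
    """Return the next model up the list, or the same model if already largest."""
    index = MODELS.index(model)  # raises ValueError for unknown names
    return MODELS[min(index + 1, len(MODELS) - 1)]
```

A caller could retry `listen` with `next_larger_model("base")` when the returned confidence is low, trading speed for accuracy only when needed.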

Continuous Listening Mode

Toggle with Caps Lock or the toggle_stt tool:

[Press Caps Lock]
🔊 *beep* "Voice mode ON"

"Add a function to validate email addresses"
📝 Transcribed: "Add a function to validate email addresses"

"Make it return a boolean"
📝 Transcribed: "Make it return a boolean"

[Press Caps Lock]
🔊 *beep* "Voice mode OFF"

Audio feedback:

ON: Higher pitch beep (800Hz)
OFF: Lower pitch beep (400Hz)
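To show what these two tones amount to, the sketch below generates a sine beep as an in-memory WAV file using only the Python standard library. The frequencies come from the list above; the duration, sample rate, and volume are assumptions, and `make_beep` is a hypothetical helper rather than the server's own code.

```python
import io
import math
import struct
import wave


def make_beep(frequency_hz: int, duration_s: float = 0.15,
              sample_rate: int = 44100, volume: float = 0.5) -> bytes:
    """Return a mono 16-bit WAV file (as bytes) containing a sine tone."""
    n_frames = round(sample_rate * duration_s)
    amplitude = int(32767 * volume)
    samples = (
        int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buffer.getvalue()


ON_BEEP = make_beep(800)   # higher pitch: voice mode ON
OFF_BEEP = make_beep(400)  # lower pitch: voice mode OFF
```

Writing either byte string to a .wav file and playing it with `aplay` would reproduce the cue.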

GPU Acceleration (Optional)

For faster STT, point VoiceMode at a remote Whisper API running on a GPU machine (recorded audio is then sent to that server):

export GPU_STT_ENDPOINT="http://your-gpu-server:8080/transcribe"
export GPU_STT_ENABLED=true

This is optional - local CPU transcription works fine for most use cases.
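The two environment variables imply a simple decision: use the remote endpoint only when it is both enabled and set, otherwise fall back to local Whisper. The sketch below illustrates that logic; `resolve_stt_endpoint` is a hypothetical helper, and the set of values treated as "true" is an assumption, since this README only shows `true`.

```python
import os
from typing import Mapping, Optional


def resolve_stt_endpoint(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the remote endpoint URL, or None to use local Whisper."""
    env = os.environ if env is None else env
    # Assumed truthy spellings; the README itself only documents "true".
    enabled = env.get("GPU_STT_ENABLED", "").strip().lower() in {"1", "true", "yes", "on"}
    endpoint = env.get("GPU_STT_ENDPOINT", "").strip()
    return endpoint if enabled and endpoint else None
```

Returning `None` rather than raising keeps the local CPU path as the safe default when the configuration is incomplete.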

Architecture

┌─────────────────┐    ┌─────────────────┐
│   Claude Code   │◄──►│  VoiceMode MCP  │
└─────────────────┘    └────────┬────────┘
                                │
                    ┌───────────┴───────────┐
                    ▼                       ▼
            ┌───────────────┐       ┌───────────────┐
            │   Edge TTS    │       │    Whisper    │
            │   (speak)     │       │   (listen)    │
            └───────┬───────┘       └───────┬───────┘
                    │                       │
                    ▼                       ▼
            ┌───────────────┐       ┌───────────────┐
            │    mpg123     │       │    arecord    │
            │   (playback)  │       │  (recording)  │
            └───────────────┘       └───────────────┘
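The two branches of the diagram can be sketched as command lines for the external tools involved. The functions below only build argument lists, using flags that exist in the `edge-tts`, `mpg123`, and `arecord` CLIs; the function names are hypothetical, and whether server.py shells out this way or calls the libraries directly is an assumption. The 16 kHz mono recording format is likewise a choice suited to Whisper, not something this README specifies.

```python
from typing import List


def tts_command(text: str, voice: str, out_path: str, rate: str = "+0%") -> List[str]:
    """Build the edge-tts call that synthesizes `text` into an MP3 file."""
    # --rate uses "=" so a negative value is not mistaken for another option.
    return ["edge-tts", "--text", text, "--voice", voice,
            f"--rate={rate}", "--write-media", out_path]


def playback_command(mp3_path: str) -> List[str]:
    """Build the mpg123 call that plays the synthesized file quietly."""
    return ["mpg123", "-q", mp3_path]


def record_command(wav_path: str, seconds: int) -> List[str]:
    """Build the arecord call that captures microphone audio for Whisper."""
    # Assumed format: 16-bit little-endian PCM, 16 kHz, one channel.
    return ["arecord", "-q", "-d", str(seconds),
            "-f", "S16_LE", "-r", "16000", "-c", "1", wav_path]
```

Each list can be passed to `subprocess.run(..., check=True)`, which avoids shell quoting problems when the spoken text contains quotes or other special characters.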

Troubleshooting

No audio output?

# Test TTS directly
edge-tts --text "Hello" --write-media /tmp/test.mp3 && mpg123 /tmp/test.mp3

STT not working?

# Install (or reinstall) the Whisper bindings
pip install pywhispercpp

# Test recording
arecord -d 3 -f cd /tmp/test.wav && aplay /tmp/test.wav

Caps Lock not detected? (Linux)

# Add your user to the input group
sudo usermod -aG input $USER
# Then log out and back in for the change to take effect

Requirements

Core (TTS - works immediately):

Python 3.8+
edge-tts
mpg123 or ffplay

Optional (STT):

pywhispercpp
alsa-utils (Linux)
Microphone

Related Projects

enhanced-memory-mcp - Persistent memory for Claude
claude-flow - Multi-agent orchestration
agent-runtime-mcp - Task queues and goals

License

MIT License - Use freely in personal and commercial projects.

Voice-first AI coding. If this helps your workflow, consider giving it a star!
