A comprehensive MCP (Model Context Protocol) server that provides text-to-speech generation and audio processing capabilities. This server integrates with Replicate's TTS models and FFmpeg for complete audio workflows.
This MCP server enables AI assistants to:
- Generate speech from text using multiple TTS models
- Process and manipulate audio files
- Extract audio metadata and information
- Convert between different audio formats
- Perform advanced audio editing operations
- Text-to-Speech: Generate speech using Chatterbox, Chatterbox Pro, and Minimax models
- Audio Processing: Trimming, volume adjustment, format conversion, and concatenation
- Audio Analysis: Extract comprehensive metadata from audio files
- Multi-Model Support: Choose from different TTS models with specific parameters
- Audio Editing: Split audio files into segments with customizable durations
- TypeScript Support: Full type definitions and Zod schema validation
- Easy Installation: Scripts for different MCP clients
This MCP server provides:
- TTS Generation: Converts text to speech using Replicate's AI models
- Audio Processing: Comprehensive audio manipulation using FFmpeg
- File Management: Automatic audio directory creation and file handling
- Multi-Model Support: Configurable parameters for different TTS models
- Error Handling: Robust error handling with detailed feedback
The server includes 7 main tools for complete audio workflows from text input to final processed audio files.
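For orientation, here is a minimal sketch of how one of these tools could be registered using the MCP TypeScript SDK and Zod. The tool name matches the list below, but the parameter shape and handler body are illustrative assumptions rather than the server's actual implementation in src/index.ts:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "speech-tts", version: "1.0.0" });

// Illustrative registration of the text-to-speech tool; the real schema
// likely exposes more parameters (emotion, quality, model-specific options).
server.tool(
  "text-to-speech",
  {
    text: z.string().describe("Text to synthesize"),
    model: z.enum(["chatterbox", "chatterbox-pro", "minimax"]).default("chatterbox"),
    voice: z.string().optional().describe("Voice name, e.g. Luna"),
  },
  async ({ text, model, voice }) => {
    // The real handler calls Replicate and writes the result under audio/.
    const outputPath = `audio/tts-${Date.now()}.wav`;
    // ... generate speech with the selected model and save it to outputPath ...
    return {
      content: [
        {
          type: "text",
          text: `Saved ${model} (${voice ?? "default voice"}) speech for "${text}" to ${outputPath}`,
        },
      ],
    };
  }
);

// Expose the server over stdio so MCP clients can launch it as a subprocess.
await server.connect(new StdioServerTransport());
```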
- Node.js (v18 or higher)
- FFmpeg installed on your system
- Replicate API token (sign up at replicate.com)
# Clone the repository
git clone <your-repo-url>
cd mcp-speech-server
# Install dependencies
pnpm install
# Set up environment variables
cp .env.example .env.local
# Edit .env.local and add your REPLICATE_API_TOKEN
# Build the project
pnpm run build
# Start the server
pnpm start
Install the server to different MCP clients:
# For Claude Desktop
pnpm run install-desktop
# For Cursor
pnpm run install-cursor
# For Claude Code
pnpm run install-code
# For all MCP clients
pnpm run install-server
These scripts will build the project and automatically update the appropriate configuration files.
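The exact behavior lives in the scripts/ directory, but conceptually each install script just merges a server entry into the target client's configuration file. Below is a rough sketch of what install-desktop might do; the macOS config path and the "speech-tts" key are assumptions, not the script's verified contents:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Assumed macOS location of Claude Desktop's config; other platforms differ.
const configPath = join(
  homedir(),
  "Library", "Application Support", "Claude", "claude_desktop_config.json"
);

// Start from the existing config if present, otherwise from an empty object.
const config = existsSync(configPath)
  ? JSON.parse(readFileSync(configPath, "utf8"))
  : {};

config.mcpServers ??= {};
config.mcpServers["speech-tts"] = {
  command: "node",
  args: [join(process.cwd(), "dist", "index.js")],
  env: { REPLICATE_API_TOKEN: process.env.REPLICATE_API_TOKEN ?? "your-token-here" },
};

writeFileSync(configPath, JSON.stringify(config, null, 2));
console.log(`Registered speech-tts in ${configPath}`);
```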
The installation script will automatically add the configuration, but you can also manually add it to your claude_desktop_config.json file:
{
"mcpServers": {
"speech-tts": {
"command": "node",
"args": ["/path/to/your/dist/index.js"],
"env": {
"REPLICATE_API_TOKEN": "your-token-here"
}
}
}
}
Then restart Claude Desktop to connect to the server.
- text-to-speech: Generate speech from text using Chatterbox, Chatterbox Pro, or Minimax models
- Supports voice selection, emotion control, and audio quality settings
- Automatically saves generated audio to the `audio/` directory
- get-audio-metadata: Extract comprehensive metadata from audio files
- trim-audio: Trim audio files to specified start/end times or duration
- adjust-audio-volume: Adjust audio volume with multiplier controls
- convert-audio-format: Convert between different audio formats (mp3, wav, flac, ogg)
- concatenate-audio: Combine multiple audio files into one
- split-audio: Split audio files into segments of specified duration
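The snippets below use an informal CLI-style notation; from code, an MCP client invokes the same tools over stdio. A minimal sketch (the dist path and argument values are placeholders):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the built server as a subprocess and talk to it over stdio.
const transport = new StdioClientTransport({
  command: "node",
  args: ["/path/to/your/dist/index.js"],
  env: { REPLICATE_API_TOKEN: process.env.REPLICATE_API_TOKEN ?? "" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Invoke the text-to-speech tool with the same arguments the CLI-style
// examples below use informally.
const result = await client.callTool({
  name: "text-to-speech",
  arguments: { text: "Hello world!", model: "chatterbox", voice: "Luna" },
});
console.log(JSON.stringify(result, null, 2));

await client.close();
```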
# Generate speech using Chatterbox model
text-to-speech "Hello world!" --model chatterbox --voice Luna
# Extract audio metadata
get-audio-metadata audio/speech.wav
# Trim audio from 30 seconds to 2 minutes
trim-audio audio/input.wav audio/output.wav --start_time 30 --duration 90
# Convert audio format
convert-audio-format audio/input.wav audio/output.mp3 --format mp3 --bitrate 192k
├── src/
│ └── index.ts # Main server implementation with all tools
├── scripts/ # Installation and utility scripts
├── dist/ # Compiled JavaScript (generated)
├── audio/ # Generated audio files (created automatically)
├── package.json # Project configuration
├── tsconfig.json # TypeScript configuration
├── .env.local # Environment variables (create from .env.example)
└── README.md # This file
- Make changes to `src/index.ts`
- Run `pnpm run build` to compile
- Test your server with `pnpm start`
- Use the installation scripts to update your MCP client configuration
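As an example of extending the server, a new FFmpeg-backed tool can be registered alongside the existing seven. The sketch below adds a hypothetical reverse-audio tool (not part of this server) using fluent-ffmpeg's areverse filter, following the same registration pattern as the real tools:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import ffmpeg from "fluent-ffmpeg";
import { z } from "zod";

const server = new McpServer({ name: "speech-tts", version: "1.0.0" });

// Wrap fluent-ffmpeg's event-based API in a Promise.
function reverseAudio(input: string, output: string): Promise<string> {
  return new Promise((resolve, reject) => {
    ffmpeg(input)
      .audioFilters("areverse") // FFmpeg filter that reverses an audio stream
      .on("end", () => resolve(output))
      .on("error", reject)
      .save(output);
  });
}

// Hypothetical tool; in practice this would be added next to the
// existing tool registrations in src/index.ts.
server.tool(
  "reverse-audio",
  {
    input_path: z.string().describe("Audio file to reverse"),
    output_path: z.string().describe("Where to write the reversed audio"),
  },
  async ({ input_path, output_path }) => {
    const saved = await reverseAudio(input_path, output_path);
    return { content: [{ type: "text", text: `Reversed audio written to ${saved}` }] };
  }
);
```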
Chatterbox:
- Voices: Luna, Ember, Hem, Aurora, Cliff, Josh, William, Orion, Ken
- Parameters: temperature, cfg_weight, exaggeration, audio_prompt
- Use case: General-purpose TTS with voice cloning support
Chatterbox Pro:
- Voices: Luna and custom voice UUIDs
- Parameters: pitch, temperature, exaggeration, custom_voice
- Use case: Professional-grade TTS with custom voices
Minimax:
- Voices: Deep_Voice_Man, Wise_Woman, and others
- Parameters: speed, volume, emotion, sample_rate, bitrate
- Use case: Multilingual TTS with emotion control
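Each model maps to a different Replicate input payload built from the parameters above. The sketch below illustrates the idea only; the model identifiers and exact input field names are assumptions and should be checked against the corresponding model pages on replicate.com:

```typescript
import Replicate from "replicate";

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Build a model-specific input object from the parameter lists above.
// Model slugs and field names are placeholders, not verified values.
async function generateSpeech(model: "chatterbox" | "minimax", text: string) {
  if (model === "chatterbox") {
    return replicate.run("resemble-ai/chatterbox", {
      input: {
        prompt: text,        // field name is an assumption
        temperature: 0.8,
        cfg_weight: 0.5,
        exaggeration: 0.5,
      },
    });
  }
  // Minimax-style parameters: speed, volume, emotion, sample_rate, bitrate.
  return replicate.run("minimax/speech-02-turbo", {
    input: {
      text,
      speed: 1.0,
      volume: 1.0,
      emotion: "happy",
      sample_rate: 32000,
      bitrate: 128000,
    },
  });
}
```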
- `@modelcontextprotocol/sdk` - MCP protocol implementation
- `replicate` - AI model inference platform
- `fluent-ffmpeg` - Audio processing and manipulation
- `axios` - HTTP client for API calls
- `zod` - Runtime type validation
MIT