TTSFM - Text-to-Speech API Client

Overview

TTSFM is a free, OpenAI-compatible text-to-speech API service that converts text to natural-sounding speech using OpenAI's GPT-4o mini TTS model. Built on the openai.fm backend, it provides a Python SDK, RESTful API endpoints, and a web playground for testing and integration.

What TTSFM Can Do:

🎤 Multiple Voices: Choose from 6 high-quality voices (alloy, echo, fable, onyx, nova, shimmer)
🎵 Flexible Audio Formats: Support for 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
⚡ Speed Control: Adjust playback speed from 0.25x to 4.0x for different use cases
📝 Long Text Support: Automatic text splitting and audio combining for content of any length
🔄 Real-time Streaming: WebSocket support for streaming audio generation
🐍 Python SDK: Easy-to-use synchronous and asynchronous clients
🌐 Web Playground: Interactive web interface for testing and experimentation
🐳 Docker Ready: Pre-built Docker images for instant deployment
🔍 Smart Detection: Automatic capability detection and helpful error messages
🤖 OpenAI Compatible: Drop-in replacement for OpenAI's TTS API
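
The long-text support listed above can be pictured as "split, synthesize each piece, then join the audio". The sketch below illustrates only the splitting step, and it is not TTSFM's actual implementation: the function name `split_text`, the 1000-character default, and the sentence-boundary heuristic are all assumptions made for illustration.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper: the limit and the splitting rule are assumptions,
    not values taken from TTSFM.
    """
    # Break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as its own request and the resulting audio segments concatenated in order; how TTSFM combines them internally is not shown here.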

Key Features in v3.4.0:

🎯 Image variant detection (full vs slim Docker images)
🔍 Runtime capabilities API for feature availability checking
⚡ Speed adjustment with ffmpeg-based audio processing
🎵 Real format conversion for all 6 audio formats
📊 Enhanced error handling with clear, actionable messages
🐳 Dual Docker images optimized for different use cases

⚠️ Disclaimer: This project is intended for educational and research purposes only. It is a reverse-engineered implementation of the openai.fm service and should not be used for commercial purposes or in production environments. Users are responsible for ensuring compliance with applicable laws and terms of service.

Installation

Python package

pip install ttsfm          # core client
pip install "ttsfm[web]"   # client + Flask web app (quoted so shells like zsh don't expand the brackets)

Docker image

TTSFM offers two Docker image variants to suit different needs:

Full variant (recommended)

docker run -p 8000:8000 dbcccc/ttsfm:latest

Includes ffmpeg for advanced features:

✅ All 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
✅ Speed adjustment (0.25x - 4.0x)
✅ Format conversion with ffmpeg
✅ MP3 auto-combine for long text
✅ WAV auto-combine for long text

Slim variant - ~100MB

docker run -p 8000:8000 dbcccc/ttsfm:slim

Minimal image without ffmpeg:

✅ Basic TTS functionality
✅ 2 audio formats (MP3, WAV only)
✅ WAV auto-combine for long text
❌ No speed adjustment
❌ No format conversion
❌ No MP3 auto-combine

The container exposes the web playground at http://localhost:8000 and an OpenAI-compatible endpoint at /v1/audio/speech.

Check available features:

curl http://localhost:8000/api/capabilities
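
The same check can be scripted from Python using only the standard library. This is a minimal sketch: the endpoint path comes from the curl command above, but the helper names are hypothetical and the shape of the JSON response is not documented here, so the code returns whatever the server sends without interpreting it.

```python
import json
import urllib.request
from urllib.parse import urljoin


def capabilities_url(base_url: str) -> str:
    """Build the capabilities endpoint URL for a TTSFM server."""
    return urljoin(base_url.rstrip("/") + "/", "api/capabilities")


def fetch_capabilities(base_url: str = "http://localhost:8000",
                       timeout: float = 5.0):
    """Fetch and decode the capabilities document.

    Assumes the endpoint returns JSON; the fields it contains are not
    assumed or inspected here.
    """
    with urllib.request.urlopen(capabilities_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With a container running locally, `print(json.dumps(fetch_capabilities(), indent=2))` would pretty-print the reported features.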

Quick start

Python client

from ttsfm import TTSClient, AudioFormat, Voice

client = TTSClient()

# Basic usage
response = client.generate_speech(
    text="Hello from TTSFM!",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
)
response.save_to_file("hello")  # -> hello.mp3

# With speed adjustment (requires ffmpeg)
response = client.generate_speech(
    text="This will be faster!",
    voice=Voice.NOVA,
    response_format=AudioFormat.MP3,
    speed=1.5,  # 1.5x speed (0.25 - 4.0)
)
response.save_to_file("fast")  # -> fast.mp3

CLI

ttsfm "Hello, world" --voice nova --format mp3 --output hello.mp3

REST API (OpenAI-compatible)

# Basic request
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world!",
    "voice": "alloy",
    "response_format": "mp3"
  }' --output speech.mp3

# With speed adjustment (requires full image)
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world!",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.5
  }' --output speech_fast.mp3

Available voices: alloy, echo, fable, onyx, nova, shimmer
Available formats: mp3, wav (always); opus, aac, flac, pcm (full image only)
Speed range: 0.25 - 4.0 (requires full image)
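
The curl requests above can also be issued from Python without the SDK. The sketch below mirrors the same JSON payload and validates the voice, format, and speed values against the lists in this README before sending. The helper names `build_speech_request` and `synthesize` are hypothetical, and the client-side validation is an illustration rather than something the server requires.

```python
import json
import urllib.request

# Values copied from this README.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm"}


def build_speech_request(text, voice="alloy", response_format="mp3",
                         speed=None, base_url="http://localhost:8000"):
    """Build a POST request for the /v1/audio/speech endpoint."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown format: {response_format}")
    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    if speed is not None:
        # Speed adjustment needs the full image (ffmpeg).
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

For example, `synthesize("Hello world!", "speech_fast.mp3", speed=1.5)` corresponds to the second curl command, assuming a full-image container is listening on port 8000.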

Learn more

Browse the full API reference and operational notes in the web documentation (or see ttsfm-web/templates/docs.html).
Read the architecture overview for component diagrams.
Contributions are welcome; see CONTRIBUTING.md for guidelines.

License

TTSFM is released under the MIT License.
