```bash
pip install orpheus-cpp
```

You also need to install the llama-cpp-python package separately, because llama-cpp-python does not ship pre-built wheels on PyPI.
Don't worry, you can just run one of the following commands:
For CPU-only inference:

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

For Apple Silicon (Metal):

```bash
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```

Run the standalone API server for OpenAI-compatible text-to-speech:
```bash
orpheus-api
```

Or using uv:
```bash
uv run orpheus-api
```

The server starts on http://localhost:8992 and provides OpenAI-compatible endpoints.
```python
from openai import OpenAI

# Point to your local Orpheus server
client = OpenAI(
    api_key="dummy-key",  # API key is ignored for local server
    base_url="http://localhost:8992/v1"
)

# Generate speech
response = client.audio.speech.create(
    model="orpheus-tts",
    voice="tara",  # Use Orpheus voice names
    input="Hello! This is Orpheus TTS speaking.",
    response_format="wav"
)

# Save audio file
with open("speech.wav", "wb") as f:
    f.write(response.content)
```

Orpheus supports multiple voices across different languages:
- English: tara, jess, leah, leo, dan, mia, zac, zoe
- Spanish: javi, sergio, maria
- French: pierre, amelie, marie
- German: jana, thomas, max
- Italian: pietro, giulia, carlo
- Chinese: 长乐, 白芷
- Korean: 유나, 준서
- Hindi: ऋतिका
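Any of these names can be passed as the `voice` parameter of the speech endpoint. As a rough sketch (assuming the model loaded by the server supports the chosen voice's language), the OpenAI SDK's streaming helper can also write the audio straight to disk instead of buffering it:

```python
from openai import OpenAI

client = OpenAI(api_key="dummy-key", base_url="http://localhost:8992/v1")

# "javi" is one of the Spanish voices listed above; availability depends
# on which Orpheus model the server has loaded.
with client.audio.speech.with_streaming_response.create(
    model="orpheus-tts",
    voice="javi",
    input="Hola, ¿cómo estás?",
    response_format="wav",
) as response:
    # Write the audio to disk as it arrives.
    response.stream_to_file("speech_es.wav")
```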
The server exposes the following endpoints:

- `POST /v1/audio/speech` - Generate speech (OpenAI compatible)
- `POST /v1/audio/speech/stream` - Stream speech generation
- `GET /v1/voices` - List available voices
- `GET /v1/models` - List available models
- `GET /health` - Health check
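If you prefer not to use the OpenAI SDK, these endpoints can be called directly over HTTP. Below is a minimal sketch using `requests`; it assumes the health and voice-listing endpoints return JSON and that the speech endpoint accepts an OpenAI-style request body, which this README does not spell out:

```python
import requests

BASE = "http://localhost:8992"

# Check that the server is up (assumed to return JSON).
print(requests.get(f"{BASE}/health").json())

# List the voices the running server knows about (assumed to return JSON).
print(requests.get(f"{BASE}/v1/voices").json())

# Generate speech with an OpenAI-style request body and save the raw audio bytes.
resp = requests.post(
    f"{BASE}/v1/audio/speech",
    json={
        "model": "orpheus-tts",
        "voice": "tara",
        "input": "Hello from a plain HTTP client.",
        "response_format": "wav",
    },
)
with open("speech_http.wav", "wb") as f:
    f.write(resp.content)
```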
After installing orpheus-cpp, install fastrtc and run the following command:
```bash
python -m orpheus_cpp
```

Then go to http://localhost:7860 and you should see the demo.
```python
from orpheus_cpp import OrpheusCpp
from scipy.io.wavfile import write

orpheus = OrpheusCpp()

text = "I really hope the project deadline doesn't get moved up again."
# output is a tuple of (sample_rate (24_000), samples (numpy int16 array))
sample_rate, samples = orpheus.tts(text, options={"voice_id": "tara"})
write("output.wav", sample_rate, samples.squeeze())
```

To stream audio as it is generated, use `stream_tts_sync`:

```python
for sample_rate, samples in orpheus.stream_tts_sync(text, options={"voice_id": "tara"}):
    write("output.wav", sample_rate, samples.squeeze())
```

Or stream asynchronously with `stream_tts`:

```python
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara"}):
    write("output.wav", sample_rate, samples.squeeze())
```

By default, we wait until 1.5 seconds of audio is generated before yielding the first chunk.
This is to ensure smooth audio streaming at the cost of a longer time to first audio.
Depending on your hardware, you can try reducing `pre_buffer_size` to get a faster time to first chunk:
```python
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara", "pre_buffer_size": 0.5}):
    write("output.wav", sample_rate, samples.squeeze())
```

orpheus-cpp is distributed under the terms of the MIT license.