Pocket TTS is a lightweight local text-to-speech (TTS) model from Kyutai that runs on CPU. It offers ~200 ms latency, voice cloning, and 8 built-in voices without requiring a GPU or an external API.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[pocket]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import pocket, gemini, deepgram, getstream
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=pocket.TTS(),
)
Pocket TTS runs locally. No API key required.
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| voice | str | "alba" | Built-in voice or path to wav file for cloning |
Built-in Voices
alba, marius, javert, jean, fantine, cosette, eponine, azelma
Voice Cloning
# Use a local wav file
tts = pocket.TTS(voice="path/to/your/voice.wav")
# Or a HuggingFace-hosted voice
tts = pocket.TTS(voice="hf://kyutai/tts-voices/alba-mackenna/casual.wav")
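Because the `voice` parameter accepts three kinds of string (a built-in name, a local wav path, or an `hf://` URL), it can help to check a value before constructing the TTS. The following is a minimal sketch, not part of the plugin: `classify_voice` and `BUILTIN_VOICES` are hypothetical names introduced for illustration, and the plugin's own resolution logic may differ.

```python
from pathlib import Path

# Built-in voice names as listed on this page.
BUILTIN_VOICES = {
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
}


def classify_voice(voice: str) -> str:
    """Hypothetical helper: guess which kind of voice a string refers to.

    This is an assumption about how the values could be told apart,
    not the plugin's actual implementation.
    """
    if voice in BUILTIN_VOICES:
        return "builtin"
    if voice.startswith("hf://"):
        return "huggingface"
    if Path(voice).suffix.lower() == ".wav":
        return "file"
    raise ValueError(f"Unrecognized voice: {voice!r}")
```

A check like this catches typos in a built-in name (for example `"alba "` with a trailing space) early, instead of leaving them to be interpreted as a file path.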
Next Steps