Credit goes to visionagents.ai

Pocket TTS is a lightweight, local text-to-speech (TTS) model from Kyutai that runs on CPU. It offers roughly 200 ms latency, voice cloning, and 8 built-in voices, with no GPU or external API required.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

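uv add "vision-agents[pocket]"

Quote the package name so that shells such as zsh do not treat the square brackets as a glob pattern.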
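To confirm that the extra installed correctly without starting an agent, you can check that the plugin module is importable. This is a minimal sketch using only the standard library; the helper name `pocket_plugin_available` is hypothetical, and the module path `vision_agents.plugins.pocket` is taken from the Quick Start import.

```python
import importlib.util


def pocket_plugin_available() -> bool:
    """Hypothetical helper: return True if vision_agents.plugins.pocket can be located.

    find_spec() on a dotted name imports the parent packages first, so a missing
    vision_agents install surfaces as ImportError, which is treated as "not available".
    """
    try:
        return importlib.util.find_spec("vision_agents.plugins.pocket") is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("pocket plugin installed:", pocket_plugin_available())
```

Save it under any filename and run it inside the project with `uv run python <file>.py` so it uses the project's environment.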

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import pocket, gemini, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=pocket.TTS(),
)
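Pocket TTS runs locally, so it needs no API key. The other services in this example (Stream, Gemini, and Deepgram) still need their own credentials.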

Parameters

| Name  | Type | Default | Description |
|-------|------|---------|-------------|
| voice | str  | "alba"  | Built-in voice name, or path to a WAV file for voice cloning |

Built-in Voices

alba, marius, javert, jean, fantine, cosette, eponine, azelma
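To catch a mistyped voice name before the agent starts, you can validate the `voice` string up front. The sketch below is a hypothetical helper (`classify_voice` is not part of the plugin) that mirrors the options documented on this page: a built-in name, an `hf://` URL, or a path ending in `.wav`. How the plugin itself resolves voices may differ.

```python
from pathlib import Path

# Built-in voice names as listed on this page.
BUILTIN_VOICES = {"alba", "marius", "javert", "jean", "fantine", "cosette", "eponine", "azelma"}


def classify_voice(voice: str) -> str:
    """Hypothetical helper: report how a `voice` string would be interpreted.

    Returns "builtin", "huggingface", or "wav_file"; raises ValueError otherwise.
    This only mirrors the documented options and is not the plugin's own logic.
    """
    if voice in BUILTIN_VOICES:
        return "builtin"
    # Check hf:// before the suffix, since hosted voices also end in .wav.
    if voice.startswith("hf://"):
        return "huggingface"
    if Path(voice).suffix.lower() == ".wav":
        return "wav_file"
    raise ValueError(
        f"unknown voice {voice!r}: expected one of {sorted(BUILTIN_VOICES)}, "
        "an hf:// URL, or a .wav path"
    )
```

For example, `classify_voice("alba")` returns `"builtin"`, while a name such as `"bob"` raises `ValueError` instead of failing later inside the TTS call.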

Voice Cloning

from vision_agents.plugins import pocket

# Use a local WAV file
tts = pocket.TTS(voice="path/to/your/voice.wav")

# Or a voice hosted on Hugging Face
tts = pocket.TTS(voice="hf://kyutai/tts-voices/alba-mackenna/casual.wav")
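Before cloning from a local file, a quick pre-flight check can surface a wrong path or an unreadable file early. This sketch uses the standard-library `wave` module, which only reads uncompressed PCM WAV; Pocket TTS may accept files that this check rejects (for example, float-encoded WAVs), so treat a failure as a prompt to inspect the file rather than a definitive verdict. The helper `describe_wav` is hypothetical.

```python
import wave
from pathlib import Path


def describe_wav(path: str) -> dict:
    """Hypothetical pre-flight check for a voice-cloning reference clip.

    Opens the file with the standard-library `wave` module (PCM WAV only) and
    returns its channel count, sample rate, and duration in seconds.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"reference clip not found: {p}")
    with wave.open(str(p), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

Call `describe_wav("path/to/your/voice.wav")` first, then pass the same path to `pocket.TTS(voice=...)`.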
