Fish Audio provides high-quality speech-to-text (STT) and text-to-speech (TTS) with automatic language detection and voice cloning support, which makes it a good fit for multilingual applications.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[fish]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import fish, gemini, getstream
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fish.STT(),
    tts=fish.TTS(),
)
Set FISH_API_KEY in your environment or pass api_key directly.
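If you prefer to fail fast when the key is missing, rather than at the first API call, a small helper along these lines may help. This is a minimal sketch: `resolve_fish_api_key` is a hypothetical name and not part of the plugin. It only mirrors the lookup order described above, with an explicit key first and the `FISH_API_KEY` environment variable second.

```python
import os
from typing import Optional


def resolve_fish_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else FISH_API_KEY from the environment.

    Hypothetical helper for illustration; not part of vision-agents.
    """
    key = explicit_key or os.environ.get("FISH_API_KEY")
    if not key:
        raise RuntimeError("Set FISH_API_KEY or pass api_key explicitly.")
    return key


# Usage with the plugin (requires the fish extra to be installed):
#   from vision_agents.plugins import fish
#   key = resolve_fish_api_key()
#   stt = fish.STT(api_key=key)
#   tts = fish.TTS(api_key=key)
```

Resolving the key once and passing it to both `fish.STT` and `fish.TTS` keeps the two components on the same credentials.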
TTS
tts = fish.TTS(reference_id="your_voice_id") # Optional voice cloning
| Name | Type | Default | Description |
|---|---|---|---|
| reference_id | str | None | Voice ID for voice cloning |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
STT
stt = fish.STT(language="en") # Or None for auto-detection
| Name | Type | Default | Description |
|---|---|---|---|
| language | str | None | Language code ("en", "zh", etc.) or None for auto-detect |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
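Applications often hold a locale tag such as "en-US" rather than a bare language code. The sketch below shows one way to reduce such a tag to the short form used in the examples above ("en", "zh") and to map an empty or "auto" value to `None` for auto-detection. `to_fish_language` is a hypothetical helper, and it assumes the service expects only the primary language subtag, which this page does not confirm.

```python
from typing import Optional


def to_fish_language(tag: Optional[str]) -> Optional[str]:
    """Map a locale-style tag to a short language code, or None for auto-detect.

    Hypothetical helper; assumes only the primary subtag is wanted.
    """
    if tag is None:
        return None
    cleaned = tag.strip().lower()
    if cleaned in ("", "auto"):
        return None
    # Keep only the primary subtag: "en-US" or "en_US" becomes "en".
    return cleaned.replace("_", "-").split("-")[0]


# Usage with the plugin (requires the fish extra to be installed):
#   from vision_agents.plugins import fish
#   stt = fish.STT(language=to_fish_language("en-US"))
```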
Next Steps