Fish Audio provides high-quality STT and TTS with automatic language detection and voice cloning support. Ideal for multilingual applications.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[fish]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import fish, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=fish.STT(),
    tts=fish.TTS(),
)

Set the FISH_API_KEY environment variable, or pass api_key directly to fish.STT() and fish.TTS().
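
The key lookup described above can be sketched as a small helper. This is an illustration only, not part of the plugin: `resolve_fish_api_key` and its `env` parameter are hypothetical names, and the plugin's own fallback logic may differ. It prefers an explicitly supplied key, falls back to FISH_API_KEY, and fails early with a readable error if neither is set.

```python
import os


def resolve_fish_api_key(explicit_key=None, env=None):
    """Return the API key to use, preferring an explicit value.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    exists only so the lookup can be exercised without touching the real
    environment.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("FISH_API_KEY")
    if not key:
        # Fail before any agent is constructed, with an actionable message.
        raise RuntimeError(
            "No Fish Audio API key found: set FISH_API_KEY or pass api_key."
        )
    return key
```

The result could then be passed as `fish.TTS(api_key=resolve_fish_api_key())` or `fish.STT(api_key=resolve_fish_api_key())`, using the api_key parameter documented below.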

TTS

tts = fish.TTS(reference_id="your_voice_id")  # Optional voice cloning

| Name | Type | Default | Description |
|------|------|---------|-------------|
| reference_id | str | None | Voice ID for voice cloning |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |

STT

stt = fish.STT(language="en")  # Or None for auto-detection

| Name | Type | Default | Description |
|------|------|---------|-------------|
| language | str | None | Language code ("en", "zh", etc.) or None for auto-detect |
| api_key | str | None | API key (defaults to FISH_API_KEY env var) |
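
Applications often receive language preferences as locale strings or free-form user input rather than bare codes. The sketch below shows one way to map such input onto the language parameter, returning None to request auto-detection. `normalize_stt_language` is a hypothetical helper, and reducing a locale such as "en-US" to its primary subtag is an assumption about what the service expects; the source only documents short codes like "en" and "zh".

```python
def normalize_stt_language(value):
    """Map user input to a value suitable for the `language` argument.

    Hypothetical helper for illustration. Returns None (auto-detect) for
    missing, empty, or "auto" input; otherwise returns a lowercase code.
    """
    if value is None:
        return None
    cleaned = value.strip().lower()
    if cleaned in ("", "auto"):
        return None
    # Assumption: only the primary subtag is needed, so "en-US" and
    # "zh_CN" are reduced to "en" and "zh".
    return cleaned.replace("_", "-").split("-")[0]
```

The result could then be passed as `fish.STT(language=normalize_stt_language(user_choice))`, where `user_choice` is whatever your application collected.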

Next Steps