Deepgram provides fast, accurate speech-to-text and text-to-speech with built-in turn detection, which makes it well suited to real-time conversational agents.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
uv add "vision-agents[deepgram]"
Quick Start
from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=gemini.LLM("gemini-2.5-flash"),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
Set DEEPGRAM_API_KEY in your environment, or pass api_key directly to deepgram.STT() and deepgram.TTS().
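The lookup order described above (explicit key first, then the environment variable) can be sketched with the standard library alone. The helper name `resolve_deepgram_api_key` is hypothetical and not part of the plugin. It only illustrates one way to fail early with a clear message when no key is configured.

```python
import os
from typing import Optional


def resolve_deepgram_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to DEEPGRAM_API_KEY.

    Hypothetical helper for illustration; the plugin performs its own lookup.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError(
            "No Deepgram API key found: set DEEPGRAM_API_KEY or pass api_key."
        )
    return key


# Assumed usage with the plugin (requires vision-agents[deepgram] installed):
#   stt = deepgram.STT(api_key=resolve_deepgram_api_key())
#   tts = deepgram.TTS(api_key=resolve_deepgram_api_key())
```

Resolving the key once at startup surfaces a missing credential before the agent joins a call, rather than on the first transcription request.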
STT
Real-time transcription with built-in turn detection.
stt = deepgram.STT(
    model="nova-3",
    language="en",
    eager_turn_detection=True,
)
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "nova-3" | Deepgram model |
| language | str | "en" | Language code |
| eager_turn_detection | bool | False | Enable faster turn detection |
| api_key | str | None | API key (defaults to DEEPGRAM_API_KEY env var) |
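If these options need to vary between deployments, one approach is to build the keyword arguments from environment variables and fall back to the defaults listed in the table. This is a minimal sketch: the variable names `STT_MODEL`, `STT_LANGUAGE`, and `STT_EAGER_TURN_DETECTION` and both helper functions are assumptions made for illustration, not something the plugin reads itself.

```python
import os


def _env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable such as "1", "true", or "yes"."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def stt_kwargs_from_env() -> dict:
    """Collect STT options from hypothetical env vars, using the documented defaults."""
    return {
        "model": os.environ.get("STT_MODEL", "nova-3"),
        "language": os.environ.get("STT_LANGUAGE", "en"),
        "eager_turn_detection": _env_flag("STT_EAGER_TURN_DETECTION", False),
    }


# Assumed usage with the plugin (requires vision-agents[deepgram] installed):
#   stt = deepgram.STT(**stt_kwargs_from_env())
```

Keeping the defaults in one place makes it easy to try `eager_turn_detection` in a staging environment without changing code.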
TTS
Low-latency text-to-speech synthesis.
tts = deepgram.TTS(
    model="aura-2",
    voice="aura-asteria-en",
)
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | "aura-2" | TTS model |
| voice | str | "aura-asteria-en" | Voice ID (available voices) |
| api_key | str | None | API key (defaults to DEEPGRAM_API_KEY env var) |
Next Steps