OpenAI

OpenAI provides industry-leading language models and realtime speech capabilities. The plugin supports four modes: Realtime (WebRTC speech-to-speech), LLM (Responses API for GPT-5+), ChatCompletionsLLM (any OpenAI-compatible API), and TTS (text-to-speech).

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[openai]"

Realtime

Native speech-to-speech over WebRTC with built-in STT/TTS.

from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful voice assistant.",
    llm=openai.Realtime(model="gpt-realtime", voice="marin", fps=1),
)

Name	Type	Default	Description
`model`	`str`	`"gpt-realtime"`	OpenAI realtime model
`voice`	`str`	`"marin"`	Voice (“marin”, “alloy”, “echo”, etc.)
`fps`	`int`	`1`	Video frames per second
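
To give a feel for what a low `fps` value means for video input, here is a minimal stand-alone sketch of rate-limiting frames by timestamp. `sample_frames` is a hypothetical helper written for this illustration; how the plugin itself samples frames is not described here, so treat this as one plausible approach rather than its actual behavior.

```python
from typing import Iterable, List


def sample_frames(timestamps: Iterable[float], fps: int) -> List[float]:
    """Keep only timestamps spaced at least 1/fps seconds apart (illustrative)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps
    kept: List[float] = []
    for ts in timestamps:
        # Forward a frame only when enough time has passed since the last kept one.
        if not kept or ts - kept[-1] >= interval:
            kept.append(ts)
    return kept


# A camera producing 4 frames per second for two seconds.
camera_timestamps = [i * 0.25 for i in range(9)]
forwarded = sample_frames(camera_timestamps, fps=1)
```

With `fps=1`, only about one frame per second would be forwarded in this sketch, which keeps bandwidth and token usage low at the cost of temporal detail.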

LLM

Uses the Responses API (default for GPT-5+). Requires separate STT/TTS.

from vision_agents.core import Agent, User
from vision_agents.plugins import openai, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=openai.LLM(model="gpt-4o"),
    stt=deepgram.STT(),
    tts=openai.TTS(),
)

Name	Type	Default	Description
`model`	`str`	—	Model (e.g., `"gpt-4o"`, `"gpt-5"`)
`api_key`	`str`	`None`	API key (defaults to `OPENAI_API_KEY` env var)
`base_url`	`str`	`None`	Custom API endpoint

ChatCompletionsLLM

Works with any OpenAI-compatible API (Together AI, Fireworks, DeepSeek, etc.).

from vision_agents.plugins import openai

llm = openai.ChatCompletionsLLM(
    model="deepseek-chat",
    base_url="https://api.deepseek.com",
    api_key="your_api_key",  # placeholder; avoid committing real keys
)
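
Rather than hard-coding the key, the constructor arguments can be assembled from the environment. The sketch below is a minimal illustration under assumptions: `deepseek_llm_kwargs` and the `DEEPSEEK_API_KEY` variable name are hypothetical choices made for this example, not part of the plugin. The model name and base URL are the ones used above.

```python
import os
from typing import Dict


def deepseek_llm_kwargs(env_var: str = "DEEPSEEK_API_KEY") -> Dict[str, str]:
    """Build keyword arguments for ChatCompletionsLLM from the environment.

    The env var name is an assumption for this example; use whatever name
    your deployment already defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "model": "deepseek-chat",
        "base_url": "https://api.deepseek.com",
        "api_key": api_key,
    }
```

The result would then be passed as `openai.ChatCompletionsLLM(**deepseek_llm_kwargs())`, which keeps credentials out of source control.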

TTS

Streaming text-to-speech.

from vision_agents.plugins import openai

tts = openai.TTS(model="gpt-4o-mini-tts", voice="alloy")

Name	Type	Default	Description
`model`	`str`	`"gpt-4o-mini-tts"`	TTS model
`voice`	`str`	`"alloy"`	Voice (“alloy”, “echo”, “fable”, “onyx”, “nova”, “shimmer”)

Function Calling

@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    return {"temperature": "72°F", "condition": "Sunny"}

See the Function Calling guide for details.
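
For readers unfamiliar with the decorator pattern used here, the following toy registry shows the general idea: the decorator stores the function and its description, and the type hints can later be inspected to describe the parameters to a model. `ToyFunctionRegistry` and its methods are hypothetical and written only for this illustration; they are not the Vision Agents implementation.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict


class ToyFunctionRegistry:
    """Illustrative stand-in for a function-calling registry."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Awaitable[Any]]] = {}
        self._descriptions: Dict[str, str] = {}

    def register_function(self, description: str):
        def decorator(func):
            # Store the coroutine function under its own name.
            self._functions[func.__name__] = func
            self._descriptions[func.__name__] = description
            return func
        return decorator

    def schema(self, name: str) -> Dict[str, Any]:
        # Derive a simple parameter listing from the function's type hints.
        sig = inspect.signature(self._functions[name])
        params = {
            p.name: getattr(p.annotation, "__name__", str(p.annotation))
            for p in sig.parameters.values()
        }
        return {
            "name": name,
            "description": self._descriptions[name],
            "parameters": params,
        }

    async def call(self, name: str, **kwargs: Any) -> Any:
        return await self._functions[name](**kwargs)


registry = ToyFunctionRegistry()


@registry.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    return {"temperature": "72°F", "condition": "Sunny"}


weather = asyncio.run(registry.call("get_weather", location="Paris"))
```

The sketch suggests why clear type hints and descriptions matter: they are the only information a registry of this kind has when describing the function to the model.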

Events

The OpenAI plugin emits a low-level event for raw stream data. Most developers should use the core events (LLMResponseCompletedEvent, RealtimeUserSpeechTranscriptionEvent, etc.) instead.

from vision_agents.plugins.openai.events import OpenAIStreamEvent

@agent.events.subscribe
async def on_openai_stream(event: OpenAIStreamEvent):
    # Access raw OpenAI stream data
    print(f"Raw event: {event.event_type}, {event.event_data}")
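
The subscription style above selects events by the handler's type annotation. A toy event bus makes that dispatch idea concrete. `ToyEventBus` and `ToyStreamEvent` are hypothetical names invented for this sketch and the event type string is only a sample value; the real event manager may work differently.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class ToyStreamEvent:
    event_type: str
    event_data: Any


class ToyEventBus:
    """Illustrative bus that routes events by the handler's annotated type."""

    def __init__(self) -> None:
        self._handlers: Dict[type, List[Callable[[Any], Awaitable[None]]]] = {}

    def subscribe(self, handler):
        # Read the annotation of the first parameter to learn the event class.
        params = list(inspect.signature(handler).parameters.values())
        event_cls = params[0].annotation
        self._handlers.setdefault(event_cls, []).append(handler)
        return handler

    async def send(self, event: Any) -> None:
        # Only handlers registered for this exact event class are invoked.
        for handler in self._handlers.get(type(event), []):
            await handler(event)


bus = ToyEventBus()
seen: List[str] = []


@bus.subscribe
async def on_stream(event: ToyStreamEvent):
    seen.append(event.event_type)


asyncio.run(bus.send(ToyStreamEvent("response.created", {})))
asyncio.run(bus.send("not a stream event"))  # ignored: no handler for str
```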


Next Steps

Build a Voice Agent

Build a Video Agent
