OpenAI provides industry-leading language models and realtime speech capabilities. The plugin supports four modes: Realtime (WebRTC speech-to-speech), LLM (Responses API for GPT-5+), ChatCompletionsLLM (any OpenAI-compatible API), and TTS (text-to-speech).
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add vision-agents[openai]

Realtime

Native speech-to-speech over WebRTC with built-in STT/TTS.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful voice assistant.",
    llm=openai.Realtime(model="gpt-realtime", voice="marin", fps=1),
)
Name    Type    Default           Description
model   str     "gpt-realtime"    OpenAI realtime model
voice   str     "marin"           Voice ("marin", "alloy", "echo", etc.)
fps     int     1                 Video frames per second

LLM

Uses the Responses API (default for GPT-5+). Requires separate STT/TTS.
from vision_agents.core import Agent, User
from vision_agents.plugins import openai, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=openai.LLM(model="gpt-4o"),
    stt=deepgram.STT(),
    tts=openai.TTS(),
)
Name       Type    Default    Description
model      str                Model (e.g., "gpt-4o", "gpt-5")
api_key    str     None       API key (defaults to the OPENAI_API_KEY env var)
base_url   str     None       Custom API endpoint

ChatCompletionsLLM

Works with any OpenAI-compatible API (Together AI, Fireworks, DeepSeek, etc.).
from vision_agents.plugins import openai

llm = openai.ChatCompletionsLLM(
    model="deepseek-chat",
    base_url="https://api.deepseek.com",
    api_key="your_api_key"
)
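"OpenAI-compatible" here means the provider accepts the Chat Completions request shape. A minimal sketch of that request body, for illustration only (the plugin builds and sends this for you; the streaming flag is an assumption, since most voice agents consume tokens incrementally):

```python
# The Chat Completions request body that OpenAI-compatible providers accept.
payload = {
    "model": "deepseek-chat",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": True,  # assumption: stream tokens as they are generated
}
```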

TTS

Streaming text-to-speech.
tts = openai.TTS(model="gpt-4o-mini-tts", voice="alloy")
Name    Type    Default              Description
model   str     "gpt-4o-mini-tts"    TTS model
voice   str     "alloy"              Voice ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

Function Calling

@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Placeholder data; a real implementation would query a weather API.
    return {"temperature": "72°F", "condition": "Sunny"}
See the Function Calling guide for details.
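Under the hood, OpenAI describes each registered function to the model as a JSON-schema tool definition derived from its signature and description. A hypothetical sketch of what get_weather looks like in the Responses API tool format (this is the wire format, not the plugin's own serialization code):

```python
# Hypothetical Responses API tool definition for get_weather.
tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}
```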

Events

The OpenAI plugin emits a low-level event for raw stream data. Most developers should use the core events (LLMResponseCompletedEvent, RealtimeUserSpeechTranscriptionEvent, etc.) instead.
from vision_agents.plugins.openai.events import OpenAIStreamEvent

@agent.events.subscribe
async def on_openai_stream(event: OpenAIStreamEvent):
    # Access raw OpenAI stream data
    print(f"Raw event: {event.event_type}, {event.event_data}")

Next Steps