Vision Agents

Build low-latency voice and video AI agents using any model. Vision Agents is an open-source Python framework with 25+ integrations, production-ready deployment, and Stream’s global edge network for sub-500ms latency.

Get Started

Install and build your first agent

GitHub

Star the project and explore examples

What You Can Build

Voice Agents

Customer support bots, phone assistants, and voice interfaces using OpenAI Realtime, Gemini, or STT + LLM + TTS pipelines.

Video AI

Sports coaching, surveillance, and manufacturing workflows. Combine YOLO, Roboflow, or Moondream with Gemini or OpenAI vision.

Phone Integration

Inbound and outbound calling via Twilio. Build phone bots with RAG-powered knowledge bases.

Video Avatars

Real-time interactive avatars with HeyGen or video style transfer with Decart.

Examples

Example	Description
Simple Voice Agent	Basic voice agent with OpenAI or Gemini Realtime
Golf Coach	YOLO pose detection + Gemini for real-time coaching
Phone + RAG	Twilio calling with TurboPuffer vector search
Security Camera	Face recognition, package detection, automated alerts

Capabilities

25+ integrations — OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
Two modes — Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
Video processing — Run YOLO, Roboflow, or custom models on every frame
Phone support — Twilio integration for voice calls with bi-directional audio
RAG — TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
Production ready — HTTP server, Prometheus metrics, Docker deployment with GPU support
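
To make the second of the two modes above concrete, here is a minimal sketch of how an STT → LLM → TTS pipeline chains its stages within a single conversational turn. It does not use the Vision Agents SDK: `VoicePipeline`, `handle_turn`, and the three `fake_*` stages are hypothetical names invented for illustration, and the stages are toy functions so the example runs without any provider or network access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; these are NOT Vision Agents SDK types.
SpeechToText = Callable[[bytes], str]   # audio in, transcript out
LanguageModel = Callable[[str], str]    # transcript in, reply text out
TextToSpeech = Callable[[str], bytes]   # reply text in, audio out


@dataclass
class VoicePipeline:
    """Chains three stages: audio -> transcript -> reply text -> audio."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)   # 1. transcribe the caller's audio
        reply = self.llm(transcript)      # 2. generate a text reply
        return self.tts(reply)            # 3. synthesize the reply as audio


# Toy stages (assumptions): bytes are treated as UTF-8 text instead of audio.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = pipeline.handle_turn(b"hello")
```

Because each stage is just a callable, any one provider can be swapped without touching the other two. That flexibility is the usual reason to choose a custom pipeline over a single Realtime API.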

Next Steps

Installation

Install the SDK and configure your providers

Integrations

Browse 25+ supported AI providers

Guides

Deploy to production with Docker and metrics

Try Stream Video

Get 333,000 free participant minutes
