What You Can Build
Voice Agents
Customer support bots, phone assistants, and voice interfaces using OpenAI Realtime, Gemini, or STT + LLM + TTS pipelines.
Video AI
Sports coaching, surveillance, and manufacturing workflows. Combine YOLO, Roboflow, or Moondream with Gemini or OpenAI vision.
Phone Integration
Inbound and outbound calling via Twilio. Build phone bots with RAG-powered knowledge bases.
Video Avatars
Real-time interactive avatars with HeyGen, or video style transfer with Decart.
Examples
| Example | Description |
|---|---|
| Simple Voice Agent | Basic voice agent with OpenAI or Gemini Realtime |
| Golf Coach | YOLO pose detection + Gemini for real-time coaching |
| Phone + RAG | Twilio calling with TurboPuffer vector search |
| Security Camera | Face recognition, package detection, automated alerts |
Capabilities
- 25+ integrations — OpenAI, Gemini, Anthropic, Deepgram, ElevenLabs, YOLO, and more
- Two modes — Realtime APIs (WebRTC/WebSocket) or custom STT → LLM → TTS pipelines
- Video processing — Run YOLO, Roboflow, or custom models on every frame
- Phone support — Twilio integration for voice calls with bi-directional audio
- RAG — TurboPuffer vector search and Gemini FileSearch for knowledge retrieval
- Production ready — HTTP server, Prometheus metrics, Docker deployment with GPU support
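
To make the second of the two modes concrete, here is a minimal sketch of how an STT → LLM → TTS pipeline chains its stages for one conversational turn. It is illustrative only and is not this framework's API: `VoicePipeline`, `respond`, and the three `fake_*` stages are hypothetical names, and the stubs stand in for real providers such as Deepgram, OpenAI, or ElevenLabs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration; real provider SDKs differ.
SpeechToText = Callable[[bytes], str]    # audio in, transcript out
LanguageModel = Callable[[str], str]     # transcript in, reply text out
TextToSpeech = Callable[[str], bytes]    # reply text in, audio out


@dataclass
class VoicePipeline:
    """Chains three independent stages into a single turn handler."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)  # 1. transcribe the caller's audio
        reply = self.llm(transcript)     # 2. generate a text reply
        return self.tts(reply)           # 3. synthesize the reply as audio


# Stub stages so the sketch runs without network access or API keys.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
print(pipeline.respond(b"hello"))  # prints b'You said: hello'
```

Because each stage is a plain callable, one provider can be swapped without touching the other two. A Realtime API, by contrast, handles all three steps behind a single streaming connection.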