What This Skill Does
Covers building real-time, bidirectional streaming apps with the Gemini Live API over WebSockets. Handles audio/video/text input streams, voice activity detection, session management, ephemeral tokens, and function calling. SDKs covered are google-genai (Python) and @google/genai (JavaScript/TypeScript).
Rather than piecing together WebSocket protocol details, VAD config, and session management from scattered docs, this skill gives you working patterns for the full Live API lifecycle in one place.
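As a concrete sketch of that lifecycle, here is a minimal text-only Live API session using the google-genai Python SDK: connect over WebSockets, send one user turn, and stream the reply. The model name and the exact call shapes (`client.aio.live.connect`, `send_client_content`, `session.receive`) should be checked against the current SDK docs, and you need a valid API key to actually run it.

```python
import asyncio


def build_turn(text: str) -> dict:
    """Assemble a single user turn in the Live API client-content shape."""
    return {"role": "user", "parts": [{"text": text}]}


async def run_text_session(api_key: str, prompt: str) -> str:
    # Imported here so build_turn stays usable without the SDK installed.
    from google import genai

    client = genai.Client(api_key=api_key)
    config = {"response_modalities": ["TEXT"]}
    chunks: list[str] = []
    # connect() opens the WebSocket and yields a session for the duration
    # of the async context; the connection closes when the block exits.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001",  # example model; check availability
        config=config,
    ) as session:
        await session.send_client_content(
            turns=build_turn(prompt), turn_complete=True
        )
        # receive() streams server messages until the model finishes its turn.
        async for message in session.receive():
            if message.text:
                chunks.append(message.text)
    return "".join(chunks)
```

Call it with `asyncio.run(run_text_session(api_key, "Hello"))`; the same connect/send/receive shape extends to audio and video input streams.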
When to Use It
- Building a voice assistant that responds in real time from microphone input
- Streaming live video frames alongside audio for multimodal AI conversations
- Adding session resumption to a Live API app after connection resets
- Generating ephemeral tokens for browser-based deployments to avoid exposing API keys
- Configuring voice activity detection to handle user interruptions mid-response
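For the last case, interruption handling comes down to tuning server-side voice activity detection in the session config. The sketch below shows the general shape of such a config; the field names and enum values mirror the Live API's automatic activity detection settings as commonly documented, but treat them as assumptions and verify against the current SDK reference.

```python
# Session config biasing VAD toward quick barge-in: the model starts listening
# eagerly when the user speaks and waits half a second of silence before
# treating the user's turn as finished.
vad_config = {
    "response_modalities": ["AUDIO"],
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False,  # keep server-side VAD on
            # React quickly to the user starting to speak (enables interruption):
            "start_of_speech_sensitivity": "START_SENSITIVITY_HIGH",
            # Tolerate brief pauses without cutting the user off:
            "end_of_speech_sensitivity": "END_SENSITIVITY_LOW",
            "prefix_padding_ms": 100,    # audio retained before detected speech start
            "silence_duration_ms": 500,  # silence required to end the user's turn
        }
    },
}
```

Disabling automatic detection (`"disabled": True`) instead shifts responsibility to the client, which must then send explicit activity-start/activity-end signals.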