Gemini Live API support #567

Open
imrenagi wants to merge 44 commits into google:main from imrenagi:feat/live

@imrenagi
Contributor

Hi, I decided to try implementing initial support for the Gemini Live API (#550) in adk-go. This is a work-in-progress PR. I hope to get early feedback from the community on a few fundamentals, such as the design and the code changes. I looked at how this was done in adk-python and implemented it in Go.

Some highlights:

  • Live Mode for Agents: Introduced a new 'live mode' capability for agents, enabling real-time, bidirectional streaming interactions with LLMs. This includes new RunLive methods in the Agent interface and llmagent.
  • Real-time Communication Infrastructure: Added a LiveRequestQueue to manage real-time inputs (like audio and video) and extended InvocationContext to support this queue, facilitating continuous interaction with live models.
  • Bidirectional Streaming Configuration: A new StreamingModeBidi has been added to agent.RunConfig, along with fields for ResponseModalities, SpeechConfig, and audio transcription settings, to configure live connections.
  • Live Example Application: Included a new example application (examples/live) that demonstrates the full live streaming workflow, featuring a WebSocket server, client-side audio/video handling, and a chat UI.
  • Gemini Live Model Integration: Implemented the Connect method and LiveConnection interface for the Gemini model, allowing it to establish and manage live bidirectional streams with the LLM.

Notes: I haven't yet cleaned up some unnecessary changes, such as the logger and Docker Compose files. I'll clean them up later when finalizing this PR. Thanks!

… camera integration

- Implemented index.html for the demo interface with message input, connection status, and event console.
- Developed app.js for WebSocket connection management, message handling, and UI updates.
- Created audio-player.js and audio-recorder.js for audio playback and recording using Audio Worklets.
- Added PCM player and recorder processors to handle audio data efficiently.
- Integrated camera functionality for capturing images and sending them through the WebSocket.
@gemini-code-assist
Contributor

Summary of Changes

Hello @imrenagi, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational support for the Gemini Live API, enabling real-time, bidirectional streaming interactions with large language models within the ADK-Go framework. It establishes a new 'live mode' for agents, complete with a dedicated request queue for handling continuous input streams like audio and video. The changes extend core agent interfaces and configurations to support these new streaming capabilities, and a comprehensive example application is provided to showcase the end-to-end live streaming workflow, including a WebSocket server and client-side media handling.

Highlights

  • Live Mode for Agents: Introduced a new 'live mode' capability for agents, enabling real-time, bidirectional streaming interactions with LLMs. This includes new RunLive methods in the Agent interface and llmagent.
  • Real-time Communication Infrastructure: Added a LiveRequestQueue to manage real-time inputs (like audio and video) and extended InvocationContext to support this queue, facilitating continuous interaction with live models.
  • Bidirectional Streaming Configuration: A new StreamingModeBidi has been added to agent.RunConfig, along with fields for ResponseModalities, SpeechConfig, and audio transcription settings, to configure live connections.
  • Live Example Application: Included a new example application (examples/live) that demonstrates the full live streaming workflow, featuring a WebSocket server, client-side audio/video handling, and a chat UI.
  • Gemini Live Model Integration: Implemented the Connect method and LiveConnection interface for the Gemini model, allowing it to establish and manage live bidirectional streams with the LLM.

Changelog
  • .gitignore
    • Added '.env' to the ignore list.
  • agent/agent.go
    • Added RunLive method to the Agent interface and the agent struct.
    • Updated Config to include a RunLive function for defining live agent behavior.
  • agent/context.go
    • Added LiveRequestQueue to the InvocationContext interface and its implementation.
  • agent/live_request_queue.go
    • Added a new file defining LiveRequestQueue for managing live requests, including methods for sending content, realtime input, and tool responses.
  • agent/llmagent/llmagent.go
    • Added LiveConnectConfig to the llmagent.Config and llminternal.State.
    • Implemented the runLive method within llmAgent to handle live invocation contexts.
  • agent/run_config.go
    • Added StreamingModeBidi to StreamingMode enum.
    • Introduced new fields ResponseModalities, SpeechConfig, InputAudioTranscription, and OutputAudioTranscription to RunConfig for live streaming settings.
  • examples/live/Makefile
    • Added a Makefile with Docker Compose commands for up and down.
  • examples/live/docker-compose-arm.yml
    • Added a Docker Compose override file for ARM platforms, specifying platform: linux/x86_64 for PostgreSQL.
  • examples/live/docker-compose.yml
    • Added a Docker Compose file to set up a PostgreSQL service for the live example.
  • examples/live/main.go
    • Added the main application file for the live streaming example, including WebSocket server setup, agent initialization, and handling of client messages (text, audio, image).
  • examples/live/models/event.go
    • Added data models EventActions and Event for session events, along with mapping functions ToSessionEvent and FromSessionEvent.
  • examples/live/static/css/style.css
    • Added comprehensive CSS styling for the ADK Bidi-streaming demo UI, including chat bubbles, console, and modal elements.
  • examples/live/static/index.html
    • Added the HTML structure for the live streaming demo, featuring chat interface, event console, and a camera preview modal.
  • examples/live/static/js/app.js
    • Added JavaScript logic for WebSocket communication, UI updates, console logging, camera access, image capture, and audio handling for the live example.
  • examples/live/static/js/audio-player.js
    • Added JavaScript to start the Web Audio API player worklet for audio output.
  • examples/live/static/js/audio-recorder.js
    • Added JavaScript to start the Web Audio API recorder worklet for audio input and convert Float32 samples to 16-bit PCM.
  • examples/live/static/js/pcm-player-processor.js
    • Added an AudioWorkletProcessor for buffering and playing PCM audio data received from the main thread.
  • examples/live/static/js/pcm-recorder-processor.js
    • Added an AudioWorkletProcessor for processing and sending microphone input as PCM data to the main thread.
  • go.mod
    • Updated Go module dependencies, adding github.com/gorilla/mux, github.com/gorilla/websocket, gorm.io/driver/postgres, and github.com/rs/zerolog.
  • go.sum
    • Updated Go module checksums to reflect new and updated dependencies.
  • internal/agent/runconfig/run_config.go
    • Added LiveConnectConfig to the internal RunConfig struct.
  • internal/context/invocation_context.go
    • Added LiveRequestQueue field to InvocationContextParams and InvocationContext.
  • internal/llminternal/agent.go
    • Added LiveConnectConfig to the State struct within llminternal.
  • internal/llminternal/base_flow_live.go
    • Added a new file implementing the RunLive method for the LLM internal flow, handling live model connection, sending requests, and receiving responses.
  • internal/llminternal/basic_processor.go
    • Commented out placeholder code related to LiveConnectConfig and output schema checks in basicRequestProcessor.
  • internal/utils/utils.go
    • Modified AppendInstructions to also apply system instructions to LiveConnectConfig if present.
  • model/gemini/gemini_live.go
    • Added a new file implementing the Connect method for the Gemini model to establish live connections.
    • Defined liveConnection struct and its Send, Receive, and Close methods for managing bidirectional streams.
  • model/llm.go
    • Added a Connect method to the LLM interface for live streaming.
    • Introduced LiveConnectConfig to LLMRequest.
    • Defined new types LiveRequest and LiveConnection for handling live streaming interactions.
  • runner/runner.go
    • Added a RunLive method to the Runner struct to execute agents in live mode.
    • Implemented shouldAppendEvent and isLiveModelAudioEventWithInlineData helper functions to control event persistence for live calls.
  • tool/agenttool/agent_tool.go
    • Modified ProcessRequest to include agent tool declarations in LiveConnectConfig for live streaming scenarios.
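
The audio-recorder.js entry above mentions converting Float32 samples to 16-bit PCM. In the PR that conversion happens in browser JavaScript; the sketch below shows the same idea in Go for readers following the server side. The function name, the scale factor of 32767, and the little-endian byte order are assumptions for this example, not details taken from the PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// float32ToPCM16 converts samples in the nominal range [-1, 1] to
// little-endian signed 16-bit PCM. Out-of-range samples are clamped and
// NaN is treated as silence. The function name and the 32767 scale factor
// are assumptions for this sketch.
func float32ToPCM16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		switch {
		case s != s: // NaN is the only value not equal to itself
			s = 0
		case s > 1:
			s = 1
		case s < -1:
			s = -1
		}
		v := int16(math.Round(float64(s) * 32767))
		binary.LittleEndian.PutUint16(out[2*i:], uint16(v))
	}
	return out
}

func main() {
	// Silence, full-scale positive, full-scale negative, and a clipped sample.
	pcm := float32ToPCM16([]float32{0, 1, -1, 2})
	fmt.Printf("% x\n", pcm)
}
```

Clamping before scaling avoids integer wrap-around when the microphone signal briefly exceeds full scale.
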
Activity
  • The author, imrenagi, has initiated this pull request as a work-in-progress to implement initial support for the Gemini Live API in adk-go.
  • The author is seeking initial feedback from the community on design and code changes, drawing inspiration from the adk-python implementation.
  • The author noted that some unnecessary changes (like logger, docker compose) are present and will be cleaned up later.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces initial support for the Gemini Live API, enabling real-time, bidirectional streaming interactions. The changes are extensive, adding new RunLive methods to agents, a LiveRequestQueue for managing real-time inputs, and a comprehensive example application to demonstrate the new capabilities. The implementation looks promising and follows existing patterns in the codebase. My review focuses on potential issues such as a possible panic, areas for refactoring to reduce code duplication, and suggestions for improving API clarity and logging consistency.

…on options; update live connection handling for grounding metadata
…Context with new caching and session resumption methods
…on features; update InvocationContext and Runner for improved state management
…ve connection and add transcription accumulation logic.
…or live flows, and simplify live connection error handling.
…runner's event handling, and enhance live example with zerolog, artifact service, and transcription fields.
…ue`, and clarify `Channel` usage for timeout handling.
…ead of `fmt.Printf` for live connection handling.
…ToRun` signature, and pass session directly to invocation context.
@imrenagi
Contributor Author

Hi @dpasiukevich, sorry for tagging you; I'm not sure who else to tag. I would like to get your opinion on this PR, and hopefully that of more people (please help tag the relevant team or person). It currently works for basic Gemini Live features. The things that are still in progress are:

  • tool calling and tool response
  • agent transfer

Thanks!

@imrenagi changed the title from "WIP: Gemini Live API support" to "Gemini Live API support" on Feb 20, 2026
…ess` goroutine return an error channel and merging it in `Receive`.
… a centralized results channel, and unified lifecycle management to improve concurrency safety, error propagation, and resource cleanup.
…nal handling and invocation context management for resumability.
…sages when resumability is enabled and filter empty LLM responses containing only GoAway.
…fine live session resumption update processing.
@dangkaka

Hi, any updates? I also need Live API support.
