fix: mark cancellation events as Final in Executor #494
Summary of Changes
Hello @nahapetyan-serob, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical deadlock in the task cancellation mechanism. Previously, the cancellation event was not marked as final, so the event stream stayed open and the task could not terminate. Now that cancellation events are marked as final, the system closes the event stream and terminates the task. The change improves the reliability and responsiveness of task management, particularly during cancellation.
Code Review
This pull request correctly fixes a deadlock issue during task cancellation by ensuring the cancellation event is marked as final. The change in Executor.Cancel is simple and effective. The addition of the TestExecutor_Cancel_AfterEvent integration test is excellent, as it thoroughly validates the fix by simulating the exact scenario that was causing the deadlock. I have one minor suggestion in the test to improve its robustness by using the timeout-controlled context consistently.
* fix: cancelation error
* removed debugging block
* remove unnecessary initialized ctx
* fix lint
* fix: update time truncation in database session service (google#448)
  When truncating to microseconds (to match db max allowed precision) we need to keep the location of the event. Right now it's overwritten here. That's the fix for tests mostly, because in real world scenario the event is created and overwritten here by the same library code, thus it will be the same location anyway.
* fix: Only last state change for multiple tool calls is being committed to state (google#449)
  * fix: merge state deltas from multiple tool calls instead of overwriting
    Previously, when multiple tools were called in parallel, only the last tool's state delta was preserved because mergeEventActions used direct assignment. This change uses maps.Copy to properly merge all state deltas, preserving state changes from all tool calls.
    Fixes google#354
    Signed-off-by: majiayu000 <[email protected]>
  * fix: deep merge state delta in parallel tool actions
  ---------
  Signed-off-by: majiayu000 <[email protected]>
* fix: remote agent responses are not stored in streaming mode (google#420)
  Events marked as Partial are not stored in the sessions store. Updated comments to make it more clear. Updated A2A remote-agent to aggregate text and thoughts from partial events and emit them with or before the terminal events. Long-running functions calls are emitted as well because a response is expected to be delivered as a follow-up function call. Before the fix remoteagent responses in streaming mode were lost unless a Task was emitted as the terminal event (which is not the case for adka2a.Executor)
* fix: support CitationMetadata (google#463)
  Added converters for CitationMetadata in ./server/adka2a/metadata.go and verified in test case.
* fix: parse events with errors in agenttool (google#468)
  Right now agenttool for error flow only handles go errors (e.g. when ADK Go agent is wrapped). But agenttool can also wrap remoteagent which may return events with error_code and/or error_message. Currently agenttool would return {} because may not have a content, which hides the error.
* chore: correct LoadArtifactHandler Route Name (google#464)
* artifact: don't allow file names with path separators (google#472)
  File separators as filename breaks the layout of the object name. Reject file names that contain any separators.
* Implement graceful shutdown for web server via context cancellation (google#462)
  * fix: use context to perform graceful shutdown for the web server
  * Add configurable shutdown-timeout for web server
  * fix: Increase default server shutdown timeout from 5s to 15s
  * Return ctx.Err() on graceful shutdown as per PR review
  * fix: close errChan when srv.ListeandServer() returning nil or error is not http.ErrServerClosed
    Signed-off-by: Imre Nagi <[email protected]>
  * fix: handle closed errChan in webLauncher Run
  ---------
  Signed-off-by: Imre Nagi <[email protected]>
* fix: resolve nil pointer in tool.Context.SearchMemory in webui mode (google#476)
  * fix: resolve nil pointer in tool.Context.SearchMemory in webui mode
  * Apply suggestion from @dpasiukevich
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* Implement OutputResponseProcessor to enable use of OutputSchema and tools (google#441)
  * Implement OutputResponseProcessor to enable use of OutputSchema and tools on Gemini models that does not support structured outputs with tools [1]
    Based on adk-python [2], if both specified on non-supported model,
    - Do not set output schema on the model config,
    - Add a special tool called set_model_response using the output schema,
    - Instruct the model to use this tool to output its final result, rather than output text directly.
    Fixes google#307
    Tested via https://gist.github.com/caglar10ur/52e49e9be5f64f5ac47107810c9e60fb
    [1] https://ai.google.dev/gemini-api/docs/structured-output?example=recipe#structured_outputs_with_tools
    [2] google/adk-python@af63567
    Signed-off-by: Çağlar Onur <[email protected]>
  * Address gemini-code-assist comments
    Signed-off-by: Çağlar Onur <[email protected]>
  * Add missing copyright headers and fix formatting isssues
    Signed-off-by: Çağlar Onur <[email protected]>
  * Address remaining formatting isssues
    Signed-off-by: Çağlar Onur <[email protected]>
  * Address incorrect return after yield comment
    Signed-off-by: Çağlar Onur <[email protected]>
  * Adjust the tests based on the previous change
    Signed-off-by: Çağlar Onur <[email protected]>
  ---------
  Signed-off-by: Çağlar Onur <[email protected]>
* feat: include gen_ai.conversation.id in OTEL spans (google#421) (google#428)
* fix: return nil when server has been shutdown (google#487)
* fix(database): cascade delete events when deleting session (google#483)
  Previously, attempting to delete a session with related events failed due to a foreign key constraint violation. This commit adds `OnDelete:CASCADE` to the GORM relationship between sessions and events, ensuring that deleting a session will also remove its associated events. This resolves errors encountered during session deletion and allows safe cleanup of sessions with related events.
* Make the type comparison case insensitive (google#488)
  * Make the type comparison case insensitive
    Otherwise schema validation that uses lowercase values of enums such as "object" or "string" fails.
  * Add some unittests while at it
  * Do not modify but create a shallow copy
  * normalize instead of creating a copy
* feat: add FilterToolset helper (google#489)
* fix: mark cancellation events as Final in Executor (google#494)
  * fix: cancelation error
  * removed debugging block
  * remove unnecessary initialized ctx
  * fix lint
* Update adk-web to 958c8d6278b45c59e413f52ace147c28c5767a4d (google#491)
  Generated using following Dockerfile
  ```
  FROM node:iron-trixie AS builder
  RUN apt-get update && apt-get install -y git
  RUN git clone https://github.com/google/adk-web
  RUN mkdir output && cd adk-web && npm install && npm install -g @angular/cli && ng build --output-path=../output
  ```
  via executing
  ```
  docker build -t adk-web:latest .
  docker create adk-web:latest
  docker cp affectionate_chaplygin:/output .
  rsync --delete -avz output/browser/ adk-go/cmd/launcher/web/webui/distr/
  ```
* feat: add plugin package (google#480)
  * Add plugin system
  * Add plugin system
  * Removed test with redundant check, tool.Context can never be cast to agent.InvocationContext.
  * lint fixes
  * Fix tests plugin name
  * Added symmetrical after callback execution for plugins
  * Moved pluginManager to internal context value
  * Fix runOnToolErrorCallbacks for tool not found
  * Change plugins fields to private
  * Small test fix
* Avoid maintaining the copyright year manually (google#495)
  * Avoid maintaining the copyright year manually
  * Fix format issue
* Add vertexAi session service (google#235)
  * vertex ai session service implementation
  * fix comments
  * add events list method, save appended event locally
  * Create vertexai package
    - Move vertexai session service to separate package.
    - Allow multiple app usage by having different reasoning engine.
    - Add create_engine example.
  * Add log to Close err
  * Fix unsafe type assertion, out of bounds numRecentEvents
  * Add session state update
  * Add aiplatform to genai convertion for GroundingMetadata and content parts.
  * Add vertexai session service tests
  * Run go mod tidy
  * Add option.WithoutAuthentication() to replay, fix lint issues
  * Fix lint issues
  * Fix linter execution
  * Modify state All iterator to use copy
  * Fix getReasoningEngineID regex and add respsective tests
  * chore: go mod tidy
  * Add IsNotFoundError validation for retry, make vertex session service Get concurrent
  * Fix vertex ai session service timestamp Filter tests
  * Change example env variable name
  * Add session name validations
  * fix GOOGLE_GOOGLE_CLOUD_LOCATION env name
  * Add part tought and thoughtSignature to converter
  ---------
  Co-authored-by: Dima Stabrouski <[email protected]>
* fix: resolve the run config streaming mode param ineffective issue in a2a remote agent (google#485)
* fix: Add MCP session reconnection with Ping health check (google#417)
  * auto-reconnect on connection failure
  * introduce MCPClient struct
  * Update tool/mcptoolset/set_test.go
  * address lint issues
  * only consider refreshing on certain errors
  * Add TODO for session not found error handling
  ---------
  Co-authored-by: majiayu000 <[email protected]>
* docs: update docs for llmagent (google#500)
  * update docs for llmagent
  * update docs for Aftertoolback
* feat: enable plugin support in launcher config with tests (google#503)
  * feat: enable plugin support in launcher config with tests
  * Add nil check for plugin registration
  * Fix nil plugin check in registerPlugin method
  * Fix nil check for plugin registration
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* fix: pass pluginConfig instead of only plugins (google#504)
  * fix: pass pluginConfig instead of only plugins
* fix: long-running operation handling as a2a input-required task (google#499)
  * Include long-running tool function call and response are into A2A TaskStatus message. This is tracked by `inputRequiredProcessor`. This makes it possible for non-blocking clients to discover which input is required to continue a task.
  * Before starting execution in `adka2a.Executor` validate that user message carries all the required `FunctionCallResponse`s by comparing them against those recorded in task status.
  * Added test for checking that A2A request payload is assembled correctly by remoteagent.
  * Refactored `contents_processor` to use functions from `utils`.
  * Updated `runner.go` to handle the case when user message carries a function response for a long-running call initiated by a non-root agent. This agent needs to be invoked bypassing root.
  Added integration tests for two cases:
    * a2aclient -> a2aserver -> adka2a.Executor -> llmagent with a long running tool
    * a2aclient -> server A -> adka2a.Executor A ->-> llmagent with remote subagent -> remotesubagent -> server B -> adka2a.Executor B -> llmagent with a long running tool
* Fix MCP reconnection on EOF error (google#505)
* docs: upd docs for vertexai (google#515)
  * docs: upd docs for vertexai
* Removed plugin manager after callback symmetry (google#514)
* feat: add human in the loop confirmation (google#490)
  * Add ToolConfirmation (WIP)
  * Modify processor type to generator
  * Add tool confirmation sample
  * Fix loss of tool confirmation when handling event with multiple function calls
  * Add basic tool confirmation test
  * Add request confirmation tests
  * Fix merge test failures
  * lint fixes
  * Remove duplicated code block
  * merge fix
  * fix merge confict in base_flow_test
  * Add tool confirmation docs
  * merge fix set_test
  * lint fix
  * Add package doc for toolconfirmation example
  * Add tools_processor to cache tools
  * Update RequireConfirmationProvider doc
  * Move session.REQUEST_CONFIRMATION_FUNCTION_CALL_NAME to toolconfirmation.RequestConfirmationFunctionCallName. Add toolconfirmation.OriginalCallFrom helper function.
  * Add OriginalCallFrom godocs
  * Remove comments on OriginalCallFrom
  * Change RequestConfirmationFunctionCallName to toolconfirmation.FunctionCallName
  * Fix godoc
* Add custom part converters to ADK Executor. (google#512)
  * Add custom part converters to ADK Executor.
    This change essentially mirrors the functionality available in the Python A2A implementation at https://github.com/google/adk-python/blob/main/src/google/adk/a2a/executor/a2a_agent_executor.py#L62-L67. The use case here is to allow conversion of specific A2A Data parts into something slightly structured (or annotated in some way) for ADK consumption. Currently data parts which are not Function or ExecutableCode related get converted to Text parts (https://github.com/google/adk-go/blob/main/server/adka2a/parts.go#L253). Conceptually these could become InlineData parts (which do tend to overlap with the model of File parts) or extra processing could be applied to make them more consumable by ADK agents. Similarly semi-structured output from ADK agents are returned as TextParts, this just gives us the responsible to marshal these as typed data parts based on whatever signifies we include.
  * refactor
  * Shorten converter type and field names
  * Do not make a separate filter pass
  * Do not export WithConverter methods
  * Export singular mappers to make delegation easier
  ---------
  Co-authored-by: Yaroslav Shevchuk <[email protected]>
* feat: provide access to session events from executor callbacks (google#521)
  * provide access to session events from executor callbacks
  * gemini nitpicks
  * test cleanup
* docs: update feature_request.md (google#520)
  * Update feature_request.md
  * docs: made "willingness to contribute" section optional
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* docs: update bug_report.md (google#519)
  * Update bug_report.md
  * upd
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* Implement the load memory tool for searching memory (google#497)
  * Implement the load memory tool for searching memory
  * Fix lint issue
* fix InvocationId being replaced on invocationContext duplication (google#525)
* feat: add human in the loop confirmation for mcp toolset (google#523)
  * Add tool confirmation for MCP tool set
  * Add RequireConfirmationProvider_Validation test
  * fix small doc change
  * fix requireConfirmation provider priority over bool field
  * Add toolname to mcptool confirmation provider
  * lint fix
  * Change to only call confirmation provider if toolconfirmation does not exist
  * Change test struct
  * fix test missing invocation context
* feature: add WithContext method to agent.InvocationContext to support context modification (google#526)
  * feat: add `WithContext` to `InvocationContext`
  * Apply Code Assist comments
  * docs: add note about future context changes
    Add temporary note regarding context handling changes.
  * Fix comment formatting
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* feat: add loggin plugin (google#534)
* feat: console refactor (google#533)
* fix: apply temp state deltas to local session to keep them within invocation (google#537)
* feat: add a function call modifier plugin. (google#535)
  * Add function call modifier plugin.
  * lint fix
  * simplify test check logic
  * flat afterModelCallback func
* Add support of the preload memory tool (google#527)
* feat: add OpenTelemetry configuration and initialization - TracerProvider (google#524)
  * Implement telemetry initialization
  * Update telemetry/setup_otel.go
    Co-authored-by: Dmitry Pasiukevich <[email protected]>
  * Review comments
  * cherry-picked Implement telemetry initialization
  * GCP config improvements
  * Replace telemetry.Service intefrace with telemetry.Providers struct
  * Add helper function for resolving quota and resource projects.
  ---------
  Co-authored-by: Dmitry Pasiukevich <[email protected]>
* Move the function call modifier plugin into one package (google#546)
* Migrate old telemetry (google#529)
* fix: set SkipSummarization in RequestConfirmation to stop agent loop (google#544)
  * fix: set SkipSummarization in RequestConfirmation to stop agent loop
    When using ctx.RequestConfirmation() manually, the agent loop does not stop because SkipSummarization is not set on the function response event. The built-in RequireConfirmation path (functiontool) sets it automatically, but the manual RequestConfirmation path does not.
    Fixes google#543
  * test: add ToolConfirmation field assertions to auto-generated ID test
* fix: remoteagent partial response handling leads to data duplication (google#545)
  * set explicit partial event marking and maintain a temporary artifact for streaming events
  * fix metadata merging
  * fix duplicating long running function call
  * fix citations not included
  * explicit mark as turnComplete on event conversion
  * fix response duplication for partial and non-partial events
  * aggregation logic tests
  * gemini mock streaming test
  * added failing test cases
  * lint
  * fix partial not reset on errors
  * fix test
* fix: pass long_running_function_ids in the event generated for request confirmation (google#553)
* chore: added tests to verify tool confirmation works with a2a (google#560)
  * add tool confirmation test
  * lint
  * gemini comments
* fix: clear aggregated text and thoughts when receiving a non-appending task artifact update (google#555)
  clear aggregated text and thoughts when receiving a non-appending task artifact update
* feat: Implement OpenTelemetry semconv tracing (google#548)
  * feat: Implement OpenTelemetry semconv tracing for agent invocations and LLM calls.
  * Fix shadowed context issue
  * Clean up attributes
  * Revert changes to CallLLM span
  * refactor: changes from review comments
    1. Rename telemetry span creation functions to `Start...Span`, result recording functions to `Trace...Result`
    1. skip merged tool call span when there is single tool call.
    1. Dedupe tool params and move args to span start.
    1. Remove redundant `codes.Ok` status setting for spans without errors.
  * fix: End `GenerateContent` telemetry span immediately upon final response or error to accurately measure model latency.
  * refactor: Inline telemetry attribute keys and remove redundant LLM request/response attributes from tool call traces.
  * cleanup: remove unused error variable assignment in test range loops
  * fix: prevent duplicate span ending in LLM flow by tracking span state
  * refactor: Update `StartInvokeAgentSpan` to accept an `agent` interface and session ID directly, removing the `StartInvokeAgentSpanParams` struct.
* cleanup: remove call_llm span after adding semconv tracing (google#556)
---------
Signed-off-by: majiayu000 <[email protected]>
Signed-off-by: Imre Nagi <[email protected]>
Signed-off-by: Çağlar Onur <[email protected]>
Co-authored-by: Dmitry Pasiukevich <[email protected]>
Co-authored-by: lif <[email protected]>
Co-authored-by: Yaroslav <[email protected]>
Co-authored-by: Serob Nahapetyan <[email protected]>
Co-authored-by: Suraj Bobade <[email protected]>
Co-authored-by: Jaana Dogan <[email protected]>
Co-authored-by: Imre Nagi <[email protected]>
Co-authored-by: sjy3 <[email protected]>
Co-authored-by: Çağlar Onur <[email protected]>
Co-authored-by: Jeet <[email protected]>
Co-authored-by: Esteban Del Boca <[email protected]>
Co-authored-by: Çağlar Onur <[email protected]>
Co-authored-by: Serob Nahapetyan <[email protected]>
Co-authored-by: João Westerberg <[email protected]>
Co-authored-by: hulk <[email protected]>
Co-authored-by: Dima Stabrouski <[email protected]>
Co-authored-by: Marcus Rodan <[email protected]>
Co-authored-by: indurireddy-TF <[email protected]>
Co-authored-by: MarcoCerino23 <[email protected]>
Co-authored-by: Zack Birkenbuel <[email protected]>
Co-authored-by: Yasir Modak <[email protected]>
Co-authored-by: Paweł Maciejczek <[email protected]>
Co-authored-by: Jayden Park <[email protected]>
Co-authored-by: Daria Wieliczko <[email protected]>
Previously, Executor.Cancel dispatched a StatusUpdateEvent with TaskStateCanceled but left the Final flag at its default value of false. As a result, the event stream remained open, the cancellation process deadlocked, and the task never actually terminated.
To fix this, I updated Executor.Cancel to explicitly set event.Final = true. This ensures that the event stream is closed immediately upon cancellation.
I added a new test, TestExecutor_Cancel_AfterEvent, in ./server/adka2a/executor_test.go. It:
1. Starts a test server wrapping the executor, and starts a client.
2. Calls SendMessage with blocking: false (the call returns immediately while the task keeps running).
3. Calls CancelTask on the active task.
4. Asserts that the task state is updated to TaskStateCanceled.
5. Verifies that the agent's execution context is closed, which unblocks the agent. Without this fix, the context remains open and the test times out.