Preamble
Title: Asynchronous Tool Execution
Author: Surbhi Bansal, Luca Chang
Status: Proposal
Type: Standards Track
Created: 2025-08-26
Abstract
This SEP improves support for long-running operations (LRO) in the Model Context Protocol (MCP). It introduces a modified tool call operation with token-based execution tracking, with version negotiation as a means of supporting old and new tool call semantics on the same RPC method. Clients pass a unique operation token to represent a task, and servers manage background execution and serve operation statuses and results to client requests with the operation token provided. By splitting a tool call into multiple request-response methods, this avoids the need for persistent connections between the client and server to support long-running tools. It also allows tool results to be retrieved multiple times, supporting call-now, fetch-later execution and improving resilience to connection faults.
Motivation
The current MCP specification only supports single request-response tool execution, which creates significant limitations for real-world applications. While host applications or tool implementors could theoretically implement custom long-running operation handling themselves, this approach is fundamentally inadequate:
Why Host-Level Solutions Are Insufficient:
- Lack of Standardization: This is a common use case that many MCP servers require and are already duplicating effort to support (see customer use cases below). Server implementers want standardized approaches for supporting this to avoid redundant and error-prone efforts.
- Model Behavior Inconsistency: Model-driven solutions rely heavily on inconsistent behavior and prompt engineering to decide when and if to poll at all, as opposed to having a deterministic, protocol-defined construct for long-running operations. With the proposed solution, host applications will have a reliable mechanism for immediately backgrounding long-running tool calls and waking the model up later to handle their results.
- Limited Solutions for Timeouts: There is no standardized way to handle operations that exceed typical request timeouts across all tools. Servers need to impose timeouts to avoid resource exhaustion, but these cannot always be configured in a way that generalizes across all tools, or even across combinations of parameters in a single tool. Rather than being limited by timeouts in the network stack, this enables tools to execute for arbitrary lengths of time without holding a single connection open, and without relying on clients to terminate connections proactively.
This SEP addresses these limitations by introducing long-running operation capabilities while preserving all existing functionality.
Customer Use Cases Requiring Long-Running Operation Support
The current single request-response execution model prevents MCP from supporting common use cases that require extended processing time, on the scale of minutes to hours:
1. Healthcare & Life Sciences Data Analysis
Challenge: Amazon's customers in the healthcare and life sciences industry are attempting to use MCP to wrap existing computational tools to analyze molecular properties and predict drug interactions, processing hundreds of thousands of data points per job from chemical libraries through multiple inference models simultaneously.
Duration: Small molecule analysis requires 30-60 minutes; large molecule simulations with complex predictions take several hours.
Current Workaround: Not yet determined.
Impact: Cannot integrate with real-time research workflows, prevents interactive drug discovery platforms, and blocks automated research pipelines. These customers are looking for best practices for long-running tool calls and have noted the lack of support in MCP as a concern. If these customers do not have a solution for long-running tool calls, they will likely forego MCP and continue using their existing platforms.
Ideal: Some form of push notification system to avoid blocking their agents on long analyses, with concurrent long-running tool calls as an answer for operations executing in the range of a few minutes.
2. Enterprise Automation Platforms
Challenge: Amazon’s large enterprise customers are looking to develop internal MCP platforms to automate SDLC processes across their organizations, extending to sales, customer service, legal, HR, and cross-divisional teams. They have noted they have long-running agent and agent-tool interactions beyond typical timeouts, supporting complex business process automation.
Duration: Processing time ranges from minutes to hours depending on the task.
Current Workaround: Not yet determined. Considering an application-level system outside of MCP backed by webhooks.
Impact: Limitations related to the host doing synchronous request-response execution prevent complex business process automation and limit sophisticated multi-step operations. These customers want to dispatch processes concurrently and collect their results later, and have noted the lack of async tool calls as a concern; they are considering involved application-level notification systems as a possible workaround.
Ideal: Built-in mechanisms for managing concurrent work to avoid needing to implement notification systems specific to their own tool conventions themselves.
3. Code Migration Workflows
Challenge: Amazon has automated code migration and transformation tools to perform upgrades across its own codebases and those of external customers, and is attempting to wrap those tools in MCP servers. These migrations analyze dependencies, transform code to avoid deprecated runtime features, and validate changes across multiple repositories.
Duration: Processing time ranges from minutes to hours depending on migration scope, complexity, and validation requirements. Large enterprise workloads require extensive testing cycles.
Current Workaround: Developers implement manual tracking by splitting a job into create and get tools, forcing models to manage state and repeatedly poll for completion.
Impact: Poor developer experience due to needing to replicate this hand-rolled long-running operation mechanism across many tools. One team had to debug an issue where the model would hallucinate job names if it hadn’t listed them first. Validating that this does not happen across many tools in a large toolset is time-consuming and error-prone.
Ideal: Support SDK-controlled polling by default to support pushing a tool to the background and avoiding blocking other tasks in the chat session. The team needs the same pattern across many tools in their MCP servers, and wants a common solution across them.
4. Test Execution Platforms
Challenge: Amazon’s internal test infrastructure executes comprehensive test suites including thousands of test cases, integration tests across services, and performance benchmarks. They have built an MCP server wrapping this existing infrastructure.
Duration: Test runs can execute for hours in some services, especially for full regression testing suites.
Current Workaround: For streaming test logs, the MCP server exposes a tool that can read a range of log lines, as it cannot effectively notify the client when the execution is complete. There is not yet any workaround for executing test runs.
Impact: Cannot run a test suite and stream its logs simultaneously without a single hours-long tool call, which would time out on either the client or the server. This prevents agents from looking into test failures in an incomplete test run until the entire test suite has completed, potentially hours later.
Ideal: Support application-driven long-running tool calls, so a client can be notified when a long-running tool is complete.
5. Deep Research
Challenge: Deep research tools spawn multiple research agents to gather and summarize information about topics, going through several rounds of search and conversation turns internally to produce a final result for the caller application.
Duration: 3-5 minutes per research topic.
Current Workaround: The research tool can be split into separate tools to create a report job and get the status/result of that job later.
Impact: When using this in clients like Claude Desktop, which has a 4-minute MCP tool call timeout, tool calls will time out unpredictably if some agents take longer than usual to complete. After splitting the tool into create/get steps, the LLM runs into issues calling the get tool repeatedly; instead, it sometimes calls the get tool once before ending its conversation turn, claiming to be "waiting" before calling the tool again. It cannot resume until receiving a new user message. This also complicates expiration times, as it is not possible to predict when the client will retrieve the result when this occurs. It is possible to work around this by adding a wait tool for the model, but this prevents the model from doing anything else concurrently.
Ideal: Support backgrounding a tool call with a deterministic way to notify the model when a result is ready, so it can be immediately retrieved and the tool result can be deleted. This cannot be done generically today in any client application.
6. Agent-to-Agent Communication (Multi-Agent Systems)
Challenge: One of Amazon’s internal multi-agent systems for customer question answering faces scenarios where agents require significant processing time for complex reasoning, research, or analysis. When agents communicate through MCP, slow agents cause cascading delays throughout this system.
Duration: Several minutes for complex tasks.
Current Workaround: Not yet determined.
Impact: Synchronous communication creates cascading delays, prevents parallel agent processing, and degrades system responsiveness for other time-sensitive interactions.
Ideal: Notifications of some kind to allow agents to perform other work concurrently and get notified once long-running tasks complete.
These use cases demonstrate that long-running tool execution is not a theoretical need but a practical requirement for real MCP deployments in production environments.
Integration with Existing Architectures
Many production systems already provide async workflow execution capabilities with built-in state tracking, monitoring, and management. This proposal enables MCP servers to expose these existing APIs directly rather than forcing them into synchronous wrappers.
Benefits for Existing Async Systems:
- Leverage Existing State Management: Systems like Step Functions, Kubernetes, and CI/CD platforms already maintain execution state, logs, and results. MCP servers can use these systems' native identifiers (ARNs, job names, run IDs) as MCP tokens.
- Preserve Native Monitoring: Existing monitoring, alerting, and observability tools continue to work unchanged. The async execution happens within the existing system's context.
- Reduce Implementation Overhead: Server implementers don't need to build new state management, persistence, or monitoring infrastructure. They can focus on the MCP protocol mapping.
This approach simplifies integration with existing workflows and allows workflow services to continue to manage their own state, rather than making MCP servers that do nothing but poll on other services.
Specification
Core Approach: Extend Existing Tool Call with Token-Based Tracking
This proposal introduces an extension of the existing tools/call RPC call method to support long-running operations.
Async tools MUST be marked with an invocationMode field to differentiate them. SDKs MUST NOT show LRO-only tools or the invocationMode field to clients on old protocol versions.
SDKs MAY support declaring tools that support both non-LRO and LRO execution. In these cases, clients on old protocol versions will be able to see the tool definition without the invocationMode field, and will continue to call it with non-LRO semantics. Clients on new protocol versions will see the tool definition with the invocationMode field, and therefore MUST call the tool with LRO semantics.
When issuing a CallToolRequest for a long-running tool call on a supported protocol version, the client MUST provide a unique operation.token to represent the operation across requests. The server MUST validate that this token is unique before accepting new operations. This token SHOULD be scoped to the particular authorization context or session it originated from, to avoid granting unauthorized access to tool results to third parties.
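As an illustration of client-side token handling (the `mintOperationToken` helper and the session-ownership map are hypothetical, not part of this proposal), a client might mint unguessable tokens and scope them to the session they originated from:

```typescript
// Sketch: client-generated operation tokens, scoped per session so that
// results are only served back to the originating authorization context.
import { randomUUID } from "node:crypto";

// Hypothetical bookkeeping: which session minted which token.
const tokenOwners = new Map<string, string>(); // token -> sessionId

function mintOperationToken(sessionId: string): string {
  const token = randomUUID(); // unique, unguessable token per operation
  tokenOwners.set(token, sessionId);
  return token;
}

function isTokenOwnedBy(token: string, sessionId: string): boolean {
  return tokenOwners.get(token) === sessionId;
}
```

A server would perform the mirror-image check: reject a `tools/async/status` or `tools/async/result` request whose token was minted under a different session or authorization context.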
The CallToolResult MAY include immediate results in its content fields to immediately provide to the client (e.g. to guide subsequent model behavior).
The result of a long-running operation does not need to be retained by the server indefinitely; the client MAY specify an operation.keepAlive value representing the time it needs the final tool result to be available for after the tool call reaches the "completed" or "failed" status. Servers MAY override this duration with a suitable duration independently, and MUST return its final value in the operation.keepAlive field in the CallToolResult. An operation.keepAlive of null represents an unlimited duration.
When an operation’s operation.keepAlive duration has elapsed, the server SHOULD move the operation into the "failed" status. Servers are not obligated to retain operation metadata indefinitely, and MAY delete the operation’s metadata at any time after the operation’s operation.keepAlive duration has elapsed, regardless of the observed state of the operation.
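As a sketch of how a server might apply this override (the policy constants and the `negotiateKeepAlive` helper are illustrative, not specified by this proposal), the server clamps the client's requested retention window to its own policy and echoes the final value back:

```typescript
// Sketch: server-side keepAlive negotiation. The final value is what the
// server returns in CallToolResult.operation.keepAlive.
const MAX_KEEP_ALIVE_SECONDS = 24 * 60 * 60; // example server policy: 1 day
const DEFAULT_KEEP_ALIVE_SECONDS = 3600; // example default when unspecified

function negotiateKeepAlive(requested?: number | null): number | null {
  if (requested === null) return null; // unlimited, if this server allows it
  if (requested === undefined) return DEFAULT_KEEP_ALIVE_SECONDS;
  // Server MAY override the requested duration with a suitable one.
  return Math.min(requested, MAX_KEEP_ALIVE_SECONDS);
}
```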
Servers using session-based transports SHOULD NOT delete active sessions until all operation.keepAlive durations have elapsed, unless the client requests the termination of its session. Upon session termination, servers MAY choose to interrupt ongoing async tool calls and delete any associated status, if operation.token was session-bound.
Servers MAY include an operation.pollFrequency in the immediate CallToolResult, representing (in seconds) how often clients are expected to check for the operation status. Clients SHOULD respect this value when polling for results.
Receivers of notifications/cancelled notifications for the request ID associated with the original CallToolRequest SHOULD ignore any subsequent requests using the associated token and immediately move the operation to the "canceled" status. Servers are not obligated to retain operation metadata indefinitely, and MAY delete the operation’s metadata at any time after the operation has been moved to the "canceled" status.
If a server sends the client a request or notification associated with a long-running operation, the request/notification MUST contain an _operation.token associated with the originating operation and move it into the "input_required" status. In the case of requests, the client MUST respond with the same _operation.token for association. Upon completing the associated request, the server MAY move the operation out of the "input_required" status, unless other associated requests are still pending.
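A minimal sketch of this bookkeeping on the server side (the `operations` map and helper names are illustrative; the `_operation.token` association follows this proposal):

```typescript
// Sketch: associating a server-initiated request (e.g. elicitation) with
// its originating operation, and tracking the "input_required" status.
type OperationStatus =
  | "submitted" | "working" | "input_required"
  | "completed" | "canceled" | "failed" | "unknown";

const operations = new Map<string, { status: OperationStatus; pendingRequests: number }>();

function sendAssociatedRequest(token: string) {
  const op = operations.get(token);
  if (!op) throw new Error("Unknown operation token");
  op.pendingRequests += 1;
  op.status = "input_required";
  // The outgoing request carries the originating token in _operation.
  return { method: "elicitation/create", params: { _operation: { token } } };
}

function onAssociatedResponse(token: string) {
  const op = operations.get(token);
  if (!op) return;
  op.pendingRequests -= 1;
  // Leave "input_required" only once no associated requests remain pending.
  if (op.pendingRequests === 0) op.status = "working";
}
```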
An ongoing LRO can be queried with a new tools/async/status RPC call, which accepts a token representing the original CallToolRequest. When a GetOperationStatusRequest is issued, the server MUST respond with a GetOperationStatusResult containing one of the following statuses:
"submitted": The operation has been submitted, but has not yet begun execution."working": The operation is incomplete."input_required": The server is expecting a response from the client for e.g. elicitation or sampling."completed": The operation completed successfully."canceled": The operation was canceled. This is a terminal status."failed": The operation completed unsuccessfully. Theerrorfield SHOULD be set to the failure reason. This is a terminal status."unknown": Fallback state to denote unexpected conditions. This SHOULD be considered a terminal status.
When an asynchronous tool call is in the "completed" status, its result can be queried with a new tools/async/result RPC call, accepting the token for the original CallToolRequest. When a GetOperationPayloadRequest is issued and the corresponding token matches a valid "completed" operation, the server MUST respond with a GetOperationPayloadResult, which will contain the final CallToolResult.
Modified Base Request
interface Request {
method: string;
params?: {
_meta?: {
progressToken?: ProgressToken;
[key: string]: unknown;
};
/**
* Async operation parameters, only used when a request is sent during an asynchronous tool call.
*/
_operation?: {
/**
* The token associated with the originating asynchronous tool call.
*/
token: string;
};
[key: string]: unknown;
};
}
Modified Base Notification
interface Notification {
method: string;
params?: {
_meta?: { [key: string]: unknown };
/**
* Async operation parameters, only used when a notification is sent during an asynchronous tool call.
*/
_operation?: {
/**
* The token associated with the originating long-running operation tool call.
*/
token: string;
};
[key: string]: unknown;
};
}
Modified Base Result
interface Result {
_meta?: { [key: string]: unknown };
/**
* Async operation parameters, only used when a result is sent in response to a request with operation parameters.
*/
_operation?: {
/**
* The token associated with the originating long-running operation tool call.
*/
token: string;
};
[key: string]: unknown;
}
Modified Tool Call Request
interface CallToolRequest {
method: "tools/call";
params: {
name: string;
arguments?: { [key: string]: unknown };
// Required for asynchronous tool calls; omitted for synchronous calls.
operation?: {
/**
* Client-generated token to use for tracking the operation.
*/
token: string;
/**
* Number of seconds the client wants the result to be kept available upon completion.
*/
keepAlive?: number;
};
};
}
Response Types
Modified Tool Call Response
type OperationStatus = "submitted" | "working" | "completed" | "canceled" | "failed" | "input_required" | "unknown";
interface CallToolResult {
content: ToolContent[];
isError?: boolean;
// Will always be present in async tool call responses.
operation?: {
/**
* Number of seconds the result will be kept available upon completion.
*/
keepAlive: number | null;
/**
* Number of seconds the server suggests the client wait between status checks.
*/
pollFrequency?: number;
/**
* Initial status of the async operation.
*/
status: OperationStatus;
};
}
Async Operation Management
Status Checking
interface GetOperationStatusRequest {
method: "tools/async/status";
params: {
token: string;
};
}
interface GetOperationStatusResult {
/**
* Current status of the async operation.
*/
status: OperationStatus;
/**
* Error message if status is "failed".
*/
error?: string;
}
Result Retrieval
interface GetOperationPayloadRequest {
method: "tools/async/result";
params: {
token: string;
};
}
interface GetOperationPayloadResult {
/**
* The result of the tool call.
*/
result: CallToolResult;
}
Enhanced Tool Definitions
interface Tool {
name: string;
description: string;
inputSchema: JSONSchema;
// NEW: Optional async capability declaration
invocationMode?: "sync" | "async";
}
Tool Examples
const tools = [
{
name: "quick_calculation",
description: "Perform fast mathematical calculations",
inputSchema: { /* ... */ }
// No invocationMode field = sync only
},
{
name: "search_web",
description: "Search the web",
inputSchema: { /* ... */ },
// Explicitly declaring sync-only execution
invocationMode: "sync"
},
{
name: "analyze_dataset",
description: "Analyze large datasets",
inputSchema: { /* ... */ },
// Supports async execution, may or may not support sync execution on old clients.
// Clients that can see this property should assume async-only execution
// (see backwards-compatibility notes).
invocationMode: "async"
},
];
Execution Flow
- Tool Discovery: Client calls tools/list (existing behavior, no filtering needed)
- Sync Path (Old Clients): Client calls tools/call, server executes synchronously and returns result immediately
- Async Path (New Clients): Client calls tools/call with a client-generated token, server starts background execution and acknowledges the operation
- Client Polling: Client polls tools/async/status until status is "completed", "canceled", "failed", or "unknown"
- Result Retrieval: Client calls tools/async/result to get final result
- Cleanup: Server cleans up operation data after result retrieval or expiration
Example Usage
Synchronous Execution (Existing Behavior)
const result = await client.request({
method: "tools/call",
params: {
name: "quick_calculation",
arguments: { x: 5, y: 10 },
},
});
// Returns: CallToolResult immediately
// (Not shown) Send the result to the conversation...
Asynchronous Execution (New)
// Call tool asynchronously, reusing the existing tools/call RPC call
const operationToken = crypto.randomUUID(); // client-generated, unique per call
const asyncResponse = await client.request({
  method: "tools/call", // Reuses existing RPC call
  params: {
    name: "expensive_analysis",
    arguments: { dataset: "large_file.csv" },
    // New property: client-supplied token plus requested retention
    operation: {
      token: operationToken,
      keepAlive: 3600,
    },
  },
});
// Returns: CallToolResult { content: [ TextContent("started analysis...") ], operation: { status: "submitted", keepAlive: 3600, pollFrequency: 5 } }
// (Not shown) Send the initial result to the conversation...
// Poll for completion deterministically
while (true) {
  const status = await client.request({
    method: "tools/async/status",
    params: { token: operationToken },
  });
  if (status.status === "completed") {
    const result = await client.request({
      method: "tools/async/result",
      params: { token: operationToken },
    });
    // (Not shown) Send the result to the conversation...
    break;
  }
  if (status.status === "canceled" || status.status === "failed" || status.status === "unknown") {
    // (Not shown) Surface the failure to the conversation...
    break;
  }
  await sleep(1000 * (asyncResponse.operation.pollFrequency ?? 1));
}
}
Error Handling
Servers MUST return standard JSON-RPC errors for the following protocol error cases:
- Long-running operation called by an outdated client: -32600 (Invalid request)
- Long-running operation called with an invalid or nonexistent token: -32602 (Invalid params)
- tools/call called with a token that was already used to start a different operation: -32602 (Invalid params)
- Internal errors: -32603 (Internal error)
Servers SHOULD provide an informative error message to describe the cause of any errors. For example, a server that fails to look up an operation by its ID MAY return the following error:
{
"jsonrpc": "2.0",
"id": 70,
"error": {
"code": -32602,
"message": "Failed to retrieve operation: Operation not found"
}
}
Likewise, a server that attempts to retrieve an expired operation MAY return the following error:
{
"jsonrpc": "2.0",
"id": 70,
"error": {
"code": -32602,
"message": "Failed to retrieve operation: Operation has expired"
}
}
Note that servers are not obligated to retain operation metadata indefinitely, and it is compliant behavior for a server to return a "not-found" error should it purge an expired operation.
Rationale
Design Decision: Extend Existing RPC Call
The decision to extend the existing RPC call rather than introducing a new, dedicated tool call type was made for the following key reasons:
API Clarity:
- Keeping a single tools/call RPC method maintains one abstraction for all tool calls, regardless of invocation mode
- Avoids divergence between non-LRO and LRO tool call data models
Host Application Simplicity:
- Automatically filtering out LRO tools for old client versions enables older clients to continue using servers without errors
- Maintaining the same data models enables retaining existing tool call handling paths rather than building out parallel ones
- Predictable workflow of tool call and polling can be wrapped in a convenience method by clients for simpler host application integration
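For example, the tool call and polling workflow could be wrapped as follows (a sketch; `callToolAndWait` is hypothetical, while the method names and payload shapes follow this proposal):

```typescript
// Sketch: a client-library convenience wrapper around the async flow —
// call the tool, poll until a terminal status, then fetch the result.
async function callToolAndWait(client: any, name: string, args: object, token: string) {
  const initial = await client.request({
    method: "tools/call",
    params: { name, arguments: args, operation: { token } },
  });
  // Respect the server-suggested polling interval, defaulting to 1s.
  const waitMs = 1000 * (initial.operation?.pollFrequency ?? 1);
  while (true) {
    const { status } = await client.request({
      method: "tools/async/status",
      params: { token },
    });
    if (status === "completed") {
      return client.request({ method: "tools/async/result", params: { token } });
    }
    if (status === "canceled" || status === "failed" || status === "unknown") {
      throw new Error(`Operation ended with status: ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```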
Alternative Designs Considered
Webhooks:
MCP could provide a way for clients to pass a webhook to the server, which would receive notifications and the tool result asynchronously. This would be especially useful for tasks with a distant time horizon, on the order of hours or longer.
Rejected because:
- For remote servers or containerized local servers, this potentially requires a public endpoint for the MCP server to invoke. This is nontrivial to implement securely, especially in a desktop host application.
- This approach may be pursued in a separate proposal as a specialized approach suitable for very-long-running operations.
Application-Level Job IDs:
Server developers can implement similar functionality on top of synchronous tool calls today:
// immediately return after tool call
CallToolResult { jobId: "00000000-0000-0000-0000-000000000000" }
// model polls in chat session
CallTool { name: "tool", arguments: { jobId: "00000000-0000-0000-0000-000000000000" } }
Rejected because:
- This is a common use case that many MCP servers require and are already duplicating effort to support (see customer use cases above). Server implementers want standardized approaches for supporting this to avoid redundant and error-prone efforts.
- This relies heavily on inconsistent model behavior and prompt engineering to decide when and if to poll at all, as opposed to having a deterministic, protocol-defined construct for asynchronous calls. With the proposed solution, host applications will have a reliable mechanism for immediately backgrounding long-running tool calls and waking the model up later to handle their results.
- Client-configured timeouts and timeouts in network gateways can interrupt legitimate long-running operations before completion.
Backward Compatibility
This SEP introduces no backward incompatibilities. All existing MCP functionality remains completely unchanged:
Compatibility Guarantees
- Existing methods unchanged: tools/call, tools/list, and all other existing methods continue to work exactly as before
- Tool definitions backward compatible: The optional invocationMode field is additive; existing tool definitions work unchanged (controlled by protocol version negotiation)
- Client compatibility: Existing clients continue to work without any modifications
- Server compatibility: Servers can implement async support incrementally without affecting existing functionality
Key compatibility aspects:
- Version detection: Servers detect client async support via protocol version or feature flags
- Automatically filtered tool lists: Old clients can only see tools that support sync execution; SDKs may expose methods to allow an async tool to be invoked synchronously if server owners wish to support this
- Graceful degradation: Old clients continue with existing sync functionality in sync-only and hybrid tools
- No breaking changes: Existing tools and workflows remain unaffected, and the invocationMode field is not included for clients on old versions
- Gradual adoption: New clients see enhanced tool capabilities when ready
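A sketch of this filtering in an SDK (the `supportsSyncFallback` flag is a hypothetical server-side marker for hybrid tools; `invocationMode` follows this proposal):

```typescript
// Sketch: build the tools/list response for a client on an old protocol
// version — hide async-only tools, and strip async-related fields from
// hybrid tools so old clients see plain sync tool definitions.
interface ToolDef {
  name: string;
  description: string;
  invocationMode?: "sync" | "async";
  // Hypothetical flag: async-marked tool that also supports sync calls.
  supportsSyncFallback?: boolean;
}

function filterToolsForOldClient(tools: ToolDef[]) {
  return tools
    .filter((t) => t.invocationMode !== "async" || t.supportsSyncFallback)
    .map(({ invocationMode, supportsSyncFallback, ...rest }) => rest);
}
```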
Client Experience by Version
Old Clients (pre-async support):
// tools/list response (filtered)
{
tools: [
// only supports sync execution
{ name: "search_web", description: "Search the web" },
// only supports sync execution
{ name: "quick_calc", description: "Fast calculation" },
// supports both sync and async - invocationMode field is hidden from old clients
{ name: "get_weather", description: "Get weather" }
// async-capable tools hidden from old clients
]
}
New Clients (async support):
// tools/list response (complete)
{
tools: [
// explicitly only supports sync execution
{ name: "search_web", description: "Search the web", invocationMode: "sync" },
// implicitly only supports sync execution
{ name: "quick_calc", description: "Fast calculation" },
// new clients see the async invocation mode and should assume async-only execution
{ name: "get_weather", description: "Get weather", invocationMode: "async" },
// tool is async-only, and is not shown to old clients
{ name: "deep_analysis", description: "Complex analysis", invocationMode: "async" }
// All tools visible, async capabilities declared
]
}
Future Work
This async tool execution foundation is designed to evolve naturally into more sophisticated workflow capabilities. The following extensions are planned for future SEPs:
Task-Based Workflows
The token-based async execution can be extended to support multi-operation workflows through task coordination:
Server-Initiated Multi-Agent Flows
The task infrastructure enables server-initiated workflows for multi-agent scenarios:
// Future: Server creates task for multi-agent workflow
async function handleExternalEvent(event, connectedClient) {
const taskId = await server.createTask("Process Support Ticket");
// Server executes operations within task
const analyzeToken = await server.callToolAsync("analyze_ticket", {
ticket: event.ticket,
_meta: { taskId },
});
const classifyToken = await server.callToolAsync("classify_urgency", {
ticket: event.ticket,
_meta: { taskId },
});
// Notify client about server-initiated workflow, providing tokens
// for individual steps
connectedClient.notify("tasks/created", {
taskId,
initiator: "server",
operations: [analyzeToken, classifyToken],
});
}
Task Management Methods
Future task-based workflows would introduce additional methods for client monitoring:
// Task status monitoring (client-facing)
interface GetTaskStatusRequest {
method: "tasks/status";
params: { taskId: string };
}
interface GetTaskStatusResult {
taskId: string;
status: "active" | "completed" | "failed";
operations: {
token: string;
name: string;
status: "pending" | "running" | "completed" | "failed";
dependencies: string[];
}[];
progress: { completed: number; total: number };
}
// Task notifications (server-initiated)
interface TaskCreatedNotification {
method: "notifications/tasks/created";
params: {
taskId: string;
initiator: "server";
description?: string;
operations: string[]; // Array of operation tokens
};
}
interface TaskCompletedNotification {
method: "notifications/tasks/completed";
params: {
taskId: string;
status: "completed" | "failed";
finalResults?: any;
};
}
Benefits of This Foundation
By establishing async tool execution first, future task-based workflows will benefit from:
- Proven async execution patterns - token management, polling, result retrieval
- Resource management - cleanup, expiration, and resource limits
- Client library support - existing async infrastructure can be extended
This foundation ensures that complex workflow capabilities can be added without disrupting the simple async execution patterns that most applications need.
Triggers and Event-Driven Workflows
Beyond async execution and task-based workflows, the MCP ecosystem will benefit from Triggers - a webhook-like mechanism for event-driven tool execution. While async execution addresses concurrency within persistent connections, Triggers will enable:
- Event-driven activation: Tools that respond to external events without requiring persistent client connections
- Scheduled execution: Time-based tool activation for monitoring, maintenance, and periodic tasks
- Cross-session workflows: Operations that span multiple client sessions or survive client disconnections
The async foundation established in this SEP provides the execution primitives that Triggers will leverage for background operation management.