Preamble
Title: Asynchronous Tool Execution
Author: Surbhi Bansal, Luca Chang
Status: Proposal
Type: Standards Track
Created: 2025-08-26
Abstract
This SEP improves support for long-running operations (LRO) in the Model Context Protocol (MCP). It introduces a modified tool call operation with token-based execution tracking, with version negotiation as a means of supporting old and new tool call semantics on the same RPC method. Clients pass a unique operation token to represent a task, and servers manage background execution and serve operation statuses and results to client requests with the operation token provided. By splitting a tool call into multiple request-response methods, this avoids the need for persistent connections between the client and server to support long-running tools. It also allows tool results to be retrieved multiple times, supporting call-now, fetch-later execution and improving resilience to connection faults.
Motivation
The current MCP specification only supports single request-response tool execution, which creates significant limitations for real-world applications. While host applications or tool implementors could theoretically implement custom long-running operation handling themselves, this approach is fundamentally inadequate:
Why Host-Level Solutions Are Insufficient:
- Lack of Standardization: This is a common use case that many MCP servers require and are already duplicating effort to support (see customer use cases below). Server implementers want standardized approaches for supporting this to avoid redundant and error-prone efforts.
- Model Behavior Inconsistency: Model-driven solutions rely heavily on inconsistent behavior and prompt engineering to decide when and if to poll at all, as opposed to having a deterministic, protocol-defined construct for long-running operations. With the proposed solution, host applications will have a reliable mechanism for immediately backgrounding long-running tool calls and waking the model up later to handle their results.
- Limited Solutions for Timeouts: There is no standardized way to handle operations that exceed typical request timeouts across all tools. Servers need to impose timeouts to avoid resource exhaustion, but these cannot always be configured in a way that generalizes across all tools, or even across combinations of parameters in a single tool. Rather than being limited by timeouts in the network stack, this enables tools to execute for arbitrary lengths of time without holding a single connection open, and without relying on clients to terminate connections proactively.
This SEP addresses these limitations by introducing long-running operation capabilities while preserving all existing functionality.
Customer Use Cases Requiring Long-Running Operation Support
The current single request-response execution model prevents MCP from supporting common use cases that require extended processing time, on the scale of minutes to hours:
1. Healthcare & Life Sciences Data Analysis
Challenge: Amazon's customers in the healthcare and life sciences industry are attempting to use MCP to wrap existing computational tools to analyze molecular properties and predict drug interactions, processing hundreds of thousands of data points per job from chemical libraries through multiple inference models simultaneously.
Duration: Small molecule analysis requires 30-60 minutes; large molecule simulations with complex predictions take several hours.
Current Workaround: Not yet determined.
Impact: Cannot integrate with real-time research workflows, prevents interactive drug discovery platforms, and blocks automated research pipelines. These customers are looking for best practices for long-running tool calls and have noted the lack of support in MCP as a concern. If these customers do not have a solution for long-running tool calls, they will likely forego MCP and continue using their existing platforms.
Ideal: Some form of push notification system to avoid blocking their agents on long analyses, with concurrent long-running tool calls as an answer for operations executing in the range of a few minutes.
2. Enterprise Automation Platforms
Challenge: Amazon’s large enterprise customers are looking to develop internal MCP platforms to automate SDLC processes across their organizations, extending to sales, customer service, legal, HR, and cross-divisional teams. They have noted they have long-running agent and agent-tool interactions beyond typical timeouts, supporting complex business process automation.
Duration: Processing time ranges from minutes to hours depending on the task.
Current Workaround: Not yet determined. Considering an application-level system outside of MCP backed by webhooks.
Impact: Limitations related to the host doing synchronous request-response execution prevent complex business process automation and limit sophisticated multi-step operations. These customers want to dispatch processes concurrently and collect their results later, and have noted the lack of async tool calls as a concern; they are considering involved application-level notification systems as a possible workaround.
Ideal: Built-in mechanisms for managing concurrent work to avoid needing to implement notification systems specific to their own tool conventions themselves.
3. Code Migration Workflows
Challenge: Amazon has automated code migration and transformation tools to perform upgrades across its own codebases and those of external customers, and is attempting to wrap those tools in MCP servers. These migrations analyze dependencies, transform code to avoid deprecated runtime features, and validate changes across multiple repositories.
Duration: Processing time ranges from minutes to hours depending on migration scope, complexity, and validation requirements. Large enterprise workloads require extensive testing cycles.
Current Workaround: Developers implement manual tracking by splitting a job into create and get tools, forcing models to manage state and repeatedly poll for completion.
Impact: Poor developer experience due to needing to replicate this hand-rolled long-running operation mechanism across many tools. One team had to debug an issue where the model would hallucinate job names if it hadn’t listed them first. Validating that this does not happen across many tools in a large toolset is time-consuming and error-prone.
Ideal: Support SDK-controlled polling by default to support pushing a tool to the background and avoiding blocking other tasks in the chat session. The team needs the same pattern across many tools in their MCP servers, and wants a common solution across them.
4. Test Execution Platforms
Challenge: Amazon’s internal test infrastructure executes comprehensive test suites including thousands of test cases, integration tests across services, and performance benchmarks. They have built an MCP server wrapping this existing infrastructure.
Duration: Test runs can execute for hours in some services, especially for full regression testing suites.
Current Workaround: For streaming test logs, the MCP server exposes a tool that can read a range of log lines, as it cannot effectively notify the client when the execution is complete. There is not yet any workaround for executing test runs.
Impact: Cannot run a test suite and stream its logs simultaneously without a single hours-long tool call, which would time out on either the client or the server. This prevents agents from looking into test failures in an incomplete test run until the entire test suite has completed, potentially hours later.
Ideal: Support application-driven long-running tool calls, so a client can be notified when a long-running tool is complete.
5. Deep Research
Challenge: Deep research tools spawn multiple research agents to gather and summarize information about topics, going through several rounds of search and conversation turns internally to produce a final result for the caller application.
Duration: 3-5 minutes per research topic.
Current Workaround: The research tool can be split into separate tools to create a report job and get the status/result of that job later.
Impact: When using this in clients like Claude Desktop, which has a 4-minute MCP tool call timeout, tool calls will time out unpredictably if some agents take longer than usual to complete. After splitting the tool into create/get steps, the LLM runs into issues calling the get tool repeatedly; instead, it sometimes calls the get tool once before ending its conversation turn, claiming to be "waiting" before calling the tool again. It cannot resume until receiving a new user message. This also complicates expiration times, as it is not possible to predict when the client will retrieve the result when this occurs. It is possible to work around this by adding a wait tool for the model, but this prevents the model from doing anything else concurrently.
Ideal: Support backgrounding a tool call with a deterministic way to notify the model when a result is ready, so it can be immediately retrieved and the tool result can be deleted. This cannot be done generically today in any client application.
6. Agent-to-Agent Communication (Multi-Agent Systems)
Challenge: One of Amazon’s internal multi-agent systems for customer question answering faces scenarios where agents require significant processing time for complex reasoning, research, or analysis. When agents communicate through MCP, slow agents cause cascading delays throughout this system.
Duration: Several minutes for complex tasks.
Current Workaround: Not yet determined.
Impact: Synchronous communication creates cascading delays, prevents parallel agent processing, and degrades system responsiveness for other time-sensitive interactions.
Ideal: Notifications of some kind to allow agents to perform other work concurrently and get notified once long-running tasks complete.
These use cases demonstrate that long-running tool execution is not a theoretical need but a practical requirement for real MCP deployments in production environments.
Integration with Existing Architectures
Many production systems already provide async workflow execution capabilities with built-in state tracking, monitoring, and management. This proposal enables MCP servers to expose these existing APIs directly rather than forcing them into synchronous wrappers.
Benefits for Existing Async Systems:
- Leverage Existing State Management: Systems like Step Functions, Kubernetes, and CI/CD platforms already maintain execution state, logs, and results. MCP servers can use these systems' native identifiers (ARNs, job names, run IDs) as MCP tokens.
- Preserve Native Monitoring: Existing monitoring, alerting, and observability tools continue to work unchanged. The async execution happens within the existing system's context.
- Reduce Implementation Overhead: Server implementers don't need to build new state management, persistence, or monitoring infrastructure. They can focus on the MCP protocol mapping.
This approach simplifies integration with existing workflows and allows workflow services to continue to manage their own state, rather than making MCP servers that do nothing but poll on other services.
Specification
Core Approach: Extend Existing Tool Call with Token-Based Tracking
This proposal introduces an extension of the existing tools/call RPC call method to support long-running operations.
Async tools MUST be marked with an invocationMode field to differentiate them. SDKs MUST NOT show LRO-only tools or the invocationMode field to clients on old protocol versions.
SDKs MAY support declaring tools that support both non-LRO and LRO execution. In these cases, clients on old protocol versions will be able to see the tool definition without the invocationMode field, and will continue to call it with non-LRO semantics. Clients on new protocol versions will see the tool definition with the invocationMode field, and therefore MUST call the tool with LRO semantics.
When issuing a CallToolRequest for a long-running tool call on a supported protocol version, the client MUST provide a unique operation.token to represent the operation across requests. The server MUST validate that this token is unique before accepting new operations. This token SHOULD be scoped to the particular authorization context or session it originated from, to avoid granting unauthorized access to tool results to third parties.
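As an illustration of client-side token handling (the `mintOperationToken` helper and the session-ownership map are hypothetical, not part of this proposal), a client might mint unguessable tokens and scope them to the session they originated from:

```typescript
// Sketch: client-generated operation tokens, scoped per session so that
// results are only served back to the originating authorization context.
import { randomUUID } from "node:crypto";

// Hypothetical bookkeeping: which session minted which token.
const tokenOwners = new Map<string, string>(); // token -> sessionId

function mintOperationToken(sessionId: string): string {
  const token = randomUUID(); // unique, unguessable token per operation
  tokenOwners.set(token, sessionId);
  return token;
}

function isTokenOwnedBy(token: string, sessionId: string): boolean {
  return tokenOwners.get(token) === sessionId;
}
```

A server would perform the mirror-image check: reject a `tools/async/status` or `tools/async/result` request whose token was minted under a different session or authorization context.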
The CallToolResult MAY include immediate results in its content fields to immediately provide to the client (e.g. to guide subsequent model behavior).
The result of a long-running operation does not need to be retained by the server indefinitely; the client MAY specify an operation.keepAlive value representing the time it needs the final tool result to be available for after the tool call reaches the "completed" or "failed" status. Servers MAY override this duration with a suitable duration independently, and MUST return its final value in the operation.keepAlive field in the CallToolResult. An operation.keepAlive of null represents an unlimited duration.
When an operation’s operation.keepAlive duration has elapsed, the server SHOULD move the operation into the "failed" status. Servers are not obligated to retain operation metadata indefinitely, and MAY delete the operation’s metadata at any time after the operation’s operation.keepAlive duration has elapsed, regardless of the observed state of the operation.
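As a sketch of how a server might apply this override (the policy constants and the `negotiateKeepAlive` helper are illustrative, not specified by this proposal), the server clamps the client's requested retention window to its own policy and echoes the final value back:

```typescript
// Sketch: server-side keepAlive negotiation. The final value is what the
// server returns in CallToolResult.operation.keepAlive.
const MAX_KEEP_ALIVE_SECONDS = 24 * 60 * 60; // example server policy: 1 day
const DEFAULT_KEEP_ALIVE_SECONDS = 3600; // example default when unspecified

function negotiateKeepAlive(requested?: number | null): number | null {
  if (requested === null) return null; // unlimited, if this server allows it
  if (requested === undefined) return DEFAULT_KEEP_ALIVE_SECONDS;
  // Server MAY override the requested duration with a suitable one.
  return Math.min(requested, MAX_KEEP_ALIVE_SECONDS);
}
```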
Servers using session-based transports SHOULD NOT delete active sessions until all operation.keepAlive durations have elapsed, unless the client requests the termination of its session. Upon session termination, servers MAY choose to interrupt ongoing async tool calls and delete any associated status, if operation.token was session-bound.
Servers MAY include an operation.pollFrequency in the immediate CallToolResult, representing (in seconds) how often clients are expected to check for the operation status. Clients SHOULD respect this value when polling for results.
Receivers of notifications/cancelled notifications for the request ID associated with the original CallToolRequest SHOULD ignore any subsequent requests using the associated token and immediately move the operation to the "canceled" status. Servers are not obligated to retain operation metadata indefinitely, and MAY delete the operation’s metadata at any time after the operation has been moved to the "canceled" status.
If a server sends the client a request or notification associated with a long-running operation, the request/notification MUST contain an _operation.token associated with the originating operation and move it into the "input_required" status. In the case of requests, the client MUST respond with the same _operation.token for association. Upon completing the associated request, the server MAY move the operation out of the "input_required" status, unless other associated requests are still pending.
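A minimal sketch of this bookkeeping on the server side (the `operations` map and helper names are illustrative; the `_operation.token` association follows this proposal):

```typescript
// Sketch: associating a server-initiated request (e.g. elicitation) with
// its originating operation, and tracking the "input_required" status.
type OperationStatus =
  | "submitted" | "working" | "input_required"
  | "completed" | "canceled" | "failed" | "unknown";

const operations = new Map<string, { status: OperationStatus; pendingRequests: number }>();

function sendAssociatedRequest(token: string) {
  const op = operations.get(token);
  if (!op) throw new Error("Unknown operation token");
  op.pendingRequests += 1;
  op.status = "input_required";
  // The outgoing request carries the originating token in _operation.
  return { method: "elicitation/create", params: { _operation: { token } } };
}

function onAssociatedResponse(token: string) {
  const op = operations.get(token);
  if (!op) return;
  op.pendingRequests -= 1;
  // Leave "input_required" only once no associated requests remain pending.
  if (op.pendingRequests === 0) op.status = "working";
}
```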
An ongoing LRO can be queried with a new tools/async/status RPC call, which accepts a token representing the original CallToolRequest. When a GetOperationStatusRequest is issued, the server MUST respond with a GetOperationStatusResult containing one of the following statuses:
"submitted": The operation has been submitted, but has not yet begun execution."working": The operation is incomplete."input_required": The server is expecting a response from the client for e.g. elicitation or sampling."completed": The operation completed successfully."canceled": The operation was canceled. This is a terminal status."failed": The operation completed unsuccessfully. Theerrorfield SHOULD be set to the failure reason. This is a terminal status."unknown": Fallback state to denote unexpected conditions. This SHOULD be considered a terminal status.
When an asynchronous tool call is in the "completed" status, its result can be queried with a new tools/async/result RPC call, accepting the token for the original CallToolRequest. When a GetOperationPayloadRequest is issued and the corresponding token matches a valid "completed" operation, the server MUST respond with a GetOperationPayloadResult, which will contain the final CallToolResult.
Modified Base Request
interface Request {
method: string;
params?: {
_meta?: {
progressToken?: ProgressToken;
[key: string]: unknown;
};
/**
* Async operation parameters, only used when a request is sent during an asynchronous tool call.
*/
_operation?: {
/**
* The token associated with the originating asynchronous tool call.
*/
token: string;
};
[key: string]: unknown;
};
}
Modified Base Notification
interface Notification {
method: string;
params?: {
_meta?: { [key: string]: unknown };
/**
* Async operation parameters, only used when a notification is sent during an asynchronous tool call.
*/
_operation?: {
/**
* The token associated with the originating long-running operation tool call.
*/
token: string;
};
[key: string]: unknown;
};
}
Modified Base Result
interface Result {
_meta?: { [key: string]: unknown };
/**
* Async operation parameters, only used when a result is sent in response to a request with operation parameters.
*/
_operation?: {
/**
* The token associated with the originating long-running operation tool call.
*/
token: string;
};
[key: string]: unknown;
}
Modified Tool Call Request
interface CallToolRequest {
method: "tools/call";
params: {
name: string;
arguments?: { [key: string]: unknown };
// Required for asynchronous tool calls; omitted for synchronous calls.
operation?: {
/**
* Client-generated token to use for tracking the operation.
*/
token: string;
/**
* Number of seconds the client wants the result to be kept available upon completion.
*/
keepAlive?: number;
};
};
}
Response Types
Modified Tool Call Response
type OperationStatus = "submitted" | "working" | "completed" | "canceled" | "failed" | "input_required" | "unknown";
interface CallToolResult {
content: ToolContent[];
isError?: boolean;
// Will always be present in async tool call responses.
operation?: {
/**
* Number of seconds the result will be kept available upon completion.
*/
keepAlive: number | null;
/**
* Number of seconds the server suggests the client wait between status checks.
*/
pollFrequency?: number;
/**
* Initial status of the async operation.
*/
status: OperationStatus;
};
}
Async Operation Management
Status Checking
interface GetOperationStatusRequest {
method: "tools/async/status";
params: {
token: string;
};
}
interface GetOperationStatusResult {
/**
* Current status of the async operation.
*/
status: OperationStatus;
/**
* Error message if status is "failed".
*/
error?: string;
}
Result Retrieval
interface GetOperationPayloadRequest {
method: "tools/async/result";
params: {
token: string;
};
}
interface GetOperationPayloadResult {
/**
* The result of the tool call.
*/
result: CallToolResult;
}
Enhanced Tool Definitions
interface Tool {
name: string;
description: string;
inputSchema: JSONSchema;
// NEW: Optional async capability declaration
invocationMode?: "sync" | "async";
}
Tool Examples
const tools = [
{
name: "quick_calculation",
description: "Perform fast mathematical calculations",
inputSchema: { /* ... */ }
// No invocationMode field = sync only
},
{
name: "search_web",
description: "Search the web",
inputSchema: { /* ... */ },
// Explicitly declaring sync-only execution
invocationMode: "sync"
},
{
name: "analyze_dataset",
description: "Analyze large datasets",
inputSchema: { /* ... */ },
// Supports async execution, may or may not support sync execution on old clients.
// Clients that can see this property should assume async-only execution
// (see backwards-compatibility notes).
invocationMode: "async"
},
];
Execution Flow
- Tool Discovery: Client calls tools/list (existing behavior, no filtering needed)
- Sync Path (Old Clients): Client calls tools/call, server executes synchronously and returns result immediately
- Async Path (New Clients): Client calls tools/call with a client-generated token, server starts background execution and acknowledges the operation
- Client Polling: Client polls tools/async/status until status is "completed", "canceled", "failed", or "unknown"
- Result Retrieval: Client calls tools/async/result to get final result
- Cleanup: Server cleans up operation data after result retrieval or expiration
Example Usage
Synchronous Execution (Existing Behavior)
const result = await client.request({
method: "tools/call",
params: {
name: "quick_calculation",
arguments: { x: 5, y: 10 },
},
});
// Returns: CallToolResult immediately
// (Not shown) Send the result to the conversation...
Asynchronous Execution (New)
// Call tool asynchronously, reusing the existing tools/call RPC call
const operationToken = crypto.randomUUID(); // client-generated, unique per call
const asyncResponse = await client.request({
  method: "tools/call", // Reuses existing RPC call
  params: {
    name: "expensive_analysis",
    arguments: { dataset: "large_file.csv" },
    // New property: client-supplied token plus requested retention
    operation: {
      token: operationToken,
      keepAlive: 3600,
    },
  },
});
// Returns: CallToolResult { content: [ TextContent("started analysis...") ], operation: { status: "submitted", keepAlive: 3600, pollFrequency: 5 } }
// (Not shown) Send the initial result to the conversation...
// Poll for completion deterministically
while (true) {
  const status = await client.request({
    method: "tools/async/status",
    params: { token: operationToken },
  });
  if (status.status === "completed") {
    const result = await client.request({
      method: "tools/async/result",
      params: { token: operationToken },
    });
    // (Not shown) Send the result to the conversation...
    break;
  }
  if (status.status === "canceled" || status.status === "failed" || status.status === "unknown") {
    // (Not shown) Surface the failure to the conversation...
    break;
  }
  await sleep(1000 * (asyncResponse.operation.pollFrequency ?? 1));
}
}
Error Handling
Servers MUST return standard JSON-RPC errors for the following protocol error cases:
- Long-running operation called by an outdated client: -32600 (Invalid request)
- Long-running operation called with an invalid or nonexistent token: -32602 (Invalid params)
- tools/call called with a token that was already used to start a different operation: -32602 (Invalid params)
- Internal errors: -32603 (Internal error)
Servers SHOULD provide an informative error message to describe the cause of any errors. For example, a server that fails to look up an operation by its ID MAY return the following error:
{
"jsonrpc": "2.0",
"id": 70,
"error": {
"code": -32602,
"message": "Failed to retrieve operation: Operation not found"
}
}
Likewise, a server that attempts to retrieve an expired operation MAY return the following error:
{
"jsonrpc": "2.0",
"id": 70,
"error": {
"code": -32602,
"message": "Failed to retrieve operation: Operation has expired"
}
}
Note that servers are not obligated to retain operation metadata indefinitely, and it is compliant behavior for a server to return a "not-found" error should it purge an expired operation.
Rationale
Design Decision: Extend Existing RPC Call
The decision to extend the existing RPC call rather than introducing a new, dedicated tool call type was made for the following key reasons:
API Clarity:
- Keeping a single tools/call RPC method maintains one abstraction for all tool calls, regardless of invocation mode
- Avoids divergence between non-LRO and LRO tool call data models
Host Application Simplicity:
- Automatically filtering out LRO tools for old client versions enables older clients to continue using servers without errors
- Maintaining the same data models enables retaining existing tool call handling paths rather than building out parallel ones
- Predictable workflow of tool call and polling can be wrapped in a convenience method by clients for simpler host application integration
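For example, the tool call and polling workflow could be wrapped as follows (a sketch; `callToolAndWait` is hypothetical, while the method names and payload shapes follow this proposal):

```typescript
// Sketch: a client-library convenience wrapper around the async flow —
// call the tool, poll until a terminal status, then fetch the result.
async function callToolAndWait(client: any, name: string, args: object, token: string) {
  const initial = await client.request({
    method: "tools/call",
    params: { name, arguments: args, operation: { token } },
  });
  // Respect the server-suggested polling interval, defaulting to 1s.
  const waitMs = 1000 * (initial.operation?.pollFrequency ?? 1);
  while (true) {
    const { status } = await client.request({
      method: "tools/async/status",
      params: { token },
    });
    if (status === "completed") {
      return client.request({ method: "tools/async/result", params: { token } });
    }
    if (status === "canceled" || status === "failed" || status === "unknown") {
      throw new Error(`Operation ended with status: ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```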
Alternative Designs Considered
Webhooks:
MCP could provide a way for clients to pass a webhook to the server, which would receive notifications and the tool result asynchronously. This would be especially useful for tasks with a distant time horizon, on the order of hours or longer.
Rejected because:
- For remote servers or containerized local servers, this potentially requires a public endpoint for the MCP server to invoke. This is nontrivial to implement securely, especially in a desktop host application.
- This approach may be pursued in a separate proposal as a specialized approach suitable for very-long-running operations.
Application-Level Job IDs:
Server developers can implement similar functionality on top of synchronous tool calls today:
// immediately return after tool call
CallToolResult { jobId: "00000000-0000-0000-0000-000000000000" }
// model polls in chat session
CallTool { name: "tool", arguments: { jobId: "00000000-0000-0000-0000-000000000000" } }
Rejected because:
- This is a common use case that many MCP servers require and are already duplicating effort to support (see customer use cases above). Server implementers want standardized approaches for supporting this to avoid redundant and error-prone efforts.
- This relies heavily on inconsistent model behavior and prompt engineering to decide when and if to poll at all, as opposed to having a deterministic, protocol-defined construct for asynchronous calls. With the proposed solution, host applications will have a reliable mechanism for immediately backgrounding long-running tool calls and waking the model up later to handle their results.
- Client-configured timeouts and timeouts in network gateways can interrupt legitimate long-running operations before completion.
Backward Compatibility
This SEP introduces no backward incompatibilities. All existing MCP functionality remains completely unchanged:
Compatibility Guarantees
- Existing methods unchanged: tools/call, tools/list, and all other existing methods continue to work exactly as before
- Tool definitions backward compatible: The optional invocationMode field is additive; existing tool definitions work unchanged (controlled by protocol version negotiation)
- Client compatibility: Existing clients continue to work without any modifications
- Server compatibility: Servers can implement async support incrementally without affecting existing functionality
Key compatibility aspects:
- Version detection: Servers detect client async support via protocol version or feature flags
- Automatically filtered tool lists: Old clients can only see tools that support sync execution; SDKs may expose methods to allow an async tool to be invoked synchronously if server owners wish to support this
- Graceful degradation: Old clients continue with existing sync functionality in sync-only and hybrid tools
- No breaking changes: Existing tools and workflows remain unaffected, and the invocationMode field is not included for clients on old versions
- Gradual adoption: New clients see enhanced tool capabilities when ready
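A sketch of this filtering in an SDK (the `supportsSyncFallback` flag is a hypothetical server-side marker for hybrid tools; `invocationMode` follows this proposal):

```typescript
// Sketch: build the tools/list response for a client on an old protocol
// version — hide async-only tools, and strip async-related fields from
// hybrid tools so old clients see plain sync tool definitions.
interface ToolDef {
  name: string;
  description: string;
  invocationMode?: "sync" | "async";
  // Hypothetical flag: async-marked tool that also supports sync calls.
  supportsSyncFallback?: boolean;
}

function filterToolsForOldClient(tools: ToolDef[]) {
  return tools
    .filter((t) => t.invocationMode !== "async" || t.supportsSyncFallback)
    .map(({ invocationMode, supportsSyncFallback, ...rest }) => rest);
}
```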
Client Experience by Version
Old Clients (pre-async support):
// tools/list response (filtered)
{
tools: [
// only supports sync execution
{ name: "search_web", description: "Search the web" },
// only supports sync execution
{ name: "quick_calc", description: "Fast calculation" },
// supports both sync and async - invocationMode field is hidden from old clients
{ name: "get_weather", description: "Get weather" }
// async-capable tools hidden from old clients
]
}
New Clients (async support):
// tools/list response (complete)
{
tools: [
// explicitly only supports sync execution
{ name: "search_web", description: "Search the web", invocationMode: "sync" },
// implicitly only supports sync execution
{ name: "quick_calc", description: "Fast calculation" },
// new clients see the async invocation mode and should assume async-only execution
{ name: "get_weather", description: "Get weather", invocationMode: "async" },
// tool is async-only, and is not shown to old clients
{ name: "deep_analysis", description: "Complex analysis", invocationMode: "async" }
// All tools visible, async capabilities declared
]
}
Future Work
This async tool execution foundation is designed to evolve naturally into more sophisticated workflow capabilities. The following extensions are planned for future SEPs:
Task-Based Workflows
The token-based async execution can be extended to support multi-operation workflows through task coordination:
Server-Initiated Multi-Agent Flows
The task infrastructure enables server-initiated workflows for multi-agent scenarios:
// Future: Server creates task for multi-agent workflow
async function handleExternalEvent(event, connectedClient) {
const taskId = await server.createTask("Process Support Ticket");
// Server executes operations within task
const analyzeToken = await server.callToolAsync("analyze_ticket", {
ticket: event.ticket,
_meta: { taskId },
});
const classifyToken = await server.callToolAsync("classify_urgency", {
ticket: event.ticket,
_meta: { taskId },
});
// Notify client about server-initiated workflow, providing tokens
// for individual steps
connectedClient.notify("tasks/created", {
taskId,
initiator: "server",
operations: [analyzeToken, classifyToken],
});
}
Task Management Methods
Future task-based workflows would introduce additional methods for client monitoring:
// Task status monitoring (client-facing)
interface GetTaskStatusRequest {
method: "tasks/status";
params: { taskId: string };
}
interface GetTaskStatusResult {
taskId: string;
status: "active" | "completed" | "failed";
operations: {
token: string;
name: string;
status: "pending" | "running" | "completed" | "failed";
dependencies: string[];
}[];
progress: { completed: number; total: number };
}
// Task notifications (server-initiated)
interface TaskCreatedNotification {
method: "notifications/tasks/created";
params: {
taskId: string;
initiator: "server";
description?: string;
operations: string[]; // Array of operation tokens
};
}
interface TaskCompletedNotification {
method: "notifications/tasks/completed";
params: {
taskId: string;
status: "completed" | "failed";
finalResults?: any;
};
}
Benefits of This Foundation
By establishing async tool execution first, future task-based workflows will benefit from:
- Proven async execution patterns - token management, polling, result retrieval
- Resource management - cleanup, expiration, and resource limits
- Client library support - existing async infrastructure can be extended
This foundation ensures that complex workflow capabilities can be added without disrupting the simple async execution patterns that most applications need.
Triggers and Event-Driven Workflows
Beyond async execution and task-based workflows, the MCP ecosystem will benefit from Triggers - a webhook-like mechanism for event-driven tool execution. While async execution addresses concurrency within persistent connections, Triggers will enable:
- Event-driven activation: Tools that respond to external events without requiring persistent client connections
- Scheduled execution: Time-based tool activation for monitoring, maintenance, and periodic tasks
- Cross-session workflows: Operations that span multiple client sessions or survive client disconnections
The async foundation established in this SEP provides the execution primitives that Triggers will leverage for background operation management.