
SEP-1686: Tasks #1686

SEP-1686: Tasks

Preamble

Title: Tasks
Author: Surbhi Bansal, Luca Chang
Status: Accepted
Type: Standards Track
Created: 2025-10-20

Abstract

This SEP improves support for task-based workflows in the Model Context Protocol (MCP). It introduces both the task primitive and the associated task ID, which can be used to query the state and results of a task, up to a server-defined duration after the task has completed. This primitive is designed to augment other requests (such as tool calls) to enable call-now, fetch-later execution patterns across all requests for servers that support this primitive.

Motivation

The current MCP specification supports tool calls that execute a request and eventually receive a response, and tool calls can be passed a progress token to integrate with MCP’s progress-tracking functionality, enabling host applications to receive status updates for a tool call via notifications. However, there is no way for a client to explicitly request the status of a tool call: a call may have been silently dropped on the server, and the client has no way to know whether a response or notification will ever arrive. Similarly, there is no way for a client to explicitly retrieve the result of a tool call after it has completed. If the result was dropped, clients must call the tool again, which is undesirable for tools expected to take minutes or more. This is particularly relevant for MCP servers abstracting existing workflow-based APIs, such as AWS Step Functions, Workflows for Google Cloud, or APIs representing CI/CD pipelines, among other applications.

Today, it is possible for individual MCP servers to represent tools in a way that enables this, with certain compromises. For example, a server may expose a long_running_tool and wish to support this pattern, splitting it into three separate tools to accommodate this:

  1. start_long_running_tool: This would start the work represented by long_running_tool and return a tracking token of some kind, such as a job ID.
  2. get_long_running_tool_status(token): This would accept the tracking token and return the current status of the tool call, informing the caller that the operation is still ongoing.
  3. get_long_running_tool_result(token): This would accept the tracking token and return the result of the tool call, if it is available.

Representing a tool in this way seems to solve the use case, but it introduces a new problem: tools are generally expected to be orchestrated by an agent, and agent-driven polling is both unnecessarily expensive and inconsistent, as it relies on prompt engineering to steer an agent to poll at all. In the original long_running_tool case, the client had no way of knowing if a response would ever be received, while in the start_long_running_tool case, the application has no way of knowing if the agent will orchestrate tools according to the specific contract of the server.

It is also impossible for the host application to take ownership of this orchestration, as this tool-splitting is both conventions-based and may be implemented in different ways across MCP servers — one server may have three tools for one conceptual operation (as in our example), or it may have more, in the case of more complex, multi-step operations.

On the other hand, if active task polling is not needed, existing MCP servers can fully wrap a workflow API in a single tool call that polls for a result, but this introduces an undesirable implementation cost: an MCP server wrapping an existing workflow API is a server that only exists to poll other systems.

Affected Customer Use Cases
These concerns are backed by real use cases that Amazon has seen both internally and with its external customers (identities redacted where non-public):

1. Healthcare & Life Sciences Data Analysis
Challenge: Amazon’s customers in the healthcare and life sciences industry are attempting to use MCP to wrap existing computational tools to analyze molecular properties and predict drug interactions, processing hundreds of thousands of data points per job from chemical libraries through multiple inference models simultaneously. These complex, multi-step workflows require a way to actively check statuses, as they take upwards of several hours, making retries undesirable.
Current Workaround: Not yet determined.
Impact: Cannot integrate with real-time research workflows, prevents interactive drug discovery platforms, and blocks automated research pipelines. These customers are looking for best practices for workflow-based tool calls and have noted the lack of first-class support in MCP as a concern. If these customers do not have a solution for long-running tool calls, they will likely forego MCP and continue using their existing platforms.
Ideal: Concurrent and pollable tool calls as an answer for operations executing in the range of a few minutes, and some form of push notification system to avoid blocking their agents on long analyses on the order of hours. This SEP supports the former use case and offers a framework that could extend to support the latter.

2. Enterprise Automation Platforms
Challenge: Amazon’s large enterprise customers are looking to develop internal MCP platforms to automate SDLC processes across their organizations, extending to sales, customer service, legal, HR, and cross-divisional teams. They have noted they have long-running agent and agent-tool interactions, supporting complex business process automation.
Current Workaround: Not yet determined. Considering an application-level system outside of MCP backed by webhooks.
Impact: Limitations related to the host application being unaware of tool execution state prevent complex business process automation and limit sophisticated multi-step operations. These customers want to dispatch processes concurrently and collect their results later, and are noting the lack of explicit late-retrieval as a concern — and are considering involved application-level notification systems as a possible workaround.
Ideal: Built-in mechanisms for actively checking the status of ongoing work, so these customers do not need to implement notification systems specific to their own tool conventions.

3. Code Migration Workflows
Challenge: Amazon has automated code migration and transformation tools to perform upgrades across its own codebases and those of external customers, and is attempting to wrap those tools in MCP servers. These migrations analyze dependencies, transform code to avoid deprecated runtime features, and validate changes across multiple repositories. These migrations range from minutes to hours depending on migration scope, complexity, and validation requirements.
Current Workaround: Developers implement manual tracking by splitting a job into create and get tools, forcing models to manage state and repeatedly poll for completion.
Impact: Poor developer experience due to needing to replicate this hand-rolled polling mechanism across many tools. One team had to debug an issue where the model would hallucinate job names if it hadn’t listed them first. Validating that this does not happen across many tools in a large toolset is time-consuming and error-prone.
Ideal: Native, data-layer polling of tool state, so a tool can be pushed to the background without blocking other tasks in the chat session, while still supporting deterministic polling and result retrieval. The team needs the same pattern across many tools in their MCP servers and wants a common solution across them, which this SEP directly supports.

4. Test Execution Platforms
Challenge: Amazon’s internal test infrastructure executes comprehensive test suites including thousands of cases, integration tests across services, and performance benchmarks. They have built an MCP server wrapping this existing infrastructure.
Current Workaround: For streaming test logs, the MCP server exposes a tool that can read a range of log lines, as it cannot effectively notify the client when the execution is complete. There is not yet any workaround for executing test runs.
Impact: Cannot run a test suite and stream its logs simultaneously without a single hours-long tool call, which would time out on either the client or the server. This prevents agents from looking into test failures in an incomplete test run until the entire test suite has completed, potentially hours later.
Ideal: Support host application-driven tool polling for intermediate results, so a client can be notified when a long-running tool is complete. This SEP does not fully support this use case (it does enable polling), but the Task execution model can be extended to do so, as discussed in the “Future Work” section.

5. Deep Research
Challenge: Deep research tools spawn multiple research agents to gather and summarize information about topics, going through several rounds of search and conversation turns internally to produce a final result for the caller application. The tool takes an extended amount of time to execute, and it is not always clear if the tool is still executing.
Current Workaround: The research tool is split into a separate create tool to create a report job and a get tool to get the status/result of that job later.
Impact: When using this with host applications, the agent sometimes runs into issues calling the get tool repeatedly — in particular, it calls the tool once before ending its conversation turn, claiming to be "waiting" before calling the tool again. It cannot resume until receiving a new user message. This also complicates expiration times, as it is not possible to predict when the client will retrieve the result when this occurs. It is possible to work around this by adding a wait tool for the model, but this prevents the model from doing anything else concurrently.
Ideal: Support polling a tool call’s state in a deterministic way and notify the model when a result is ready, so the tool result can be immediately retrieved and deleted from the server. Other than notifying the model (a host application concern), this SEP fully supports this use case.

6. Agent-to-Agent Communication (Multi-Agent Systems)
Challenge: One of Amazon’s internal multi-agent systems for customer question answering faces scenarios where agents require significant processing time for complex reasoning, research, or analysis. When agents communicate through MCP, slow agents cause cascading delays throughout this system, as agents are forced to wait on their peers to complete their work.
Current Workaround: Not yet determined.
Impact: Communication pattern creates cascading delays, prevents parallel agent processing, and degrades system responsiveness for other time-sensitive interactions.
Ideal: Some method to allow agents to perform other work concurrently and get notified once long-running tasks complete. This SEP supports this use case by enabling host applications to implement background polling for select tool calls without blocking agents.

These use cases demonstrate that a mechanism to actively track tool calls and defer results is a real requirement for these types of MCP deployments in production environments.

Integration with Existing Architectures
Many workflow-driven systems already provide active execution-tracking capabilities with built-in status metadata, monitoring, and data retention policies. This proposal enables MCP servers to expose these existing APIs with thin MCP wrappers while maintaining their existing reliability.

Benefits for Existing Architectures:

  • Leverage Existing State Management: Systems like AWS Step Functions, Workflows for Google Cloud, and CI/CD platforms already maintain execution state, logs, and results. MCP servers can expose these systems' existing APIs without pushing the responsibility of polling to a fallible agent.
  • Preserve Native Monitoring: Existing monitoring, alerting, and observability tools continue to work unchanged. The execution happens almost entirely within the existing workflow-management system.
  • Reduce Implementation Overhead: Server implementers don't need to build new state management, persistence, or monitoring infrastructure. They can focus on the MCP protocol mapping of their existing APIs to tasks.

This SEP simplifies integration with existing workflows and allows workflow services to continue to manage their own state while delivering a quality customer experience, rather than offloading to agent-polling or building MCP servers that do nothing but poll other services.

Specification

Please refer to #1732 for the full specification text.

Tasks are durable state machines that enable workflow-based operations in the Model Context Protocol (MCP). They allow requestors to augment requests with task metadata for deferred result retrieval via polling. Tasks are bidirectional: depending on capabilities negotiated during initialization, either clients or servers can create tasks by augmenting their requests. For example, a client might create a task when calling a long-running tool on a server, while a server might create a task when requesting sampling from a client.

During initialization, both parties declare their task support through capabilities. The capabilities.tasks.requests field is structured by request category (such as tools.call, sampling.createMessage, or elicitation.create). Requestors must only augment requests with task metadata if the receiver has declared support for that specific request type. This ensures both parties understand when task-based execution is available.
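As a rough illustration of the capability gating described above, a requestor might check the receiver's declared capabilities before augmenting a request. This is a hedged sketch: the helper name and the exact nesting of empty capability objects are assumptions modeled on the SEP's capabilities.tasks.requests layout, not an SDK API.

```python
# Sketch only: gate task augmentation on the receiver's declared capabilities.
# The capability layout follows the SEP's capabilities.tasks.requests shape.

def supports_task_augmentation(capabilities: dict, category: str, method: str) -> bool:
    """True if the receiver declared task support for this request type."""
    requests = capabilities.get("tasks", {}).get("requests", {})
    return method in requests.get(category, {})

# A server that declared task support for tools/call only.
server_caps = {"tasks": {"requests": {"tools": {"call": {}}}}}

assert supports_task_augmentation(server_caps, "tools", "call")
assert not supports_task_augmentation(server_caps, "sampling", "createMessage")
```

A requestor that skips this check and augments an unsupported request violates the negotiated contract, so this lookup belongs on the hot path of every task-augmented send.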

Creating a Task

To create a task, a requestor sends a request with a task field included in the parameters. This field may contain a ttl value (in milliseconds) representing how long the requestor would like the task to be retained after creation. When the receiver accepts the task-augmented request, it responds immediately with a CreateTaskResult containing task metadata rather than the actual operation result.

The CreateTaskResult includes several key pieces of information: a unique taskId generated by the receiver, the current status (which begins as working), a createdAt timestamp in ISO 8601 format, the actual ttl duration the receiver will honor (which may differ from the requested value), and optionally a pollInterval suggesting how frequently the requestor should check for status updates.

Example task-augmented tool call

// Request
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": {
      "city": "New York"
    },
    "task": {
      "ttl": 60000
    }
  }
}

// Response
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "task": {
      "taskId": "786512e2-9e0d-44bd-8f29-789f320fe840",
      "status": "working",
      "statusMessage": "The operation is now in progress.",
      "createdAt": "2025-11-25T10:30:00Z",
      "lastUpdatedAt": "2025-11-25T10:30:00Z",
      "ttl": 60000,
      "pollInterval": 5000
    }
  }
}

This two-phase response pattern distinguishes task-augmented requests from normal requests. In a normal request, the receiver processes the operation and returns the actual result directly, while in a task-augmented request, the receiver accepts the request and returns task metadata immediately, allowing the requestor to continue with other work while the operation executes in the background.

Polling on a Task

After creating a task, the requestor polls for status updates using the tasks/get operation. This operation takes the taskId and returns the current task state, including the status, any status message, and timing information. The receiver may include a pollInterval in the response to suggest how long the requestor should wait before polling again.

Tasks can be in one of five states:

  • working: The request is currently being processed (initial state)
  • input_required: The receiver needs input from the requestor before continuing
  • completed: The request completed successfully and results are available
  • failed: The request did not complete successfully
  • cancelled: The request was cancelled before completion

The last three states (completed, failed, and cancelled) are terminal—once a task reaches one of these states, it cannot transition to any other state. Tasks in the working state can transition to any of the other states, while tasks in input_required can transition back to working or forward to a terminal state.
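The state machine above is small enough to capture in a transition table. This is an illustrative sketch of the rules as stated in this section, not code from any SDK:

```python
# Illustrative encoding of the task state machine described above.

TERMINAL = {"completed", "failed", "cancelled"}

TRANSITIONS = {
    "working": {"input_required"} | TERMINAL,  # working may move to any other state
    "input_required": {"working"} | TERMINAL,  # back to working, or to a terminal state
    "completed": set(),                        # terminal states have no exits
    "failed": set(),
    "cancelled": set(),
}

def can_transition(current: str, target: str) -> bool:
    return target in TRANSITIONS[current]

assert can_transition("working", "completed")
assert can_transition("input_required", "working")
assert not can_transition("completed", "working")  # terminal states are final
```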

The input_required status warrants special attention. When a task enters this state, the receiver needs to send requests back to the requestor (such as elicitations for user input) before the task can continue. The requestor should call tasks/result when it encounters this status, even though the task hasn't reached a terminal state. The receiver will use this opportunity to send its requests, and once it receives the necessary responses, the task typically transitions back to working.

Getting Task Results

Once a task reaches a terminal state, the requestor retrieves the actual operation result using the tasks/result operation. The CreateTaskResult returned when the task was created contained only metadata about the task itself, while tasks/result returns what the original request would have returned if it hadn't been task-augmented.

If the requestor calls tasks/result while the task is still in a non-terminal state, the call blocks until the task reaches a terminal state. This blocking behavior enables the input_required pattern described earlier—the requestor can call tasks/result prematurely when it sees that status, and the receiver will use that open connection to send its requests.

For completed tasks, tasks/result returns the successful result exactly as the underlying operation would have returned it. For failed tasks, it returns the JSON-RPC error that the operation encountered. For tool calls specifically, if the tool result has isError set to true, the task moves to the failed status and tasks/result returns that tool result.
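Putting the create, poll, and result steps together, a requestor-side loop might look like the following sketch. The `client` object and its `send_request` method are hypothetical stand-ins for an MCP client session; only the method names and payload shapes come from this SEP.

```python
import time

# Hedged sketch of the create/poll/fetch cycle. `client.send_request` is a
# hypothetical stand-in for an MCP client session; method names and payload
# shapes follow the SEP.

TERMINAL = {"completed", "failed", "cancelled"}

def call_tool_as_task(client, name, arguments, ttl_ms=60_000):
    # Phase 1: task-augmented request returns task metadata, not the result.
    created = client.send_request("tools/call", {
        "name": name,
        "arguments": arguments,
        "task": {"ttl": ttl_ms},
    })
    task = created["task"]

    # Phase 2: poll tasks/get until the task reaches a terminal state.
    while task["status"] not in TERMINAL:
        if task["status"] == "input_required":
            break  # call tasks/result early so the receiver can send its requests
        time.sleep(task.get("pollInterval", 1000) / 1000)
        task = client.send_request("tasks/get", {"taskId": task["taskId"]})

    # Phase 3: tasks/result blocks until terminal, then returns the real result.
    return client.send_request("tasks/result", {"taskId": task["taskId"]})
```

Note how input_required short-circuits straight to tasks/result: the deliberately premature call gives the receiver an open connection on which to deliver its elicitation or sampling requests.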

Example request flow

sequenceDiagram
    participant Client as Client (Requestor)
    participant Server as Server (Receiver)

    Note over Client,Server: 1. Task Creation
    Client->>Server: tools/call with task metadata
    Server->>Client: CreateTaskResult (taskId, status: working)
    Note over Client: Client continues other work

    Note over Client,Server: 2. Polling
    Client->>Server: tasks/get (taskId)
    Server->>Client: status: working
    Note over Server: Processing continues...
    Client->>Server: tasks/get (taskId)
    Server->>Client: status: working
    Note over Server: Task completes
    Client->>Server: tasks/get (taskId)
    Server->>Client: status: completed

    Note over Client,Server: 3. Result Retrieval
    Client->>Server: tasks/result (taskId)
    Server->>Client: Actual tool call result

Additional Operations

Beyond the core create-poll-retrieve cycle, the tasks specification defines two additional operations. The tasks/list operation allows requestors to retrieve a paginated list of all their tasks, using cursor-based pagination with opaque cursor tokens. The tasks/cancel operation allows requestors to cancel a task that hasn't yet reached a terminal state. When a task is cancelled, it transitions to the cancelled status permanently—even if the underlying operation continues to completion, the task remains marked as cancelled.
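Cursor-based pagination over tasks/list can be driven with a simple loop. A sketch follows; the result field names (`tasks`, `nextCursor`) are assumptions modeled on MCP's usual list-pagination shape, and `client.send_request` is again a hypothetical session stand-in.

```python
# Sketch: drain every page of tasks/list using opaque cursor tokens.
# Field names ("tasks", "nextCursor") are assumed, following MCP's usual
# pagination shape; `client.send_request` is a hypothetical stand-in.

def iter_all_tasks(client):
    cursor = None
    while True:
        params = {"cursor": cursor} if cursor else {}
        page = client.send_request("tasks/list", params)
        yield from page.get("tasks", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break  # no cursor means this was the last page
```

Because cursors are opaque, the requestor must treat them as single-use tokens handed back verbatim, never parsed or constructed.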

Receivers may also send optional notifications/tasks/status notifications when a task's status changes. However, requestors must not rely on receiving these notifications, as they are entirely optional. Requestors should continue polling via tasks/get to ensure they receive status updates.

Resource Management and Security

Tasks include several features for resource management. Each task has a time-to-live (TTL) duration measured from the createdAt timestamp. After this duration elapses, the receiver may delete the task and its results, regardless of the task's current status. Receivers may override the requestor's requested TTL with their own limits to prevent indefinite resource retention.

For security, receivers should bind tasks to the session or authentication context that created them, preventing unauthorized access to task state and results. In environments without session management (such as single-user tools), receivers should document this limitation and use cryptographically random task IDs with sufficient entropy to make guessing infeasible. All implementations should implement rate limiting on task operations to prevent denial-of-service attacks.
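One common way to satisfy the "cryptographically random task IDs with sufficient entropy" guidance is a version-4 UUID, which carries 122 random bits drawn from the operating system's CSPRNG in CPython. This is one acceptable approach, not a requirement of the SEP:

```python
import uuid

# UUIDv4 provides 122 bits of randomness (from os.urandom in CPython),
# making task ID guessing infeasible even without session binding.
def new_task_id() -> str:
    return str(uuid.uuid4())
```

The example task ID shown earlier (786512e2-9e0d-44bd-8f29-789f320fe840) follows this format.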

All requests, responses, and notifications associated with a task must include io.modelcontextprotocol/related-task metadata in their _meta field, with the taskId of the associated task. This allows implementations to track related messages across the entire task lifecycle. There are a few exceptions: the task management operations themselves (tasks/get, tasks/list, tasks/cancel) should not include this metadata in their request parameters since the task ID is already explicitly provided, though tasks/result must include it in the response since the result structure itself doesn't contain the task ID.
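A small helper can ensure task-related messages carry the required metadata. The helper name and the exact value shape under the metadata key are assumptions for illustration; the `_meta` key string and the task ID come from this SEP's examples.

```python
# Sketch: attach io.modelcontextprotocol/related-task metadata to a message's
# _meta field. The helper and the {"taskId": ...} value shape are assumptions;
# the metadata key itself is defined by the SEP.

RELATED_TASK_KEY = "io.modelcontextprotocol/related-task"

def with_related_task(message_params: dict, task_id: str) -> dict:
    meta = dict(message_params.get("_meta", {}))  # copy; don't mutate the caller's dict
    meta[RELATED_TASK_KEY] = {"taskId": task_id}
    return {**message_params, "_meta": meta}

params = with_related_task({"level": "info"}, "786512e2-9e0d-44bd-8f29-789f320fe840")
assert params["_meta"][RELATED_TASK_KEY]["taskId"].startswith("786512e2")
```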

Rationale

Design Decision: Generic Task Primitive

The decision to implement tasks as a generic request augmentation mechanism (rather than tool-specific or method-specific) was made to maximize protocol simplicity and flexibility.

Tasks are designed to work with any request type in the MCP protocol, not just tool calls. From a protocol perspective, this design eliminates the need for separate task implementations per request type. Instead of defining different async patterns for each request type, a single set of task management methods (tasks/get, tasks/result, tasks/list, tasks/cancel) works uniformly across all supported request types. This uniformity reduces cognitive load for implementers and creates a consistent experience for applications using the protocol.

Current scope: While the task primitive is designed to support any request type in principle, the current specification restricts task augmentation to three request types: tools/call, sampling/createMessage, and elicitation/create. This restriction exists to ease the burden on SDK developers. Introducing CreateTaskResult as a response type means SDKs need to provide new request methods (or method variants) for each task-augmentable request type, which would be disruptive if applied broadly. By limiting the initial scope to high-value request types where long-running execution is most relevant, we reduce the SDK implementation burden while establishing the pattern for future expansion.

The generic design provides implementation flexibility within this scope. Servers and clients can choose which of the supported requests to enable task augmentation for by declaring appropriate capabilities. This allows parties to add task support incrementally based on actual usage patterns.

Architecturally, tasks augment existing requests rather than replacing them. The original request/response flow is modified—instead of returning the operation result directly, a task-augmented request returns a CreateTaskResult containing task metadata. The actual operation result becomes available through tasks/result after the task completes. This two-phase response pattern enables call-now, fetch-later execution while maintaining clear semantics for all parties.

Design Decision: Explicit Capability Negotiation

Tasks require explicit capability declaration during initialization, unlike a previous design iteration that relied on implicit detection. This decision was made to provide clarity and deterministic behavior for all parties.

During initialization, both clients and servers declare their task support through the capabilities.tasks field, structured by request category (such as tasks.requests.tools.call or tasks.requests.sampling.createMessage). This explicit declaration provides several benefits:

  1. Clear contract establishment: Both parties know upfront which request types support task augmentation. Requestors can make informed decisions about whether to use task augmentation without trial-and-error.

  2. Granular control: The capability structure allows fine-grained declaration of task support. A server might support task-augmented tool calls but not other request types, and this is clearly communicated during initialization.

  3. Additional operation support: The tasks.list and tasks.cancel capabilities indicate whether the receiver supports these optional operations, allowing requestors to adapt their behavior accordingly.

  4. Tool-level granularity: For tool calls specifically, the execution.taskSupport field on individual tools provides even finer control, allowing tools to declare whether task augmentation is "required", "optional", or "forbidden" (the default). This is layered on top of the server-level capability.

This explicit approach trades some flexibility for predictability. Requestors know before sending a request whether task augmentation is supported, eliminating the need for fallback handling or error recovery when task support is unavailable.
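The layered negotiation above (server capability plus per-tool execution.taskSupport) can be resolved with a small decision function. This is a hedged sketch of the stated semantics; the function and parameter names are invented for illustration.

```python
# Sketch of resolving a tool's execution.taskSupport declaration
# ("required", "optional", or the default "forbidden") against the
# requestor's preference. Names are illustrative, not from an SDK.

def should_augment(tool: dict, prefer_task: bool) -> bool:
    support = tool.get("execution", {}).get("taskSupport", "forbidden")
    if support == "required":
        return True       # tool only works as a task
    if support == "forbidden":
        return False      # the default: never augment
    return prefer_task    # "optional": the requestor decides

assert should_augment({"execution": {"taskSupport": "required"}}, False)
assert not should_augment({}, True)  # absent declaration defaults to "forbidden"
```

This check applies only after the server-level tasks.requests.tools.call capability has been confirmed; tool-level support cannot override a missing server capability.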

Design Decision: Receiver-Generated Task IDs

The choice to have receivers generate task IDs rather than requestors provides alignment with existing workflow systems and simplifies the protocol.

Integration with existing systems:
Many workflow-driven systems that MCP servers wrap already generate their own identifiers—AWS Step Functions execution ARNs, CI/CD pipeline run IDs, and similar. By having receivers generate task IDs, servers can directly use these existing identifiers (or derive task IDs from them), eliminating the need for mapping between requestor-provided IDs and internal identifiers.

Simplified protocol semantics:
With receiver-generated IDs, the response to a task-augmented request is a CreateTaskResult that includes the assigned taskId. The requestor then uses this ID for all subsequent task operations. This creates a clear handoff: the receiver owns task identity from the moment of creation.

Trade-offs for fault tolerance:
The main trade-off is that requestors cannot know the task ID before receiving the response. If the response is lost (network failure, timeout), the requestor cannot deterministically retry without potentially creating duplicate tasks. However, several mitigations exist:

  • Requestors can use tasks/list to discover tasks they may have created
  • Receivers can implement deduplication based on request parameters if needed
  • Transport-level reliability mechanisms (such as those in Streamable HTTP) can reduce the likelihood of lost responses

Importantly, the need for idempotency is not unique to tasks—it applies to all MCP messages. Rather than solving idempotency specifically for task creation, it was agreed that a dedicated proposal should introduce a general mechanism for message idempotency across the protocol. This allows the task design to remain focused on execution tracking while deferring the broader idempotency concern to a more appropriate scope.

This trade-off was deemed acceptable because the simplification benefits outweigh the complexity of requestor-generated IDs, and the failure scenarios are manageable through existing mechanisms pending a general idempotency solution.

Design Decision: Synchronous Task Creation Response

The decision to return task metadata directly in the response (via CreateTaskResult) rather than using a separate notification was made to simplify the protocol and improve reliability.

When a receiver accepts a task-augmented request, it immediately returns a CreateTaskResult containing the task ID, initial status (working), timestamps, TTL, and optional poll interval. This synchronous approach provides several benefits:

  1. Deterministic task discovery: The requestor knows the task ID as soon as the response arrives. There's no race condition between receiving a notification and being able to poll for status.

  2. Transport independence: The design works identically across all transports, including those without notification support. Requestors don't need special handling for environments where notifications may be unavailable or unreliable.

  3. Simpler implementation: Both requestors and receivers have straightforward request/response semantics. Receivers don't need to manage notification delivery, and requestors don't need to listen for notifications while also waiting for the response.

  4. Clear response semantics: The two-phase pattern is explicit—a task-augmented request returns task metadata, not the operation result. This distinguishes task-augmented requests from normal requests at the protocol level.

The notification-based approach from a previous design iteration had benefits for certain architectures (particularly fire-and-forget dispatch to background systems), but introduced complexity around notification delivery timing and graceful degradation. The synchronous approach trades some architectural flexibility for protocol simplicity and reliability.

Design Decision: Request Parameters for Task Metadata

Task augmentation uses a dedicated task field in request parameters rather than placing information in the _meta field. This decision reflects the semantic significance of task augmentation.

Including task in request parameters signals that the requestor is explicitly requesting task-based execution, which fundamentally changes the response type. A request with task present returns a CreateTaskResult rather than the normal operation result. This is a semantic choice by the requestor, not merely metadata about how to track the request.

This approach provides several benefits:

  1. Explicit intent: The presence of task in params is a clear signal that the requestor expects task-based execution. This is more explicit than metadata, which is typically optional and ignorable.

  2. Schema clarity: The task field has a defined structure (ttl for requested duration). Placing this in params allows it to be part of the formal request schema, enabling validation and documentation.

  3. Response type determination: Since task presence determines the response type, having it in params creates a clear contract: requests with task return CreateTaskResult, requests without task return the normal result.

The related task association for messages during task execution (io.modelcontextprotocol/related-task) is appropriately placed in _meta, as it is truly metadata that doesn't change the semantics of those messages—it simply associates them with a task for tracking purposes.

Alternative Designs Considered

Tool-Specific Async Execution:
An earlier version of this proposal (#1391) focused specifically on tool calls, introducing an invocationMode field on tool definitions to mark tools as supporting synchronous, asynchronous, or both execution modes. This approach would have added dedicated fields to the tool call request and response structures, with server-side capability declarations to indicate support for async tool execution.

While this design would have addressed the immediate need for long-running tool calls, it was rejected in favor of the more general task primitive for several reasons. First, it artificially limited the async execution pattern to tools when other request types have similar needs. Resources can be expensive to read, prompts can require complex processing, and sampling requests may involve lengthy user interactions. Creating separate async patterns for each request type would lead to protocol fragmentation and inconsistent implementation patterns.

Second, the tool-specific approach required more complex capability negotiation and version handling. Servers would need to filter tool lists based on client capabilities, and SDKs would need to manage different invocation patterns for sync versus async tools. This complexity would ripple through every layer of the implementation stack.

Finally, the tool-specific design didn't address the broader architectural need for deferred result retrieval across all MCP request types. By generalizing to a task primitive that augments any request, this proposal provides a consistent pattern that can be applied uniformly across the protocol. More importantly, this foundation is extensible to future protocol messages and features such as subtasks, making it a more appropriate building block for the protocol's evolution.

Transport-Layer Solutions:
An alternative approach would be to solve for this purely at the transport layer, without introducing a new data-layer primitive. Several proposals (#1335, #1442, #1597) address transport-specific concerns such as connection resilience, request retry semantics, and stream management for sHTTP. These are valuable improvements that can mitigate many scaling and reliability challenges associated with requests that may take extended time to complete.

However, transport-layer solutions alone are insufficient for the use cases this SEP addresses. Even with perfect transport-layer reliability, several data-layer concerns remain:

First, servers and clients need a way to communicate expectations about execution patterns. Without this, host applications cannot make informed decisions about UX patterns—should they block, show a spinner, or allow the user to continue working? An annotation alone could signal that a request might take extended time, but provides no mechanism to actively check status or retrieve results later.

Second, transport-layer solutions cannot provide visibility into the execution state of a request that is still in progress. If a request stops sending progress notifications, the client cannot distinguish between "the server is doing expensive work" and "the request was lost." Transport-level retries can confirm the connection is alive, but cannot answer "is this specific request still executing?" This visibility is critical for operations where users need confidence their work is progressing.

Third, different transports would require different mechanisms for these concerns. The sHTTP proposals adjust stream management and retry semantics to fulfill these requirements, but stdio has no equivalent extension points. This creates transport-specific fragmentation where implementers must solve the same problems differently depending on their choice of transport. A data-layer mechanism provides consistent semantics across all transports.

Finally, deferred result retrieval and active status checks are data-layer concerns that cannot be addressed by transport improvements alone. The ability to retrieve a result multiple times, specify retention duration, and handle cleanup is orthogonal to how the underlying messages are delivered.

Resource-Based Approaches:
Another possible approach would be to leverage existing MCP resources for tracking long-running operations. For example, a tool could return a linked resource that communicates operation status, and clients could subscribe to that resource to receive updates when the operation completes. This would allow servers to represent task state using the resource primitive, potentially with annotations for suggested polling frequency.

While this approach is technically feasible and servers remain free to adopt such conventions, it suffers from similar limitations as the tool-splitting pattern described in the Motivation section. Like the start_tool and get_tool convention, a resource-based tracking system would be convention-based rather than standardized, creating several challenges:

The most fundamental issue is the lack of a consistent way for clients to distinguish between ordinary resources (meant to be exposed to models) and status-tracking resources (meant to be polled by the application). Should a status resource be presented to the model? How should the client correlate a returned resource with the original tool call? Without standardization, different servers would implement different conventions, forcing clients/hosts/models to handle each server's particular approach. Extending resources with task-like semantics (such as polling frequency, keepalive durations, and explicit status states) would create a new and distinct purpose for resources that would be difficult to distinguish from their existing purpose as model-accessible content.

The resource subscription model has one additional issue: as it is push-based, it requires clients to wait for notifications of resource changes rather than actively polling for status. While this works for some use cases, it doesn't address scenarios where clients need to actively check status—for example, proactively and deterministically checking if work is still progressing, which is the original intent of this proposal.

The task primitive addresses these concerns by providing a standardized, protocol-level mechanism specifically designed for this use case, with consistent semantics that any client can leverage without host applications needing to understand server-specific conventions. While resource-based tracking remains possible for servers that prefer it and/or are already using it, this SEP provides a first-class alternative that solves the broader set of requirements identified previously.

Backward Compatibility

The tasks feature is designed to be fully backward compatible with existing MCP implementations. Task support is negotiated through explicit capability declarations during initialization. Requestors should only augment requests with task metadata if the receiver has declared support for that specific request type.

Receivers that do not declare task support for a request type must process requests of that type normally, ignoring any task-augmentation metadata if present. This ensures that mistakenly sent task metadata does not cause failures, maintaining backward compatibility with existing request flows.

Receivers that do declare task support have flexibility in how they handle non-task-augmented requests. They may continue to support the traditional immediate execution pattern for backward compatibility, or they may choose to reject non-task-augmented requests with an error, requiring all requestors to use task augmentation. For tool calls specifically, the execution.taskSupport field on individual tools provides fine-grained control: tools can declare task augmentation as "required", "optional", or "forbidden" (the default).
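As an illustrative sketch (the surrounding tool fields are standard MCP; the `execution` block follows this proposal's `taskSupport` values), a tool that only accepts task-augmented calls might be declared as:

```json
{
  "name": "long_running_tool",
  "description": "Starts a long-running operation",
  "inputSchema": { "type": "object" },
  "execution": { "taskSupport": "required" }
}
```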

Adoption Path:

  • Servers can implement task support incrementally, starting with high-value request types
  • Clients can check capabilities during initialization and use tasks where supported
  • The capability negotiation ensures both parties agree on task support before use
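A client that has confirmed task support during initialization could poll task status with a helper like the following. This is a minimal sketch: the `TaskStatus` values and the `pollInterval` hint follow this proposal, while the injected `getStatus` callback and the `waitForTask` helper name are illustrative, not part of the spec.

```typescript
// Possible task statuses per this proposal.
type TaskStatus = "working" | "input_required" | "completed" | "failed" | "cancelled";

interface TaskStatusResponse {
  taskId: string;
  status: TaskStatus;
  pollInterval?: number; // server-suggested delay in milliseconds before the next poll
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the task leaves the "working" state, honoring the server's
// suggested pollInterval and falling back to a default delay.
async function waitForTask(
  getStatus: (taskId: string) => Promise<TaskStatusResponse>,
  taskId: string,
  defaultIntervalMs = 1000,
): Promise<TaskStatusResponse> {
  for (;;) {
    const status = await getStatus(taskId);
    if (status.status !== "working") return status;
    await sleep(status.pollInterval ?? defaultIntervalMs);
  }
}
```

In a real client, `getStatus` would issue a `tasks/get` request over the negotiated transport; injecting it keeps the polling logic transport-agnostic.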

Future Work

The task primitive introduced in this SEP provides a foundation for several important extensions that will enhance MCP's workflow capabilities.

Push Notifications

While this SEP focuses on client-driven polling, future work could introduce server-initiated notifications for task state changes. This would be particularly valuable for operations that take hours or longer, where continuous polling becomes impractical.

A notification-based approach would allow servers to proactively inform clients when:

  • A task completes or fails
  • A task reaches a milestone or significant state transition
  • A task requires input (complementing the input_required status)

This could be implemented through webhook-style mechanisms or persistent notification channels, depending on the transport capabilities. The proposed task ID and status model provides the necessary infrastructure for servers to identify which tasks warrant notifications and for clients to correlate notifications with their outstanding tasks.
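Purely as a hypothetical shape (no such message is defined by this SEP; the method name and fields are invented for illustration), a server-initiated status notification might resemble:

```json
{
  "method": "notifications/tasks/status",
  "params": {
    "taskId": "task-123",
    "status": "completed"
  }
}
```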

Intermediate Results

The current task model returns results only upon completion. Future extensions could enable tasks to report intermediate results or progress artifacts during execution. This would support use cases where servers can produce partial outputs before final completion, such as:

  • Streaming analysis results as they become available
  • Reporting completed phases of multi-step operations
  • Providing preview data while full processing continues

Intermediate results would build on the proposed task ID association mechanism, allowing servers to send multiple result notifications or response messages tied to the same task ID throughout its lifecycle.

Nested Task Execution

A significant future enhancement is support for hierarchical task relationships, where a task can spawn subtasks as part of its execution. This would enable complex, multi-step workflows orchestrated by the server.

In a nested task model, a server could:

  • Create subtasks in response to a parent task reaching a state that requires additional operations
  • Communicate subtask requirements to the client, potentially including required tool calls or sampling requests
  • Track subtask completion and use subtask results to advance the parent task
  • Maintain provenance through task ID hierarchies, showing the relationship between parent and child tasks

For example, a complex analysis task might spawn several subtasks for data gathering, each represented by its own task ID but associated with the parent task. The parent task would remain in a pending state (potentially in a new tool_required status) until all required subtasks complete.

This hierarchical model would support sophisticated server-controlled workflows while maintaining the client's ability to monitor and retrieve results at any level of the task tree.

Example nested task flow:

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server

    Note over C,S: Client Creates Parent Task
    C->>S: tools/call "deploy_application" with task: {ttl: 3600000}
    S->>C: CreateTaskResult (taskId: "deploy-123")

    C->>S: tasks/get (taskId: "deploy-123")
    S->>C: status: working

    Note over S: Server determines subtasks needed

    Note over C,S: Server Responds with Subtask Requirements
    C->>S: tasks/get (taskId: "deploy-123")
    S->>C: status: working<br/>childTasks: [{<br/>  toolName: "run_build",<br/>  arguments: {...}<br/>}, {<br/>  toolName: "run_tests",<br/>  arguments: {...}<br/>}]

    Note over C: Client initiates subtasks

    C->>S: tools/call "run_build" with task + related-task: "deploy-123"
    S->>C: CreateTaskResult (taskId: "build-456")

    C->>S: tools/call "run_tests" with task + related-task: "deploy-123"
    S->>C: CreateTaskResult (taskId: "test-789")

    Note over C: Client polls subtasks

    C->>S: tasks/get (taskId: "build-456")
    S->>C: status: completed

    C->>S: tasks/get (taskId: "test-789")
    S->>C: status: completed

    Note over S: All subtasks complete, parent continues

    C->>S: tasks/get (taskId: "deploy-123")
    S->>C: status: completed

    C->>S: tasks/result (taskId: "deploy-123")
    S->>C: Deployment complete
```

Potential Data Model Extensions:
The task status response could be extended to include parent and child task relationships:

```typescript
{
  taskId: string;
  status: TaskStatus;
  ttl: number | null;
  pollInterval?: number;
  statusMessage?: string;

  // Extensions for nested tasks
  parentTaskId?: string;        // ID of parent task, if this is a subtask
  childTasks?: Array<{          // Subtasks required by this task
    toolName: string;           // Tool to call for this subtask
    arguments?: object;         // Arguments for the tool call
  }>;
}
```

This would allow clients to:

  • Discover subtasks required by a parent task through the childTasks array
  • Initiate the required subtask tool calls with task augmentation
  • Navigate the task hierarchy by following parent/child relationships via parentTaskId
  • Monitor all subtasks by polling each child task ID
  • Wait for all subtasks to complete before checking parent task completion

The existing task metadata and status lifecycle are designed to be forward-compatible with these extensions.
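The client behavior described above could be sketched as follows. The request shape and the `io.modelcontextprotocol/related-task` metadata key follow this SEP, while the `childTasks` extension is the hypothetical structure from the previous section and the helper itself is illustrative:

```typescript
// Hypothetical shape of a subtask requirement, per the proposed extension above.
interface ChildTaskSpec {
  toolName: string;
  arguments?: object;
}

// Subset of the extended task status response relevant to subtask discovery.
interface ExtendedTaskStatus {
  taskId: string;
  status: string;
  childTasks?: ChildTaskSpec[];
}

// Builds the task-augmented tools/call requests a client would issue for each
// required subtask, linking each back to the parent via related-task metadata.
function buildSubtaskCalls(parent: ExtendedTaskStatus, ttlMs: number) {
  return (parent.childTasks ?? []).map((child) => ({
    method: "tools/call",
    params: {
      name: child.toolName,
      arguments: child.arguments ?? {},
      task: { ttl: ttlMs },
      _meta: {
        "io.modelcontextprotocol/related-task": { taskId: parent.taskId },
      },
    },
  }));
}
```

The client would then send each built request, poll the resulting child task IDs to completion, and finally poll the parent task, mirroring the sequence diagram above.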
