SEP-1865: MCP Apps - Interactive User Interfaces for MCP #1865
Conversation
Added documentation for optional extensions to the Model Context Protocol.
This is exciting to see! I see that as with the current MCP-UI and Apps SDK specs, this covers allowing UI to request tool calls and rerender in a way that keeps the agent in the loop. But does this proposal intentionally not address mechanisms to close the loop in the other direction, to flow related tool call data from subsequent conversational turns back into an already rendered widget? Or is the ability to read (and subscribe to?) resources directly from within a widget intended for this purpose? Suppose I have a getItemDetails tool that renders a Book widget, and then in a subsequent turn a user utterance triggers a setItemStatus tool which mutates a status field. How should the change be communicated to the widget so it can rerender?
That’s not an MCP limitation, is it? (Roll your own MCP and chat interface and there’s no issue rendering video or HTML or whatever.) It's a limitation of chat-bot UIs. You don’t NEED to give the model back text data over MCP; a tool can be triggered and vend auth’d data of any kind to the interface. What am I missing? Why concretize a general communication/auth protocol spec within one specific use case?
Fantastic work on this PR, really sharp update. Introducing these resource declarations and a bi-directional UI communication model feels like a big step toward unlocking richer and more interactive MCP clients. One question: how do you envision capability negotiation evolving for UI-enabled resources once multiple client types adopt this pattern? I'm curious whether you see a standardized handshake emerging or if it stays client-specific for now.
In "base" MCP there is no way to distinguish between what is for the model and what is for the application. The OpenAI Apps SDK worked around this by putting the UI in structured output and the for-model result in regular unstructured/text tool output. But that's actually not standards-compliant (the structured and unstructured output are supposed to be the same as of the current spec), and it prevents the use of structured output for other things. By using metadata, it becomes much more explicit what is the tool result and what is the tool app/UI/visualization component.

It is important to distinguish between context for the model and context for the application. You don't want to send context for the application to the model, as the model doesn't have direct access to the UI (at least not without calling another tool, and that'd be a very roundabout way to accomplish the same thing).

Also, this is an extension, so it is purely additive. It's a great way to let something that is bound to evolve and need continuous adjustments not get bogged down by only being allowed to change with spec versions.

You're right that if you're rolling your own MCP server and client host, then you can already do this using whatever scheme you want, but the beauty of a standardized extension is that we have less risk of ending up with a unique UI/Apps contract per client host. Ideally, as a server author, you'd want your MCP server to be able to render UI on all chat platforms without having to use a different communication convention for each of them. And simply returning the UI resource to the model is not a solution; the model is not an active participant in rendering the UI, which is an application-level concern.

Finally, the whole in-frame messaging part of this extension is non-trivial to design and engineer, so having a standardized way to do that is highly valuable. See: https://github.com/modelcontextprotocol/ext-apps/blob/main/specification/draft/apps.mdx#transport-layer
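The separation argued for above can be sketched in a few lines. Note this is a hypothetical shape: the `"ui/resource"` `_meta` key and the `ui://` URI below are illustrative only; the normative field names live in the ext-apps draft.

```typescript
// A tool result whose text content is for the model, while _meta points the
// host at a UI resource to render. The _meta key name is an assumption.
interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  structuredContent?: unknown;
  _meta?: Record<string, unknown>;
}

const result: ToolResult = {
  // Model-facing context: concise text the LLM actually reads.
  content: [
    { type: "text", text: "Found 'Dune' by Frank Herbert (status: available)." },
  ],
  // Application-facing context: the host resolves this ui:// resource and
  // renders it; the model never sees the HTML itself.
  _meta: { "ui/resource": "ui://bookstore/book-widget" },
};

// The host can split the two concerns cleanly, leaving structuredContent
// free for its spec-defined purpose.
const forModel = result.content.map((c) => c.text).join("\n");
const forHost = result._meta?.["ui/resource"] as string | undefined;
```

The point of the design is visible here: neither channel has to be overloaded, so the Apps SDK's structured-output workaround becomes unnecessary.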
Many questions:
1) Why bring this application implementation into the protocol itself? This, along with OpenAI's implementation of the Apps SDK, is not bounded by the MCP specification. @PederHP makes a good point:
Then I would say this should be rejected as a SEP (since it's not a part of the specification) and instead documented as a "best practice" or "official extension". The docs site already has precedent for this in the roadmap: https://modelcontextprotocol.io/development/roadmap#official-extensions
I missed this https://github.com/modelcontextprotocol/ext-apps 🤦🏼 - this getting a SEP in
2) Spec bloat
One of my main worries with bringing this into the main spec is that we continue to overcomplicate an already bloated and bifurcated specification. I know of no other MCP servers that implement extensions, and encouraging official support for this will make server implementors' lives harder (vs. encouraging this as a best practice). I would point to the following discussions on how fractured the community is and the difficulty server builders/maintainers have had over the last year:
3) JSON RPC
Regarding
To me this seems like something that SHOULD require user approval. Giving the server the ability to inject arbitrary messages into the conversation without user approval seems like a problematic pattern. I know there is an increased reliance on trusting the server, but there may also be non-malicious cases where the user simply does not want the injected message for whatever reason.
@adamesque Great use case! Currently, the SEP doesn't explicitly address that flow, but it does support patterns that enable it. For example -
It's likely a common use case that requires clarification and guidance. However, I'm not sure that the MVP should enforce specific behavior at this point. We can definitely discuss it.
@adriannoes Host<>UI capability negotiation is implemented in the SDK and mentioned in the spec. We'd love to hear your feedback! I'll note that we need to review the internal structure and ensure it includes the fields we need for the MVP.
@PederHP It came up, and it's definitely worth further discussion. I think it warrants a thread in #ui-cwg.
@idosal I noticed in the proposal the use of the media type. Media type suffixes (+json, etc.) are intended to communicate an underlying format that the new media type is based on, e.g. image/svg+xml indicates that content can be processed as XML. If in the future there was a decision to try and register this media type, it is highly unlikely that the registration would be allowed. I say this as one of IANA's media type reviewers.

However, I think I have a proposal that would address your needs and only slightly bend the rules. The specification https://datatracker.ietf.org/doc/html/rfc6906#section-3.1 (which is just Informational, so doesn't carry any IETF approval) proposes the use of the profile media type parameter. My suggestion is to use this:
Media type parameters are a commonly used construct. HTML technically only allows the charset parameter, but with some creative license, adding the profile parameter is not likely to cause any problems. From RFC 6906:
This is ideal because it allows the content to still be treated like text/html, but you have the clear indicator that the HTML is intended to be processed using MCP semantics.
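To make the parameter approach concrete, here is a minimal sketch of how a host might match the proposed type. This is not spec text: the tolerant handling of case and quoting is my own choice, and the exact profile token (`mcp-app` here, per the draft) may change.

```typescript
// Matches "text/html" carrying a profile=mcp-app parameter. Parameter names
// are case-insensitive per RFC 2045, and values may be quoted, so a small
// parser is more robust than raw string equality.
function isMcpAppHtml(mediaType: string): boolean {
  const [essence, ...params] = mediaType.split(";").map((s) => s.trim());
  if (essence.toLowerCase() !== "text/html") return false;
  return params.some((p) => {
    const [name, value] = p.split("=").map((s) => s.trim());
    // Strip optional surrounding quotes: profile="mcp-app" -> mcp-app
    const unquoted = value?.replace(/^"(.*)"$/, "$1");
    return name?.toLowerCase() === "profile" && unquoted === "mcp-app";
  });
}
```

A spec could alternatively mandate the exact string `text/html;profile=mcp-app`, trading flexibility for trivial comparison, which is the tradeoff discussed below.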
Should the profile maybe be |
Agent-Driven UI Navigation

Question: Would it make sense to document "UI control" patterns alongside MCP Apps? Many complex applications (network visualization, CAD tools, enterprise dashboards, ...) might benefit from agent-guided navigation beyond embedded widgets. I believe the two patterns are complementary. Background:
Potential synergy:
This combines MCP Apps' inline interactivity with full application depth. Widgets act as gateways to rich, stateful exploration.
@glen-84 I don't have much visibility into what the range of values could be for the profile. I think mcp-app is fine too.
Thanks for the thoughtful feedback, @darrelmiller! Really appreciate the insight, especially given your experience with IANA. I agree. Does that tradeoff make sense given the use case, or do you think the profile approach is still worth it?
It would be very useful to have a way to push context to the host application without necessarily triggering a user message.

Consider an MCP App that tracks some activity, like a build/release dashboard. We want the app to be able to push user interactions (like when the user triggers a new build or deployment via the UI) or state changes (a deployment changed from in-progress to done).

By adding a sort of context buffer for the server/UI to push to, we leave it up to the host how to handle these concerns, which I think is a good pattern, and allows this extension to be useful in a variety of contexts, from autonomous agents to conversational AI on a variety of device form factors. I'll open a thread on Discord for this as well.
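The buffered-context idea above could look something like the following. Everything here is hypothetical: the `ui/context-push` method name, the params shape, and the buffering policy are invented for illustration and are not part of the SEP.

```typescript
// Purely hypothetical notification shape for pushing UI context to the host.
interface ContextPushNotification {
  jsonrpc: "2.0";
  method: "ui/context-push"; // hypothetical method name
  params: {
    source: string;  // which widget pushed the context
    context: string; // text the host MAY later surface to the model
  };
}

// The host decides when (or whether) buffered context reaches the model,
// e.g. prepended to the next user turn rather than injected immediately.
const contextBuffer: ContextPushNotification["params"][] = [];

function onContextPush(note: ContextPushNotification): void {
  contextBuffer.push(note.params);
}

onContextPush({
  jsonrpc: "2.0",
  method: "ui/context-push",
  params: {
    source: "ui://ci/dashboard",
    context: "Deployment #42 finished: success",
  },
});
```

The key design property is that nothing enters the conversation directly; the host retains full control over if and when the buffered context is consumed.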
@antonpk1 I am contractually bound to say you should not use a media type that is not registered and is structurally invalid. :-) However, I do understand your concern over adding unnecessary complexity. There are two mitigations. One is that there are lots of parsing libraries that make some of that normalization go away, especially now that Structured Fields are a thing: https://www.rfc-editor.org/rfc/rfc9651.html. The other is that you are free to mandate that people use the exact string.

You do what you think is right for your community, but I will say that from experience, complying with existing standards does generally provide long-term benefits, especially when the cost to do so is low.
@idosal I think it's worth a discussion — in my mind, without explicitly addressing this, we're not able to "close the agentic loop" for UI, where agents that render UI can not only see the information presented but collaborate on / assist with it. We've seen the need for more formal patterns around this at Indeed.
Agree that more structure would be helpful, b/c as currently written I don't believe this spec is clear enough around subsequent host-initiated update mechanisms. If it's permissible for the host to supply tool-result updates not requested by the UI during the interactive phase, it would be good to include that there. It's possible some sort of widget or resource key should be returned in tool-result meta if tool call data is intended to be merged into an existing widget.
I think unless the intent is to poll (or subscribe), the spec doesn't describe the Host -> Guest events that should trigger these updates to ensure the UI stays in sync with model data as conversation-initiated tool calls occur. Generally I would prefer other mechanisms than a "please refetch" message. One other piece that has come up in internal discussions at Indeed is that this spec doesn't provide a mechanism similar to Apps SDK widget state. Without this, it's unclear how a guest UI can communicate:
The first could probably be achieved via a tool call and might only need a recommendation, but the second seems fairly important, since the spec does provide for interactive-phase UI-initiated tool calls that can result in UI updates. Currently a widget would have to wait for both initialization and, at a minimum, ui/tool-input to then make a request to its own backend to get a last-saved state snapshot (if one exists). Specifying such a backend is well outside the scope of a spec like this, but it feels like some part should discuss the reload flow; otherwise I think it's likely unsophisticated implementors will build out-of-the-box broken experiences.

Finally, the spec includes displayMode in the HostContext interface but doesn't define a guest -> host message to request a different displayMode. Is that an intentional omission? Thanks!
@tschei-di Exactly. I do personally like the approach of an application that "cooperates" rather than being automated.
Thanks @tschei-di, that cleared things up! To recap -
Hello! First, thank you for making this SEP and all the work you put in to make this an official extension of MCP! After reading the full specification, I also had a few remarks:
The OpenAI WidgetState implementation solves both issues (intentionally or not), as it passes all of the widgetState information to the LLM and also allows the user to come back to the conversation after a page reload or on another device and see the same UI rendered as when they left the conversation.
PS: I see a few issues opened in the ext-apps repository; should the conversation move there?
Thanks @rodincave!
Generally, this PR is good for surfacing general feedback, with issues that need further discussion or tracking moving to ext-apps/Discord.
Thanks for the work on this SEP! Our organization is integrating agents with a chat platform via MCP and we've identified several UI component patterns that would benefit from standardization in the MCP Apps specification. Based on the discussion above, I see some of our needs align with existing threads, while others appear to be gaps. Here are our requirements:
Hi all, I'm one of the authors of the AG-UI protocol. I’m currently adding MCP Apps support to AG-UI and CopilotKit. The approach is a small middleware layer that lets AG-UI-based agent frameworks consume MCP Apps directly, and handles proxying the MCP server communication on the client side. Early implementation is here: CopilotKit/vnext_experimental#48
Thanks for your impressive work! Here are my questions and suggestions:
In the MCP Apps SEP, it's mentioned that a UI resource is associated with a specific Tool through
This implies a one-to-one relationship between the UI resource and the result of a tool call.
How do you suggest developers make their choices?
I believe that UI Resources need their own dedicated tool: the UI Resource Tool. It would still be a standard MCP Tool that conforms to existing specifications, but its role would be unique: to return a specific UI resource along with the data required to render it. The LLM wouldn't need to be aware of the UI resource itself, but it would need to understand its purpose (from the description) and its data dependencies (from the inputSchema, which is very important for data pre-validation). This would allow the Agent to autonomously call the tool and construct the data inputs, thereby maximizing its intelligent capabilities.

If we only treat a UI Resource as a by-product of another tool, I believe the potential of the resource is severely limited. Introducing a dedicated class of UI Resource Tools offers another advantage: it promotes the reusability of UI components, since the results from multiple tools could be rendered by the same UI resource.

Furthermore, this would foster the emergence of general-purpose UI MCP Servers, such as common charting libraries or log viewers. Developers could simply add them to achieve stunning interactive effects on the client side. The UI Resource would no longer be bound in a 1:1 relationship with a specific data-fetching tool.
If we take this a step further, a UI Resource is just one specific type of Resource. As the Agent ecosystem evolves, will clients require other types of Resources besides UI ones? I believe the answer is a definite yes. This leads to the question: should the extension we introduce be a UI Resource Tool, or should it be a more general, standardized, and extensible Resource Tool?
If we evolve the concept from a "Tool with a UI Resource" to a "Resource Tool", then our design should not be limited to just MCP Apps. We need to consider how to meet the potential needs of developers for "Resources" in other future scenarios. I believe the existing Tool type definition requires some adjustments: a Tool should be able to associate with a resource via its metadata.

Finally, it should be noted that the "Resource Tool" is an additive extension, ensuring full backward compatibility with the current "Tool with Resource" model. Developers will have the flexibility to either create a new class of dedicated Resource Tools or bind a Resource to an existing tool; the two approaches are not mutually exclusive.
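The "UI Resource Tool" idea can be sketched as an ordinary tool declaration whose only job is to render a reusable widget from validated inputs. All names here (the tool, the `_meta` key, the `ui://` URI) are illustrative, not defined by the SEP.

```typescript
// Sketch of a dedicated UI Resource Tool: a standard MCP tool whose result
// is rendered by a reusable ui:// resource. The LLM only needs the
// description and inputSchema; the widget itself is a host concern.
const renderBarChartTool = {
  name: "render_bar_chart",
  description: "Render a bar chart from labeled numeric series.",
  inputSchema: {
    type: "object",
    properties: {
      labels: { type: "array", items: { type: "string" } },
      values: { type: "array", items: { type: "number" } },
    },
    required: ["labels", "values"],
  },
  // Many data-producing tools could target this same resource, decoupling
  // the widget from any single data source. The key name is an assumption.
  _meta: { "ui/resource": "ui://charts/bar" },
};
```

Because the chart widget is addressed by URI rather than baked into one tool, a general-purpose charting server could expose it to any number of data tools, which is exactly the reusability argument above.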
Hi team! 👋 We're implementing MCP Apps support and noticed a few things that might be inconsistencies (though we may be misunderstanding the intent):
1. This doesn't follow the
2. The spec says "Guest UI SHOULD send this notification..." (UI→Host), but other
Nothing blocking, and thanks for all the work on this! 🙏
Resources already have MIME types: https://modelcontextprotocol.io/specification/2025-11-25/server/resources

So that change introduces the risk of a discrepancy between the one in the actual MCP resource and the one in the tool metadata. I don't see any good reason to duplicate mimeType out into the meta object. It is important to remember that these are MCP resource URIs, not HTTP URIs.

I can see the idea of having multiple resources, but until that is actually needed I think it risks complicating more than it adds.
Thanks @vipin-mohan!
Thanks @paul1868! Happy to hear it supports your use case.
Thanks @taozhongxiao!
Note that it's now
One of the downsides of the spec is that all the UI pages are defined in advance as resources. I fail to see how this model will fit when the resources are dynamically generated. For example, imagine an AI agent that writes its own UI based on the user input.
Resources can be dynamically generated, even if the URI is static. MCP fully supports that pattern, contrary to what some assume, and some SDKs even make it easy to do so.
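The static-URI, dynamic-body pattern mentioned above can be illustrated without any SDK. This is a hand-rolled sketch, not the official SDK API: the read handler is invoked on every read, so the HTML can be generated on demand (even written by an agent from user input before the host fetches it).

```typescript
// Minimal resource registry: a static ui:// URI mapped to a handler that
// produces fresh HTML each time the resource is read.
type ResourceContents = { uri: string; mimeType: string; text: string };

const handlers = new Map<string, () => ResourceContents>();

handlers.set("ui://agent/generated", () => ({
  uri: "ui://agent/generated",
  mimeType: "text/html",
  // Generated at read time; nothing about the body is fixed in advance.
  text: `<html><body><p>Generated at ${new Date().toISOString()}</p></body></html>`,
}));

function readResource(uri: string): ResourceContents {
  const handler = handlers.get(uri);
  if (!handler) throw new Error(`unknown resource: ${uri}`);
  return handler();
}
```

From the host's perspective nothing changes: it still resolves a declared `ui://` URI, while the server is free to synthesize the content per read.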
SEP-1865: MCP Apps - Interactive User Interfaces for MCP
Track: Extensions
Authors: Ido Salomon, Liad Yosef, Olivier Chafik, Jerome Swannack, Jonathan Hefner, Anton Pidkuiko, Nick Cooper, Bryan Ashley, Alexi Christakis
Status: Draft
Created: 2025-11-21
Please review the full SEP at modelcontextprotocol/ext-apps. This PR provides a summary of the proposal and wires it into the main spec.
Abstract
This SEP proposes an extension to MCP (per SEP-1724) that enables servers to deliver interactive user interfaces to hosts. MCP Apps introduces a standardized pattern for declaring UI resources via the ui:// URI scheme, associating them with tools through metadata, and facilitating bi-directional communication between the UI and the host using MCP's JSON-RPC base protocol. This extension addresses the growing community need for rich, interactive experiences in MCP-enabled applications, maintaining security, auditability, and alignment with MCP's core architecture. The initial specification focuses on HTML resources (text/html;profile=mcp-app) with a clear path for future extensions.
Motivation
MCP lacks a standardized way for servers to deliver rich, interactive user interfaces to hosts. This gap blocks many use cases that require visual presentation and interactivity that go beyond plain text or structured data. As more hosts adopt this capability, the risk of fragmentation and interoperability challenges grows.
MCP-UI has demonstrated the viability and value of MCP apps built on UI resources and serves as a community playground for the UI spec and SDK. Fueled by a dedicated community, it developed the bi-directional communication model and the HTML, external URL, and remote DOM content types. MCP-UI's adopters, including hosts and providers such as Postman, HuggingFace, Shopify, Goose, and ElevenLabs, have provided critical insights and contributions to the community.
OpenAI's Apps SDK, launched in November 2025, further validated the demand for rich UI experiences within conversational AI interfaces. The Apps SDK enables developers to build rich, interactive applications inside ChatGPT using MCP as its backbone.
The architecture of both the Apps SDK and MCP-UI has significantly informed the design of this specification.
However, without formal standardization:
This SEP addresses the current limitations through an optional, backwards-compatible extension that unifies the approaches pioneered by MCP-UI and the Apps SDK into a single, open standard.
Specification (high level)
The full specification can be found at modelcontextprotocol/ext-apps.
At a high level, MCP Apps extends the Model Context Protocol to enable servers to deliver interactive user interfaces to hosts. This extension introduces:
- UI resources declared via the ui:// URI scheme
This specification focuses on HTML content (text/html;profile=mcp-app) as the initial content type, with extensibility for future formats.
As an extension, MCP Apps is optional and must be explicitly negotiated between clients and servers through the extension capabilities mechanism (see Capability Negotiation section).
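Capability negotiation might look like the following sketch. The extension identifier string is a guess at the naming convention, not the registered key; the normative shape lives in the ext-apps specification.

```typescript
// Both sides advertise the extension during initialization; UI resources
// are only exchanged when both have done so. The key name is an assumption.
const clientInit = {
  capabilities: {
    extensions: {
      "io.modelcontextprotocol/apps": {}, // client can host MCP Apps
    },
  },
};

const serverInit = {
  capabilities: {
    extensions: {
      "io.modelcontextprotocol/apps": {}, // server can deliver UI resources
    },
  },
};

function appsEnabled(
  client: typeof clientInit,
  server: typeof serverInit,
): boolean {
  const key = "io.modelcontextprotocol/apps";
  return (
    key in client.capabilities.extensions &&
    key in server.capabilities.extensions
  );
}
```

Because negotiation is explicit, a server talking to a host that never advertises the extension simply behaves as a plain MCP server, which is what makes the extension purely additive.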
Rationale
Key design choices:
UI resources are declared as standard MCP resources with a dedicated URI scheme (ui://), referenced by tools via metadata.
Alternatives considered:
externalIframes capability.
Backward Compatibility
The proposal is an optional extension to the core protocol. Existing implementations continue working without changes.
Reference Implementation
The MCP-UI client and server SDKs support the patterns proposed in this spec.
Olivier Chafik has developed a prototype in the ext-apps repository.
Security Implications
Hosting interactive UI content from potentially untrusted MCP servers requires careful security consideration.
Based on the threat model, MCP Apps proposes the following mitigations:
You can review the threat model analysis and mitigations in the full spec.
Related
New Content Type for "UI" (#1146) by @kentcdodds
This is a long-awaited addition to the spec, the result of months of work by the MCP community and early adopters. We encourage you to: