Improvements for MCP-based agents #111
Replies: 16 comments 19 replies
-
I think these are all great topics.
I think this should enable the bubbling of permission requests up from sub-agents to higher-up agents/top-level interaction.
Something important here imo would be enabling the top level (or any intermediate layer) to have awareness of the topology of all nested agentic activity.
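A minimal sketch of what bubbling and topology awareness could look like, assuming an in-memory tree of agents. `AgentNode`, `request_permission`, and `topology` are hypothetical names for illustration only; nothing here is defined by MCP.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentNode:
    """One agent in a tree of agents (hypothetical structure, not part of MCP)."""

    name: str
    parent: Optional["AgentNode"] = None
    children: list["AgentNode"] = field(default_factory=list)
    # Stand-in for the user prompt; typically set only on the top-level node.
    approve: Optional[Callable[[list[str], str], bool]] = None

    def spawn(self, name: str) -> "AgentNode":
        """Create a sub-agent and register it under this node."""
        child = AgentNode(name=name, parent=self)
        self.children.append(child)
        return child

    def request_permission(self, action: str, path: Optional[list[str]] = None) -> bool:
        """Bubble a request upward, recording the path it travelled."""
        path = [self.name] + (path or [])
        if self.approve is not None:
            return self.approve(path, action)
        if self.parent is None:
            return False  # nobody above can decide, so deny by default
        return self.parent.request_permission(action, path)

    def topology(self) -> dict:
        """Nested view of all agent activity below this node."""
        return {self.name: [child.topology() for child in self.children]}
```

The approver receives the full path from the top level down to the requesting sub-agent, so whoever decides can see where in the tree the request came from.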
-
It would be useful to understand better how you see agents fitting into MCP at a conceptual level and in terms of user experience. Right now clients like Claude seem to play the role of a unified agent, since they execute the agent loop, and servers are simply capabilities/context being offered to this singular agent/client. Tools can do whatever they want, so they can be agentic, but this is irrelevant to the top-level agent/client. What is meant by "agent support" in the roadmap? Is MCP working toward a vision where there is a singular client/agent that users interact with, and this agent is empowered by being connected to many MCPs? Or are you thinking about switching to something more akin to GPTs/Gems/Agents, where users interact with many different top-level agents that each have a clear identity? The gaps mentioned in this issue seem to be mostly about supporting multiple servers better and providing additional capabilities to tools (e.g. direct responses and elicitation), rather than supporting multiple agents explicitly.
-
Let MCP be simple and focus on standardizing tools and resources, etc., rather than defining standards for Agentic Workflows. The protocol should not overcomplicate matters by forcing diverse use cases to conform to a singular MCP way of organizing/orchestrating agents.
-
Hi! Here are a few theses I'd like to share with you:
For me, it seems like the trees-of-agents concept breaks the existing client-server architecture. For trees of agents, it seems like every node can both contain LLM-calling logic (act as an MCP client) and provide resources/tools (act as an MCP server).
I really miss Server-->Client capabilities in my application, which led me to look through these issues and discussions to see whether other people run into the same questions. For example, let's take a Twitter MCP server. Imagine that this server can poll Twitter to receive new mentions from users, and then use the MCP client (LLM) to create responses to them. Currently, this logic can be executed in 2 ways:
This approach is okay, but only if you are connecting to one server (the Twitter server). If your agent should act on various platforms (Discord, YouTube, TikTok, Telegram...), you'll have to poll all of those MCP servers as well, creating a lot of traffic and breaking the single responsibility principle inside the client.
This approach is way better for multiple MCP servers, but again, you'll have to subscribe to all of those resources and maintain some resource registry, which does not feel right to me. I would instead love the option to push all updates directly to some message broker, and the MCP client would consume those messages, idk
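To make the broker idea concrete, here is a rough in-process sketch, assuming a shared queue stands in for a real message broker. `Event`, `Broker`, and `handle_updates` are hypothetical names; MCP does not define any of them.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass


@dataclass
class Event:
    source: str   # which server produced the update, e.g. "twitter"
    payload: str  # the update itself, e.g. the text of a mention


class Broker:
    """In-process stand-in for a message broker (hypothetical, not part of MCP)."""

    def __init__(self) -> None:
        self._events: queue.Queue = queue.Queue()

    def publish(self, event: Event) -> None:
        """Called by any server when it has something new to report."""
        self._events.put(event)

    def drain(self) -> list[Event]:
        """Return every event currently queued, without blocking."""
        drained = []
        while True:
            try:
                drained.append(self._events.get_nowait())
            except queue.Empty:
                return drained


def handle_updates(broker: Broker, respond) -> list[str]:
    """Client side: one consumer, regardless of how many servers publish."""
    return [respond(event) for event in broker.drain()]
```

With this shape the client neither polls each server nor keeps a per-resource subscription registry; adding another platform only adds another publisher.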
-
Proposal: Enhancing MCP to Support Provider-Independent Capabilities
Hello from Block,
While considering improvements to the Agent UI for Goose, I realized that MCP might need enhancements to better support provider-independent capabilities. Let me explain in detail.
Problem Statement
Imagine an agent that allows the use of any provider, such as Goose. Now, suppose we want to enable real-time audio communication. Since we don't want to tie this feature to a specific provider, it makes sense to introduce it at the MCP server level. However, the UI should also be able to reflect this newly added capability, for example, by displaying a microphone button when audio communication is available.
Proposed Solution
If the MCP specification allowed servers or tools to declare the types of capabilities they provide (potentially as an enum of known categories), the Agent UI could dynamically adapt its controls based on available features. This could include elements such as:
Additionally, this would allow for more flexible settings management. If multiple MCPs provide the same capability, users could select which MCP instance should handle a given capability. This would enable the installation of MCPs with overlapping capabilities while ensuring that the preferred provider is chosen for each feature.
Benefits
Would love to hear your thoughts on this approach!
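A small sketch of how a UI might consume such declarations, assuming capabilities are an enum and each server reports a set of them. The enum members, `available_controls`, and `pick_provider` are hypothetical; the proposal does not fix any of these names.

```python
from __future__ import annotations

from enum import Enum


class UiCapability(Enum):
    # Hypothetical categories; the proposal does not fix the enum's members.
    REALTIME_AUDIO = "realtime_audio"
    IMAGE_INPUT = "image_input"


def available_controls(declared: dict[str, set[UiCapability]]) -> dict[UiCapability, list[str]]:
    """Map each capability to the servers that declare it, sorted by name."""
    controls: dict[UiCapability, list[str]] = {}
    for server, caps in declared.items():
        for cap in caps:
            controls.setdefault(cap, []).append(server)
    return {cap: sorted(servers) for cap, servers in controls.items()}


def pick_provider(controls, preferences, cap):
    """Use the user's preferred server if it offers the capability, else the first one."""
    servers = controls.get(cap, [])
    preferred = preferences.get(cap)
    if preferred in servers:
        return preferred
    return servers[0] if servers else None
```

The UI would show a microphone button whenever `UiCapability.REALTIME_AUDIO` appears in the result of `available_controls`, and the settings page would store the user's choice in `preferences` when more than one server offers it.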
-
Great discussion here. One UX pattern I've been thinking about requires the LLM to start a long-running task using a tool while it continues interacting with the user, and to get an update only when the task completes, so it can give the user the desired information or take further action based on the output. Think of something like Deep Research where, instead of the Agent being locked until the research is done, it can continue the conversation. Can this be done with the current implementation (using the sampling method), or would it require changes to the protocol? This could be expanded further to let the Agent multitask: invoke multiple tools, then synthesize and refine its output as more information arrives.
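For illustration, the control flow being asked for might look like the sketch below on the client side, assuming the long-running tool can be modeled as a coroutine. `deep_research` and `converse` are hypothetical stand-ins; this says nothing about whether the current protocol can carry it.

```python
import asyncio


async def deep_research(topic: str) -> str:
    """Stand-in for a long-running tool call (hypothetical)."""
    await asyncio.sleep(0.05)
    return f"report on {topic}"


async def converse(user_turns: list) -> list:
    transcript = []
    # Start the tool without awaiting it, so the conversation is not blocked.
    task = asyncio.create_task(deep_research("MCP agents"))
    for turn in user_turns:
        transcript.append(f"agent: replying to {turn!r}")
        await asyncio.sleep(0)  # yield so the background task can make progress
    # Pick up the result once the task completes, then act on it.
    transcript.append(f"agent: research finished: {await task}")
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(converse(["hi", "any news?"])):
        print(line)
```

Multitasking would follow the same shape, with several tasks created up front and their results folded into the conversation as each one finishes.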
-
In addition, we're working with multi-agent flows and following the discussion about whether servers should be thought of as Tool Providers or as fully Agentic. Instead of expecting servers to act as an Agent, can we not have the Agent abstraction at the top level? A session could have multiple "Agents", each with their own servers, and this information could be accessible to individual servers in case they want to invoke an available Agent (like sampling). I think it would be beneficial to have an Agent abstraction instead of nesting Agents within tools. If a tool requires access to a specific Agent with certain capabilities, it can ask the client to enable/download the Agent and authenticate/authorize it to act. This would be cleaner from a privacy, transparency, and security perspective, since the user would have more visibility and control over how their data is being passed around behind the tool call.
-
In addition to logging, I'd advocate for OTel tracing with semantic conventions as an optional part of the spec. (I'm unsure if this belongs as part of the MCP-Agent discussion other than tracing being particularly useful to understand what's going on in the occluded box)
-
@jspahrsummers This is a really good list - I'd add one more item here, and would be curious about your thoughts. Specifically, for agentic flows, you can imagine a world where agents have the ability to discover, then call, MCP servers that have a per-use cost, but WITHOUT a heavy, user-involved registration flow. So, a user asks for the weather, the agent discovers a weather server, it has some sort of budget for calls, it determines this is a good use of funds, and then calls the weather MCP server for a penny. In order to enable this flow, I think there needs to be some metadata / capabilities definition on the tool type itself. I don't think the properties are quite the right place for this - if I'm being diligent about architecture, I want the weather call to just take a city, state, country; I don't want to co-mingle all the different permutations of how payments could work in the tool call. That is to say, imagine you have some mechanism for per-use calls, but then some other mechanism for a subscription fee: it doesn't feel right to pollute the "method" invocation properties with optional properties around this. I think this would tend toward some sort of custom "extension" (which I think in the MCP world we're calling custom capabilities) for payment types (proof of pre-payment, usage-based payment, etc). So, I would say, some sort of metadata extension for tool calls themselves, so that in the list view you can get custom capabilities / extensions, and in the tool call you can supply custom required fields outside the primary method call. I do believe this is separate from a pure call property, but I could be wrong. That is to say, you could always have a property that is "proof of prepayment", but that's not an explicit "extension" or "capability" being defined with a specific protocol being adhered to, which makes it hard to infer.
So, I'd like to say, this tool requires adherence to pre-payment protocol XYZ, which means we're expecting some sort of proof-of-pre-payment identifier passed in the tool metadata. Thoughts? EDIT: I'll add, this COULD be a server-level capability where tool calls then expect a header with payment information, but I think that's a bit non-ideal because you might still have cost metadata on a call-by-call basis that you want to communicate. I've seen other proposals on here for cost metadata in the well-known file, but that also just feels weird, because based on the agent identity (from authorization) the cost might be different.
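One possible shape for this separation, sketched under the assumption that a tool definition carries an `extensions` map next to its input schema. The `extensions` field, the `prepayment/xyz` key, `required_metadata`, and `validate_call` are all hypothetical and do not exist in the MCP specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolDefinition:
    name: str
    input_schema: dict  # business arguments only (city, state, country)
    extensions: dict[str, dict] = field(default_factory=dict)  # hypothetical field


# The weather tool declares that it follows a made-up pre-payment protocol.
weather = ToolDefinition(
    name="get_weather",
    input_schema={"type": "object", "required": ["city", "state", "country"]},
    extensions={
        "prepayment/xyz": {"price_usd": 0.01, "required_metadata": ["proof_of_prepayment"]},
    },
)


def validate_call(tool: ToolDefinition, arguments: dict, call_metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    problems = []
    for key in tool.input_schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing argument: {key}")
    # Extension fields travel in call metadata, never in the tool arguments.
    for ext_name, ext in tool.extensions.items():
        for key in ext.get("required_metadata", []):
            if key not in call_metadata.get(ext_name, {}):
                problems.append(f"missing {ext_name} metadata: {key}")
    return problems
```

Because the price sits in the per-tool extension entry rather than in a server-wide file, it could in principle vary per tool or per caller, which is the concern raised in the EDIT above.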
-
One UX pattern that I've found extremely useful when writing interactive scripts is to create a temporary file (optionally populated with data), open the file in the user's preferred editor, wait for the user to close the editor, and then process the final data. Some common examples of this pattern are
It would be great if MCP tools could also leverage this pattern. The server would send the contents of the temporary file to the client, and the client would manage the editor workflow and send the final data back to the server for processing.
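The client half of that workflow could be sketched roughly as follows, assuming the editor command comes from `VISUAL` or `EDITOR` and falls back to `vi`. `edit_in_editor` is a hypothetical helper; how the content would travel between server and client is left open.

```python
from __future__ import annotations

import os
import shlex
import subprocess
import tempfile


def edit_in_editor(initial: str, editor: str | None = None) -> str:
    """Write `initial` to a temp file, run the editor on it, return the edited text."""
    command = editor or os.environ.get("VISUAL") or os.environ.get("EDITOR") or "vi"
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(initial)
        path = handle.name
    try:
        # Blocks until the editor process exits, i.e. until the user closes it.
        subprocess.run(shlex.split(command) + [path], check=True)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.unlink(path)  # always clean up the temporary file
```

The client would call `edit_in_editor` with the contents received from the server and send the returned text back for processing.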
-
Are you actively working on these issues currently?
-
Hi @amueller, thank you for re-initiating this thread. I think a couple of focused threads might help immediate discussion before a PR can be raised. These are mostly a distillation of multiple threads across the MCP forum and other protocols:
-
@jspahrsummers and others in thread - I put together a straw man PR for namespaces - it's meant to be VERY light touch, but would love thoughts.
-
Ultimately, this implementation should mesh with whatever the registry working group lands on for namespacing. Currently it looks like reverse domain is the favored approach, because registered servers can be easily protected with domain ownership verification. So the namespace for tools on the official GitHub server would probably be "com.github".
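As a rough sketch of the reverse-domain idea, assuming the namespace is derived purely from a verified domain name. Both function names and the `/` separator are assumptions; the thread does not settle on a separator or on how tool names are joined to the namespace.

```python
def reverse_domain_namespace(domain: str) -> str:
    """Turn a verified domain such as 'github.com' into 'com.github'."""
    labels = [label for label in domain.strip().lower().split(".") if label]
    return ".".join(reversed(labels))


def qualify(domain: str, tool: str, separator: str = "/") -> str:
    """Prefix a tool name with its server's namespace (separator is an assumption)."""
    return f"{reverse_domain_namespace(domain)}{separator}{tool}"
```

For example, `qualify("github.com", "create_issue")` yields `com.github/create_issue`, where `create_issue` is just an illustrative tool name.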
-
Hi all, I raised a couple of PRs related to the discussion in this thread:
Thanks @cliffhall for your suggestions earlier in the thread; I have referenced them in the implementation.
-
Starting a tracking discussion of various things that could improve MCP's suitability for agents and agentic workflows (especially trees of agents).
Recording this quickly, without enough explanation; hopefully I can backfill that later.
Feel free to add other thoughts!
Scope