Improvements for MCP-based agents #111

jspahrsummers · 2024-12-13T11:53:57Z

jspahrsummers
Dec 13, 2024
Maintainer

jerome3o-anthropic · 2024-12-13T12:02:51Z

jerome3o-anthropic
Dec 13, 2024
Maintainer

I think these are all great topics

Structured, formatted intermediate updates from server -> client, so a deep agent graph can provide information to the user even while a top-level tool is still being run

I think this should enable the bubbling of permission requests up from sub-agents to higher up agents/top level interaction

Namespacing

Something important here imo would be enabling top-level (or any intermediate layer) to have awareness of the topology of all nested agentic activity.

1 reply

sepo-eng Feb 20, 2025

I think this should enable the bubbling of permission requests up from sub-agents to higher up agents/top level interaction

Permission requests in this context are about human-in-the-loop (HITL) interactions, I suppose. In this case, we can consider enabling structured events as either notifications or, optionally, requests, so the top-level interaction can respond.

Referencing this, a stateless interaction with an agent might require continuation after an indefinite pause (for example, HITL). In this case, status checks and a listing of agentic runs will be useful to allow for continuation.
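
To make the pause-and-continue idea concrete, here is a rough sketch of a run registry that a stateless caller could query. Everything in it (`RunRegistry`, `RunStatus`, the method names) is hypothetical and not part of MCP; it only illustrates how a permission request, a status check, and a run listing could fit together.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class RunStatus(Enum):
    RUNNING = "running"
    AWAITING_INPUT = "awaiting_input"
    COMPLETED = "completed"


@dataclass
class AgentRun:
    run_id: str
    status: RunStatus = RunStatus.RUNNING
    pending_question: Optional[str] = None
    result: Optional[str] = None


class RunRegistry:
    """Hypothetical store that lets a stateless caller resume a paused run."""

    def __init__(self) -> None:
        self._runs: Dict[str, AgentRun] = {}

    def start(self) -> AgentRun:
        run = AgentRun(run_id=str(uuid.uuid4()))
        self._runs[run.run_id] = run
        return run

    def request_permission(self, run_id: str, question: str) -> None:
        # A sub-agent bubbles a question up; the run pauses until it is answered.
        run = self._runs[run_id]
        run.status = RunStatus.AWAITING_INPUT
        run.pending_question = question

    def respond(self, run_id: str, approved: bool) -> None:
        # The top-level interaction answers, possibly long after the pause began.
        run = self._runs[run_id]
        if run.status is not RunStatus.AWAITING_INPUT:
            raise ValueError("run is not waiting for input")
        run.pending_question = None
        run.status = RunStatus.COMPLETED
        run.result = "approved" if approved else "denied"

    def list_runs(self, status: Optional[RunStatus] = None) -> List[AgentRun]:
        # Listing by status is what allows continuation after an indefinite pause.
        return [r for r in self._runs.values() if status is None or r.status is status]
```

A client that reconnects later could call `list_runs(RunStatus.AWAITING_INPUT)` to find runs that still need a human answer.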

tristanz · 2025-01-05T19:45:08Z

tristanz
Jan 5, 2025

It would be useful to understand better how you see agents fitting into MCP at a conceptual level and in terms of user experience. Right now clients like Claude seem to play the role of a unified agent, since they execute the agent loop, and servers are simply capabilities/context being offered to this singular agent/client. Tools can do whatever they want, so can be agentic but this is irrelevant to the top level agent/client.

What is meant by "agent support" in the roadmap? Is MCP working toward a vision where there is a singular client/agent that users interact with, and this agent is empowered by being connected to many MCPs? Or are you thinking about switching to something more akin to GPTs/Gems/Agents, where users interact with many different top-level agents that have a clear identity? The gaps mentioned in this issue seem to be mostly just about supporting multiple servers better and providing additional capabilities to tools (e.g. direct responses and elicitation) rather than supporting multiple agents explicitly.

6 replies

tristanz Jan 6, 2025

Do you want MCP servers to expose agents as first-class objects to users that they can @mention, similar to prompts, or do you want agents to be largely invisible to users, similar to tools?

jspahrsummers Jan 8, 2025
Maintainer Author

It's an interesting question! Not one I've contemplated a whole bunch, to be honest. What are your thoughts?

tristanz Jan 9, 2025

I think the most flexible and open approach would be to expose them at the top level or recursively expose them up through linked MCPs. This mirrors human communication networks and seems the most open long-term vision. If agents are exposed to users, clients become a distribution channel for third-party agents, and users have more choice. The alternative is a more locked-down vision where the purpose of MCPs is only to enhance the functionality of the client, and users conceptually think of MCPs more like applications that their primary agent/client (Claude) can use. I think both approaches are valid, and the choice depends on strategic goals. But as a user and as a potential developer of third-party agents, my preference would be to allow MCP agents to be exposed directly to end users so that I can talk with all my agents from any client.

jspahrsummers Jan 9, 2025
Maintainer Author

I think, regardless, MCP (being a protocol) cannot force clients to expose users directly to such concepts. We kind of have to assume that clients will intermediate everything anyway. But some way of making information about sub-agents available at the "top level" seems useful.

Thanks for the thoughts!

Mehdi-Bl Feb 6, 2025

But MCP is mainly function calls here too.
And MCP calls are limited by model output; yes, you can loop it. But the main issue I see is that if you rely only on MCP to fully transfer the context, you will use more output context, which is far more limited than input. For example, providing the files to use, or all the information extracted from the database in a previous round.
Since MCP is backed by function calls and you want to loop agents, why not have agents use function calls too, and tell them to read this file or fetch this record? This would be more efficient for output context.

Zane-XY · 2025-02-21T02:45:50Z

Zane-XY
Feb 21, 2025

Let MCP be simple and focus on standardizing tools and resources, etc., rather than defining standards for Agentic Workflows. The protocol should not overcomplicate matters by forcing diverse use cases to conform to a singular MCP way of organizing/orchestrating agents.

0 replies

Kalmy8 · 2025-02-27T08:13:15Z

Kalmy8
Feb 27, 2025

Hi! Here are a few theses I'd like to share with you:

For me, it seems like the trees-of-agents concept breaks the existing client-server architecture

For trees of agents, it seems like all the nodes can both contain llm-calling logic (act as an MCP client) and provide resources/tools for them (act as an MCP server)
It is an interesting concept, but for me it feels like a whole different thing. For complicated logic (like trees of agents) we could use complicated langgraphs or some multi-agent framework (a bunch of them are available around)

For me, MCP is now a very handy and needed instrument to decouple a model's tools, prompts and data feeds from the LLM loop execution logic. It is certainly very useful for production-grade services, enhancing flexibility, scalability, testability, etc.

I do really miss the Server-->Client opportunities for my application,

this caused me to look through these issues and discussions to see if people run into the same questions

For example, let's take a Twitter MCP server. Imagine that this server can poll Twitter to receive new mentions from users, and then use the MCP client (LLM) to create responses to them

Currently, this logic can be executed in 2 ways:

MCP client itself polls Twitter MCP server to fetch new mentions via Twitter API -> send them to client -> client sends a response back to MCP server -> MCP server publishes a response via API

This approach is okay, but only if you are connecting to one server (the Twitter server). If your agent should act on various platforms (Discord, YouTube, TikTok, Telegram...) you'll have to poll all those MCP servers as well, creating a lot of traffic and breaking the single responsibility principle inside the client

Twitter MCP server could instead perform Twitter-polling itself, autonomously, fetch new events and store them as a resource. MCP client can subscribe and receive notifications for that resource.

This approach is way better with multiple MCP servers, but again, you'll have to subscribe to all of those resources and maintain some resource registry, which does not feel right to me. I would instead love an opportunity to push all updates directly to some message broker, and the MCP client would consume those messages, idk
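
As a rough illustration of the second approach, the sketch below builds a `resources/subscribe` request and funnels `notifications/resources/updated` messages from several servers into one queue, which is about as close to a "broker" as the client can get today. The method names come from the MCP resources feature; the `UpdateInbox` class, the server names, and the `twitter://mentions` URI are made up for the example.

```python
from collections import deque


def subscribe_request(request_id: int, uri: str) -> dict:
    """Build a JSON-RPC `resources/subscribe` request for one resource URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    }


class UpdateInbox:
    """Hypothetical single queue collecting update notifications from many servers."""

    def __init__(self) -> None:
        self.pending = deque()

    def on_message(self, server_name: str, message: dict) -> None:
        # Only resource-update notifications are queued; other messages are ignored.
        if message.get("method") == "notifications/resources/updated":
            self.pending.append((server_name, message["params"]["uri"]))
```

The client still has to issue one subscription per resource, which is the registry overhead described above; the inbox only centralizes consumption.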

0 replies

Kvadratni · 2025-02-28T17:53:03Z

Kvadratni
Feb 28, 2025

Proposal: Enhancing MCP to Support Provider-Independent Capabilities

Hello from Block,

While considering improvements to the Agent UI for Goose, I realized that MCP might need enhancements to better support provider-independent capabilities. Let me explain in detail.

Problem Statement

Imagine an agent that allows the use of any provider, such as Goose. Now, suppose we want to enable real-time audio communication. Since we don’t want to tie this feature to a specific provider, it makes sense to introduce it at the MCP server level. However, the UI should also be able to reflect this newly added capability, for example, by displaying a microphone button when audio communication is available.

Proposed Solution

If the MCP specification allowed servers or tools to declare the types of capabilities they provide (potentially as an enum of known categories), the Agent UI could dynamically adapt its controls based on available features. This could include elements such as:

A microphone button for real-time audio communication
A webcam toggle for video streaming
Other UI elements for reactive capabilities

Additionally, this would allow for more flexible settings management. If multiple MCPs provide the same capability, users could select which MCP instance should handle a given capability. This would enable the installation of MCPs with overlapping capabilities while ensuring that the preferred provider is chosen for each feature.
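
A minimal sketch of how this could look on the UI side, assuming servers declare capabilities from a known enum. The enum values, the control names, and the server names below are all invented for illustration; MCP does not define any of them.

```python
from enum import Enum


class UiCapability(Enum):
    # Hypothetical categories; MCP does not define such an enum.
    REALTIME_AUDIO = "realtime_audio"
    VIDEO_STREAM = "video_stream"


# Hypothetical mapping from a declared capability to the control the UI renders.
UI_CONTROLS = {
    UiCapability.REALTIME_AUDIO: "microphone_button",
    UiCapability.VIDEO_STREAM: "webcam_toggle",
}


def controls_to_show(declared: dict) -> set:
    """Collect the UI controls implied by every server's declared capabilities."""
    return {UI_CONTROLS[cap] for caps in declared.values() for cap in caps}


def pick_provider(declared: dict, capability: UiCapability, preferred: str):
    """Return the user's preferred server if it offers the capability, else the first that does."""
    offering = sorted(name for name, caps in declared.items() if capability in caps)
    if not offering:
        return None
    return preferred if preferred in offering else offering[0]


declared = {
    "voice-a": {UiCapability.REALTIME_AUDIO},
    "voice-b": {UiCapability.REALTIME_AUDIO, UiCapability.VIDEO_STREAM},
}
```

With two servers overlapping on audio, `pick_provider` is where a user setting would decide which one handles the microphone.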

Benefits

Improved UI adaptability – The Agent UI can dynamically adjust to available capabilities.
Provider independence – Features can be added without binding to a specific provider.
More flexible settings – Users can configure MCP instances to handle overlapping capabilities.

Would love to hear your thoughts on this approach!

2 replies

Mehdi-Bl Feb 28, 2025

This is more about metadata. It does not change anything in the underlying function calling that, we should not forget, MCP wraps.
This would be extra metadata that only the UI uses.
Similarly, MCP could expose auto-discovery of required parameters/defaults.

Kvadratni Feb 28, 2025

Absolutely. I just think this should be a bit more formalized than just raw metadata. Is there a better topic for this suggestion?

hasani114 · 2025-03-04T21:07:54Z

hasani114
Mar 4, 2025

Great discussion here. One of the UX patterns that I've been thinking about requires the LLM to initiate a long-running task using a tool while it continues interacting with the user, only getting an update when the long-running task is complete so it can provide the user with the desired information or take further action based on the output. Think of something like Deep Research where, instead of the Agent being locked until the research is done, it can continue the conversation. Can this be done with the current implementation (using the sampling method), or would this require changes in the protocol?

This can be further expanded to allow the Agent to do multi-tasking, where it can invoke multiple tools and synthesize and refine its output as more information is provided to it.
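
To illustrate the interaction pattern (not a protocol feature), here is a small `asyncio` sketch of a host loop that starts a slow tool call in the background and keeps answering the user until the result arrives. `deep_research` and `conversation` are placeholders; no MCP calls are made.

```python
import asyncio


async def deep_research(topic: str) -> str:
    # Stand-in for a slow tool call; the sleep simulates the long-running work.
    await asyncio.sleep(0.05)
    return f"report on {topic}"


async def conversation() -> list:
    transcript = []
    # Start the tool without awaiting it, so the loop stays responsive.
    task = asyncio.create_task(deep_research("MCP agents"))
    for user_msg in ["hi", "any news?"]:
        transcript.append(f"reply to {user_msg!r}")
        await asyncio.sleep(0)  # yield control, as a real I/O wait would
    # Fold the tool result into the conversation once the task completes.
    transcript.append(await task)
    return transcript
```

Launching several such tasks and gathering them as they finish would be the multi-tasking variant; whether the model itself can make use of results arriving mid-conversation is a separate question.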

3 replies

Mehdi-Bl Mar 4, 2025

This is not how LLMs and function calling work.
You need to understand that MCP wraps function calling. Even though we have JSON-RPC, the AI is doing things serially, not in parallel.
So when the model understands it needs to call a tool, it generates structured output in the expected format; the wrapper then transfers that, calls the tool, gets the output, and returns it to the model.
I have similar needs and wanted to run similar loads. So I'm working on a background runner that would put the task in a queue and, when finished, inform me another way (not through the LLM).

hasani114 Mar 4, 2025

I understand the sequential way in which current LLM function calls are usually implemented. But since we're talking about Agentic systems, I'm assuming progress in getting them to work on longer-running tasks would require changes in implementation. Just because a human is given a task that requires them to execute n subtasks doesn't mean they are no longer available to talk. While I agree the current Agent loop is sufficient when current LLM capabilities are taken into account, it won't make sense for more autonomous, longer-running agents.

You can already see some of the UX being implemented in double texting (which is available in Langgraph) which allows a user to send multiple texts before the initial loop is complete. Essentially, the first run is "interrupted" and the execution is rolled back and restarted, but it would be better to allow the Agent to have two async trajectories. Just because an Agent is waiting for a tool call doesn't mean it cannot perform other actions or interact with the user. The UX seems more natural and intuitive to me, and I'm wondering if the protocol allows for something like this to be done OOB.

Mehdi-Bl Mar 5, 2025

I'm afraid you need an orchestrator here, and this looks like overkill for MCP.
Maybe your scope is swarms of agents. But as an MCP user/builder, I find we should fill the gaps to get 1:1 working well
rather than run into the complexity of clusters of agents. What you want to do is not simple and has more edge cases.

hasani114 · 2025-03-04T21:31:34Z

hasani114
Mar 4, 2025

In addition, we're working with multi-agent flows and the discussion regarding whether servers should be thought of as Tool Providers vs being fully Agentic. Instead of expecting servers to act as an Agent can we not have the Agent abstraction at the top level? So a session can have multiple "Agents" each with their own servers. And this information can be accessible to individual servers in case they want to invoke an available Agent (like sampling). I think it would be beneficial to have an Agent abstraction instead of nesting Agents within tools.

In case a tool requires access to a specific Agent with certain capabilities, it can ask the client to enable/download the Agent and authenticate/authorize them to act. This would be cleaner from a privacy, transparency, security perspective since the user would have more visibility and control over how their data is being passed around behind the tool call.

2 replies

Mehdi-Bl Mar 4, 2025

What you want here is more metadata and full-scale tools that work outside of function calling.
We have tools/prompts/resources, so maybe create another type? But this can't be in tools.

hasani114 Mar 4, 2025

Yes, that is my thinking as well.

ahgraber · 2025-03-14T19:21:24Z

ahgraber
Mar 14, 2025

In addition to logging, I'd advocate for OTel tracing with semantic conventions as an optional part of the spec. (I'm unsure if this belongs as part of the MCP-Agent discussion other than tracing being particularly useful to understand what's going on in the occluded box)

2 replies

Mehdi-Bl Mar 14, 2025

Are we still talking here about an SDK for doing tools?
What prevents you from adding any logging you want?
I use logging without any issue and tracing in my MCP servers, don't need the SDK to ship that.

ahgraber Mar 14, 2025

I guess my point is - I think tracing is an important enough consideration when building AI applications that I would like to see
(a) tracing explicitly mentioned as part of the spec
(b) methods for OTel-compliant tracing provided, to make it easy for devs to manage traces and spans
We see the recognition of the importance of tracing elsewhere - Pydantic AI -> Logfire, MLFlow, LangGraph -> LangSmith (though that one isn't really OTel as far as I understand)

patwhite · 2025-03-29T18:04:16Z

patwhite
Mar 29, 2025

@jspahrsummers This is a really good list - I'd add one more item here, and would be curious about your thoughts. Specifically, for agentic flows, you can imagine a world where agents have the ability to discover, then call, mcp servers that have a per use cost, but WITHOUT a heavy user involved registration flow. So, a user asks for the weather, the agent discovers a weather server, it has some sort of budget for calls, it determines this is a good use of funds, and then calls the weather MCP server for a penny.

In order to enable this flow, I think there needs to be some metadata / capabilities definition on the tool type itself. I don't think the properties are quite right here - if I'm being diligent about architecture, I want the weather call to just take a city, state, country; I don't want to co-mingle all the different permutations of how payments could work in the tool call. That is to say, imagine you have some mechanism for per-use calls, but then some other mechanism for a subscription fee; it doesn't feel right to basically pollute the "method" invocation properties with optional properties around this.

I think this would tend toward some sort of custom "extension" (which I think in the MCP world we're calling custom capabilities) for payment types (proof of pre-payment, usage-based payment, etc). So, I would say, some sort of metadata extension here for tool calls themselves, so in the list view you can get custom capabilities / extensions, and in the tool call you can supply custom required fields outside the primary method call.

I do believe this is separate from a pure call property, but I could be wrong. That is to say, you could always have a property that is "proof of prepayment", but that's not an explicit "extension" or "capability" that's being defined with a specific protocol being adhered to, which makes it hard to infer. So, I'd like to say, this tool requires adherence to pre-payment protocol XYZ, which means we're expecting some sort of proof of pre-payment identifier passed in the tool metadata.

Thoughts?

EDIT:

I'll add, this COULD be a server level capabilities where tool calls then expect a header with payment information, but I think that's a bit non-ideal because you might still have cost metadata on a call by call basis that you want to communicate. I've seen other proposals on here for cost metadata in the well-known file, but that also just feels weird, because based on the agent identity (from authorization) the cost might be different.
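
One way to picture the separation argued for here: the tool's input schema stays purely about the weather query, and payment requirements live in a sibling metadata block. The `extensions` field, the `prepayment-xyz` name, `requiredFields`, and `costPerCallUsd` are all hypothetical; only `name` and `inputSchema` correspond to the existing tool definition.

```python
def make_tool(name, input_schema, extensions=None):
    """Build a tool listing entry with extension metadata kept outside the input schema."""
    tool = {"name": name, "inputSchema": input_schema}
    if extensions:
        # `extensions` is a hypothetical field, not part of the MCP tool definition.
        tool["extensions"] = extensions
    return tool


def missing_extension_fields(tool, call_meta):
    """List extension fields the caller must supply in call metadata but did not."""
    missing = []
    for ext_name, ext in tool.get("extensions", {}).items():
        supplied = call_meta.get(ext_name, {})
        missing += [
            f"{ext_name}.{field}"
            for field in ext.get("requiredFields", [])
            if field not in supplied
        ]
    return missing


weather = make_tool(
    "get_weather",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    extensions={"prepayment-xyz": {"requiredFields": ["proofId"], "costPerCallUsd": 0.01}},
)
```

Because the cost sits on the listing entry rather than in a well-known file, a server could in principle return different values per caller identity, which is the concern raised in the EDIT.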

0 replies

cliffhall · 2025-03-31T19:56:18Z

cliffhall
Mar 31, 2025
Collaborator

Structured, formatted intermediate updates from server -> client, so a deep agent graph can provide information to the user even while a top-level tool is still being run

Could this be an expansion of the progress notification?

Currently...

Expanded

params could include an object with partial data.

{
  "jsonrpc": "2.0",
  "method": "notifications/progress",
  "params": {
    "progressToken": "abc123",
    "progress": 50,
    "total": 100,
    "chunk": { "type": "string", "value": "xyz" }
  }
}

Admittedly, how the structure is handled is hand-wavy here, just an illustration of the idea. Maybe the actual implementation would really require a separate notification type, but I just thought we might be able to keep the number of notification schemas down if we piggybacked on the progress note.
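
For what it's worth, a client-side handler for the expanded shape could be as small as the sketch below. It assumes the hypothetical `chunk` field from the example above and simply ignores it when absent, so plain progress notifications keep working. `ProgressCollector` is an illustrative name, not an SDK class.

```python
class ProgressCollector:
    """Accumulates partial data carried by progress notifications, keyed by token."""

    def __init__(self):
        self.chunks = {}
        self.fraction = {}

    def handle(self, message):
        if message.get("method") != "notifications/progress":
            return
        params = message["params"]
        token = params["progressToken"]
        if params.get("total"):
            self.fraction[token] = params["progress"] / params["total"]
        # `chunk` is the hypothetical extension field, so it is treated as optional.
        chunk = params.get("chunk")
        if chunk is not None and chunk.get("type") == "string":
            self.chunks.setdefault(token, []).append(chunk["value"])
```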

0 replies

jonathanhefner · 2025-04-06T18:55:02Z

jonathanhefner
Apr 6, 2025
Collaborator

Elicitation (additional requests for information from the user) from server -> client

One UX pattern that I've found extremely useful when writing interactive scripts is to create a temporary file (optionally populated with data), open the file in the user's preferred editor, wait for the user to close the editor, and then process the final data. Some common examples of this pattern are git commit and git rebase --interactive, but the pattern can apply to any kind of editable data, not just text.

It would be great if MCP tools could also leverage this pattern. The server would send the contents of the temporary file to the client, and the client would manage the editor workflow and send the final data back to the server for processing.
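
The client-side half of that workflow might look roughly like this. It is a sketch of the temp-file-and-editor pattern only; how the server would deliver the initial contents and receive the result is left open, and `edit_in_editor` is a made-up helper name.

```python
import os
import shlex
import subprocess
import tempfile


def edit_in_editor(initial: str, editor_cmd=None) -> str:
    """Write `initial` to a temp file, run an editor on it, and return the edited text."""
    # Fall back to $EDITOR, then to `vi`, the way many CLI tools do.
    cmd = editor_cmd or shlex.split(os.environ.get("EDITOR", "vi"))
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(initial)
        # Blocks until the editor process exits.
        subprocess.run([*cmd, path], check=True)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.unlink(path)
```

Passing `editor_cmd` explicitly makes the helper testable with a scripted "editor"; a GUI client would swap the subprocess call for its own editing surface.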

1 reply

cliffhall Apr 7, 2025
Collaborator

The server would send the contents of the temporary file to the client, and the client would manage the editor workflow and send the final data back to the server for processing.

I was thinking about this the other day - human sampling. We send something to the client to 'sample' the LLM, why not the human?

A simpler message / response type pair intended only for the user to modify or respond to could achieve the same. Or perhaps sampling requests could have a target - human or LLM. So model-related fields would only be required if the target was LLM.
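
A quick sketch of what target-dependent validation could mean in practice. The `target` field is hypothetical; `messages` and `maxTokens` are taken from the existing sampling request, and the assumption here is that `maxTokens` is the model-related field that becomes optional for a human target.

```python
def validate_sampling_request(params: dict) -> list:
    """Return validation errors for a sampling request with a hypothetical `target` field."""
    errors = []
    # Defaulting to "llm" keeps existing requests valid without changes.
    target = params.get("target", "llm")
    if target not in ("llm", "human"):
        errors.append(f"unknown target: {target}")
    if not params.get("messages"):
        errors.append("messages is required")
    # Model-related fields only matter when an LLM will answer.
    if target == "llm" and "maxTokens" not in params:
        errors.append("maxTokens is required when target is llm")
    return errors
```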

amueller · 2025-04-09T18:19:39Z

amueller
Apr 9, 2025

Are you actively working on these issues currently?
A2A that was just announced tries to address some of these explicitly, and I am wondering whether it's still a goal to address them within MCP.

0 replies

siwachabhi · 2025-04-11T05:09:55Z

siwachabhi
Apr 11, 2025

Hi @amueller, thank you for re-initiating this thread.

I am thinking a couple of focused threads might help immediate discussion before a PR can be raised. These are mostly a distillation of multiple threads across the MCP forum and other protocols:

Multi-turn interactions with tools: [Proposal] Task semantics and multi-turn interactions with tools #314
Suggested response format: [Proposal] Suggested Response Format #315 (a more immediate improvement is covered in: Bring back the concept of "toolResult" (non-chat result) #97)

2 replies

amueller Apr 14, 2025

I think it would be great to move forward on those, but I think we probably need to hear from @jspahrsummers about what direction they want to go?

siwachabhi Apr 14, 2025

+1, it will be great to get inputs from @jspahrsummers

patwhite · 2025-04-14T15:49:57Z

patwhite
Apr 14, 2025

@jspahrsummers and others in thread - I put together a straw man PR for namespaces - it's meant to be VERY light touch, but would love thoughts.

#334

0 replies

cliffhall · 2025-04-14T23:17:46Z

cliffhall
Apr 14, 2025
Collaborator

Ultimately, this implementation should mesh with whatever the registry working group lands on for namespacing. Currently it looks like reverse domain is the favored approach because registered servers can be easily protected with domain ownership verification. So the namespace for tools on the official GitHub server would probably be "com.github".
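
For illustration, deriving such a namespace from a verified domain is a one-liner. The `/` separator between namespace and tool name, and the `create_issue` tool name, are assumptions made for this sketch; the thread does not settle on either.

```python
def reverse_domain(domain: str) -> str:
    """Turn a domain such as `github.com` into a reverse-domain namespace."""
    return ".".join(reversed(domain.lower().strip(".").split(".")))


def qualified_tool_name(domain: str, tool: str, separator: str = "/") -> str:
    # The separator is an assumption; the thread does not settle on one.
    return f"{reverse_domain(domain)}{separator}{tool}"
```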

0 replies

siwachabhi · 2025-04-22T07:07:40Z

siwachabhi
Apr 22, 2025

Hi All,

I raised a couple of PRs related to the discussion in this thread:

Thanks @cliffhall , for your suggestions earlier in the thread, have referenced them for implementation

0 replies

Replies: 16 comments · 19 replies