feat: RemoteAgents #2055
Thanks @Lancetnik! I like the progressive approach. I have a few questions:
Thanks Nikita! This reads like a very neat proposal as a start! A couple of questions: How the
Well, the current implementation is designed around the idea of staying with all
Claude finished @Lancetnik's task.
PR Review: RemoteAgents Implementation (PR #2055)
Executive Summary
This is an excellent and comprehensive PR that introduces distributed agent communication to AG2 through both HTTP and A2A protocol support. The implementation demonstrates strong architectural design, extensive test coverage, and thorough documentation. With 4,457 additions across 64 files, this represents a significant feature addition that enables true distributed multi-agent systems.
Recommendation: Approve with Minor Suggestions ✅
The code is production-ready, but I've identified some areas for improvement in error handling, resource management, and edge case coverage.
🎯 Strengths
1. Excellent Architecture
2. Comprehensive Implementation
3. Strong Test Coverage
4. Quality Documentation
🔍 Issues & Recommendations
Critical Issues
1. Infinite Polling Loop Risk -
@marklysze, it seems like everything is fine now. I am just waiting for your review.
Amazing work @Lancetnik!
AG2 Remote Agents
Introduction
The current AG2 multi-agent model is based on the assumption that all agents run in the same process.
Basic multi-agent communication looks like this:
So, "agents" in our case are just Python objects with methods. This works well enough for simple cases, but it does not scale. The main problems with the current model are:
Remote Agents' motivation
The suggested Remote Agents concept is pretty close to microservices. Here are some features we can get:
Introducing the Remote Agent concept would allow us to split agent development among different teams. This would increase the quality of individual agents and the number of features, and make multi-agent systems less coupled. Like microservices, remote agents would allow an AG2 multi-agent system to scale to the enterprise level.
What we would like to achieve (Goals)
An inter-agent communication model that
Remote Agents API
RemoteAgent
First of all, I suggest introducing a RemoteAgent class that is compatible with the regular ConversableAgent. This way, we can migrate existing AG2 agents to RemoteAgent without any changes. For example, the above application could look like this:
This way, reviewer and teachers were moved to separate processes. They are able to:
AgentBus runtime
Then, we should make our ConversableAgent able to process remote calls. So, we need to introduce the AgentBus, which would be responsible for:
Suggested API:
The open question is which methods should be implemented by the runtime.
Conversation State
Currently, ConversableAgent knows about the conversation state: it has a message history and knows about each agent's messages and actions. A distributed conversation state should also be available to each participant (remote or local). Here we have two options:
I think we should start with the first option. Local context synchronization is not a problem, as each agent sends a message to all chat participants so they can update their local states in real time. The open question is message order guarantees, but most of the time there is a single speaker, and in other cases, agents can talk concurrently without any order guarantee.
These reasons are enough to choose the first option. A real distributed conversation state is a more complex solution that requires a lot of refactoring, so we will provide it as another interface later. Finally, users will be able to choose between using a local copy or a distributed implementation.
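The first option, a local copy per participant kept in sync by broadcasting every message, can be illustrated with a small in-memory sketch. The Participant and Chat classes are hypothetical; a real runtime would broadcast over HTTP or a broker, and concurrent speakers could observe different orderings, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A chat participant holding its own local copy of the conversation history."""

    name: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def on_message(self, sender: str, content: str) -> None:
        # Every participant appends every broadcast message, including its own.
        self.history.append((sender, content))


@dataclass
class Chat:
    """Hypothetical broadcast channel: each message goes to all participants."""

    participants: list[Participant]

    def broadcast(self, sender: str, content: str) -> None:
        for participant in self.participants:
            participant.on_message(sender, content)


alice, bob, carol = Participant("alice"), Participant("bob"), Participant("carol")
chat = Chat([alice, bob, carol])
chat.broadcast("alice", "Hi all")
chat.broadcast("bob", "Hello, Alice")

# With a single speaker at a time, every local copy ends up identical.
assert alice.history == bob.history == carol.history
```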
Sync or async communication
By design, multi-agent communication is a sequence of messages with no guarantee of order. One agent can send a message to another, the second can send a reply to the third, and the third can say something to all of them again, then another random agent can send another message, and so on.
Therefore, RemoteAgent().send_message doesn't require a response, so our distributed runtime must be asynchronous by design.
Asynchronous HTTP Communication
Asynchronous communication can be implemented using HTTP requests. To do this, we just need to implement RemoteAgent().send_message as an HTTP request that receives an empty 200 response. To send a real response, the agent makes another HTTP request to the target agent, acting as a webhook. However, the problem is that we cannot update the local state of other chat participants this way. Additionally, RemoteAgent needs to know about all chat participants and their addresses in order to send a message to any other agent instead of simply responding to the original request. Therefore, our solution could be:
chat_start to all remote agents. This event includes:
message HTTP request to all participants in order to invalidate their local state.
Pros:
Cons:
In the future, when we have a truly distributed conversation state, our implementation will become a message broker with a self-written queue as the state. For this reason, I suggest using a message broker such as RabbitMQ, Kafka, NATS, or Redis.
Asynchronous Broker-based Communication
We can use message brokers like RabbitMQ, Kafka, NATS, or Redis to implement asynchronous communication. This way, RemoteAgent().send_message() just publishes a message to the broker in the conversation context. An agent's response is a similar message published by the replying agent. With this approach, we don't need to know all the participants' addresses: we can simply send a message to a chat_id topic, and all agents will listen to it. However, we still need to notify all participants of the chat start so they can subscribe to the topic. So, the implementation might look like this:
chat_start event to all remote agents. This event includes:
message on the chat_id topic to share it with all participants
I suggest using NATS at the start. It is simple, lightweight, and easy to integrate with other systems. Benefits of using it include:
[chat_id].*: listen for all messages in a chat.
*.[agent_id]: listen to direct messages from a specific agent.
[chat_id].[agent_id]: listen only for messages from the specified agent in the chat.
As a client, I suggest using FastStream. We have a lot of experience with this framework; it's simple, lightweight, and easy to integrate with other systems. It has native observability features like OTEL, metrics, and health checks, and it supports multiple message brokers, so we can switch if needed (or implement alternative runtimes users can choose).
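The subject patterns above rely on NATS token wildcards: subjects are dot-separated tokens, * matches exactly one token, and > matches all remaining tokens (only as the last token). A small pure-Python matcher makes the routing concrete. It is only an illustration of those rules (NATS performs the matching server-side), and the concrete chat and agent IDs are invented; the chat_subject helper is hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject matches a subscription pattern.

    '*' matches exactly one token; '>' (only valid as the last token)
    matches one or more remaining tokens.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # Valid only in last position and must cover at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


def chat_subject(chat_id: str, agent_id: str) -> str:
    # Hypothetical helper for the proposal's <chat_id>.<agent_id> layout.
    return f"{chat_id}.{agent_id}"


published = chat_subject("chat123", "reviewer")
everyone_in_chat = subject_matches("chat123.*", published)
reviewer_everywhere = subject_matches("*.reviewer", published)
only_teacher = subject_matches("chat123.teacher", published)
```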
Stateful or stateless
By design, conversations are stateful. However, a specific agent can choose to be stateless. REST-API-like agents can be stateless, as they don't need to be aware of the conversation state in order to respond to a request. They simply consume a single request and provide a response to the conversation. Stateful agents, on the other hand, need to know the entire history of the conversation in order to make a decision.
I suggest allowing users to choose which agents should be stateful and which should be stateless.
The suggested API:
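A minimal sketch of the stateful/stateless distinction, assuming a hypothetical AgentWrapper with a stateful flag. The AgentWrapper name comes from this proposal, but its constructor, the flag, and the handle method shown here are illustrative only: a stateless agent sees only the incoming request, while a stateful one answers from its local copy of the history.

```python
from dataclasses import dataclass, field
from typing import Callable

# An "agent" here is just a reply function over the messages it can see.
ReplyFn = Callable[[list[str]], str]


@dataclass
class AgentWrapper:
    """Hypothetical wrapper deciding what context a remote agent sees.

    stateless: only the incoming message (REST-like request/response).
    stateful:  the full local copy of the conversation history.
    """

    reply_fn: ReplyFn
    stateful: bool = False
    history: list[str] = field(default_factory=list)

    def handle(self, message: str) -> str:
        if not self.stateful:
            return self.reply_fn([message])
        self.history.append(message)
        reply = self.reply_fn(list(self.history))
        self.history.append(reply)  # the agent's own reply is part of the state
        return reply


def count_messages(visible: list[str]) -> str:
    return f"I can see {len(visible)} message(s)"


stateless = AgentWrapper(count_messages)
stateful = AgentWrapper(count_messages, stateful=True)
for text in ("hello", "how are you?"):
    last_stateless = stateless.handle(text)
    last_stateful = stateful.handle(text)
```

After two turns the stateless agent still sees one message, while the stateful one sees both requests plus its own first reply.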
AgentWrapper implementation details:
A stateful agent emulates a local conversation from incoming message requests. For this reason, it needs a local copy of each chat participant. The chat_start event must contain information about all participants, so that they can create a local copy of the conversation state. Stateful remote agents are more difficult to implement, so it is recommended to start with stateless ones.
Migration from the stateless default to stateful can be done by adding AgentWrapper and deprecating the stateless default. Then we can change the default behavior to stateful.
Conversation Distributed Context
https://docs.ag2.ai/latest/docs/user-guide/advanced-concepts/orchestration/group-chat/context-variables/
AG2 supports context variables that can be passed to conversations. Such variables should definitely be passed to remote agents. I suggest passing them as part of the start_chat event. The final information about the chat would look like this:
{ "chat_id": "123", "participants": ["agent1", "agent2", "agent3"], "context": { "variable1": "value1", "variable2": "value2" } }
This message should be sent to all participants, and they should update their local contexts accordingly. Also, any updates to context variables (if it's a valid case) should be sent out to all participants too. For this reason, we need to add an update_context event to the protocol, which should be listened for by all participants in the chat.
{ "context": { "variable1": "value1", "variable2": "value2" } }
Conversation Manager
Conversation manager is a special agent responsible for managing conversations. It sends a specific event to determine the next speaker. We should add this new event to the protocol.
choose_next_speaker event:
{ "next_speaker": "agent1" }
The conversation manager should run on the chat initiator's side. It listens to all incoming messages and sends the choose_next_speaker event to all participants. The selected agent processes this event and sends a message to the conversation; other participants ignore the event.
Such synchronous communication is required because a RemoteAgent is a REST-like service that knows nothing about a conversation except its state. Therefore, conversation management rules must be defined by the initiator at conversation startup, so the conversation manager must be part of the chat initiator.
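The synchronous, initiator-side flow described above can be sketched as follows: the manager picks a speaker and broadcasts a choose_next_speaker event, and only the named participant responds. The round-robin policy, the Participant class, and the shared transcript list (standing in for broadcast chat messages) are assumptions for illustration, not AG2's group chat logic.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Participant:
    name: str
    said: list[str] = field(default_factory=list)

    def on_choose_next_speaker(self, event: dict, transcript: list[str]) -> None:
        # Only the named agent acts; every other participant ignores the event.
        if event["next_speaker"] != self.name:
            return
        message = f"{self.name}: turn {len(transcript) + 1}"
        self.said.append(message)
        transcript.append(message)


def run_manager(participants: list[Participant], rounds: int) -> list[str]:
    """Initiator-side manager: picks the next speaker (round-robin here) and
    broadcasts a choose_next_speaker event to all participants."""
    transcript: list[str] = []
    order = cycle(p.name for p in participants)
    for _ in range(rounds):
        event = {"next_speaker": next(order)}
        for participant in participants:  # broadcast to everyone
            participant.on_choose_next_speaker(event, transcript)
    return transcript


agents = [Participant("teacher"), Participant("student")]
transcript = run_manager(agents, rounds=3)
```

Swapping the speaker-selection policy only touches run_manager, which is why keeping it on the initiator side is the simplest first step.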
In the future, start_chat will carry information about conversation management rules. This will allow agents to choose the next speaker themselves, but the first implementation could be synchronous.
Tool calling
MCP Tools and functions can be declared on either side of a remote conversation.
Chat initiator side:
Remote agent side:
I think each agent should only be able to use its own tools, so we don't need to add a new event to the protocol.
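A minimal sketch of the "own tools only" rule, assuming a hypothetical per-agent registry (ToolOwningAgent, register_tool, and call_tool are illustrative names, not AG2's tool API). Because an agent can only invoke what it registered locally, a tool call is resolved entirely on that agent's side and never crosses the protocol boundary.

```python
from typing import Callable


class ToolOwningAgent:
    """Hypothetical remote agent that can only execute tools registered on
    its own side, so tool calls never cross the protocol boundary."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._tools: dict[str, Callable[..., object]] = {}

    def register_tool(self, func: Callable[..., object]) -> Callable[..., object]:
        self._tools[func.__name__] = func
        return func

    def call_tool(self, tool_name: str, **kwargs: object) -> object:
        if tool_name not in self._tools:
            raise LookupError(f"{self.name} has no tool named {tool_name!r}")
        return self._tools[tool_name](**kwargs)


calculator = ToolOwningAgent("calculator")


@calculator.register_tool
def add(a: int, b: int) -> int:
    return a + b


result = calculator.call_tool("add", a=2, b=3)
```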
Human in the Loop
Human on the edge
Multi-agent conversation should be able to handle a human in the loop. I think this logic is closely related to the conversation manager and the chat initiator. So, I suggest adding a new event to the protocol called ask_human_input, which an agent sends to the chat before waiting for a human response from the conversation initiator's side.
Human in the middle
In some cases, agents may require human input for their own actions, not just because the conversation is designed that way. For example, a client initiates a distributed conversation, and a remote agent needs administrator approval for an action. The user should not be able to approve or reject the action, so this is not a case for the ask_human_input event.
So, we need a mechanism to handle these cases. I suggest adding an additional hook to AgentBus, like:
This hook should be able to process user input and return a result to the agent. This method also allows you to send the ask_human_input event if necessary. Hooks allow you to implement different types of user input, such as messenger, email, SMS, etc. You just need to call some code, wait for the user's response, and return it as the function's result.
RemoteAgent inaccessibility
Conversation correctness strongly depends on each agent's accessibility. So, I suggest adding a specific event to the protocol called ping. The conversation manager should send the ping event to all remote participants with a timeout to check their availability. If any agent becomes unavailable, the conversation manager should notify the other participants by sending a mark_dead event. If the agent becomes available again, the participant that detects it should notify the other chat members using the mark_alive event.
Additionally, we could add special markers to the RemoteAgent API to allow users to take action when an agent goes offline:
If a dead agent is ignored, the conversation manager should respect this information in the choose_next_speaker decision.
We should also correctly handle a new agent version being released while a conversation is ongoing.
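The ping / mark_dead / mark_alive cycle could be driven by the conversation manager roughly as follows. This asyncio sketch rests on assumptions: pings are modeled as awaitable callables, the event dictionaries add a hypothetical "event" key to the proposal's { "agent_id": ... } payload, and dead agents are simply filtered out of the speaker candidates (the "ignore" policy).

```python
import asyncio
from typing import Awaitable, Callable

PingFn = Callable[[], Awaitable[None]]


async def check_liveness(pings: dict[str, PingFn], timeout: float) -> dict[str, bool]:
    """Ping every remote participant concurrently; an agent that does not
    answer within `timeout` seconds is considered dead."""

    async def probe(ping: PingFn) -> bool:
        try:
            await asyncio.wait_for(ping(), timeout)
            return True
        except (asyncio.TimeoutError, ConnectionError):
            return False

    names = list(pings)
    results = await asyncio.gather(*(probe(pings[n]) for n in names))
    return dict(zip(names, results))


def liveness_events(previous: dict[str, bool], current: dict[str, bool]) -> list[dict]:
    """Turn status changes into mark_dead / mark_alive events for the chat."""
    events = []
    for name, alive in current.items():
        was_alive = previous.get(name, True)
        if was_alive and not alive:
            events.append({"event": "mark_dead", "agent_id": name})
        elif not was_alive and alive:
            events.append({"event": "mark_alive", "agent_id": name})
    return events


def eligible_speakers(status: dict[str, bool]) -> list[str]:
    # Dead agents are skipped when choosing the next speaker.
    return [name for name, alive in status.items() if alive]


async def healthy() -> None:
    await asyncio.sleep(0)


async def hanging() -> None:
    await asyncio.sleep(10)  # never answers within the timeout


status = asyncio.run(check_liveness({"reviewer": healthy, "teacher": hanging}, timeout=0.05))
```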
Observability
All remote agents should support all observability features, including:
Most of these tools are already implemented by the runtimes (HTTP or FastStream), so we can reuse them.
Messages serialization
Some remote agents may require a specific incoming message format or use a strict message structure for their responses. We can respect these requirements at the protocol level and automatically retry incorrectly formatted remote requests. To notify chat participants about these requirements, I suggest adding a new event to the protocol called i_am. This event should be sent to the chat by each participant immediately after the start_chat event, providing all participants with information about an agent's requirements and features so they can adjust their behavior accordingly.
Authentication
Authentication is a very important requirement for distributed systems. Remote agents should support authentication at runtime. We can adopt Basic / Digest authentication for HTTP and integrate it with Keycloak or implement tokens ourselves. In broker cases, we can delegate authentication to the message broker. This part doesn't differ from regular microservice authentication.
Interoperability with non-AG2 agents
We should be able to connect non-AG2 agents as long as they implement the AG2 RemoteAgent interface. The AG2 framework should provide suitable interfaces and pre-written functions to allow users to write their own protocol implementations with non-AG2 agents. This feature is not needed at the start, so it has Priority 1.
Implementation plan
P0: First implementation
The simplest implementation could be a regular HTTP runtime with stateless remote agents that can only respond to questions. They do not have the ability to maintain a conversation state.
Therefore, the conversation would look like just an HTTP request for a response from a stateless agent, instead of the local calls that we have now. The chat manager at the initiator's side makes decisions about who speaks next, then calls the next agent, and so on.
This is a simple but effective implementation. It also allows you to check whether RemoteAgents are available and implement these mechanisms.
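A minimal P0-style round trip using only the Python standard library: a stateless remote agent served over HTTP, and the initiator-side call that replaces a local method call. The /message path, the {"message": ...} / {"reply": ...} JSON shapes, and the upper-casing stand-in for an LLM call are assumptions, not the protocol.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen


def stateless_reply(message: str) -> str:
    # Stand-in for the remote agent's LLM call; it sees only this request.
    return message.upper()


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical remote-agent endpoint: POST a message, get a reply."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"reply": stateless_reply(payload["message"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the example quiet


def ask_remote_agent(url: str, message: str) -> str:
    """What the initiator's chat manager does instead of a local method call."""
    request = Request(
        url,
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=5) as response:
        return json.load(response)["reply"]


server = ThreadingHTTPServer(("127.0.0.1", 0), AgentHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = f"http://127.0.0.1:{server.server_address[1]}/message"
    answer = ask_remote_agent(url, "what is ag2?")
finally:
    server.shutdown()
    server.server_close()
```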
HTTPAgentBus
P1: Conversation State support
In this phase, we should make RemoteAgents stateful.
start_chat event to all remote participants.
Also, at this stage, I suggest implementing human-in-the-loop and context variable support.
As a result, all Remote Agents become full-featured, with conversation state support. At this stage, we can also implement an alternative AgentBus based on a message broker, to avoid broadcasting messages from the Conversation Manager to all participants.
P2: Remote Agents features
Here we should support distributed features like:
These features make our distributed systems safe and reliable.
P3: Real Distribution
Here we should go beyond local copies of the real state and make it truly distributed.
So, any remote agent should be able to decide on the next speaker on its own. This allows us to make the protocol more flexible and reliable, and we don't need a chat initiator anymore.
Also, we should make the conversation state truly distributed, which avoids any inconsistency between participants and makes communication much more reliable.
P4: additional features
Current codebase problems
ConversableAgent.send doesn't have any information about the current chat identifier, so we can't bind a published message to a specific conversation.
ConversableAgent is strongly bound to a specific conversation and can't be reused in different conversations.
Protocol methods
start_chat (to all participants):
{ "chat_id": "123", "participants": ["agent1", "agent2", "agent3"], "context": { "variable1": "value1", "variable2": "value2" } }
stop_chat (to chat)
update_context (to chat):
{ "context": { "variable1": "value1", "variable2": "value2" } }
ask_human_input (to chat)
ping (to a specific agent)
mark_dead (to chat):
{ "agent_id": "agent1" }
mark_alive (to chat):
{ "agent_id": "agent1" }
send_message (to chat)
choose_next_speaker (to chat)
i_am (to chat or as an answer)
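The payloads listed above can be modeled as typed messages. This sketch covers only the events whose fields the proposal spells out (stop_chat, ask_human_input, ping, send_message, and i_am have no payload defined yet); the dataclass names and the {"type": ..., "data": ...} envelope are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class StartChat:
    chat_id: str
    participants: list[str]
    context: dict[str, Any] = field(default_factory=dict)


@dataclass
class UpdateContext:
    context: dict[str, Any]


@dataclass
class MarkDead:
    agent_id: str


@dataclass
class MarkAlive:
    agent_id: str


@dataclass
class ChooseNextSpeaker:
    next_speaker: str


# Event name on the wire -> payload type (names taken from the protocol list).
EVENTS: dict[str, type] = {
    "start_chat": StartChat,
    "update_context": UpdateContext,
    "mark_dead": MarkDead,
    "mark_alive": MarkAlive,
    "choose_next_speaker": ChooseNextSpeaker,
}
NAMES = {cls: name for name, cls in EVENTS.items()}


def encode(event: Any) -> str:
    # Hypothetical envelope: {"type": <event name>, "data": <payload>}
    return json.dumps({"type": NAMES[type(event)], "data": asdict(event)})


def decode(raw: str) -> Any:
    envelope = json.loads(raw)
    return EVENTS[envelope["type"]](**envelope["data"])


start = StartChat(
    chat_id="123",
    participants=["agent1", "agent2", "agent3"],
    context={"variable1": "value1", "variable2": "value2"},
)
round_tripped = decode(encode(start))
```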