Supporting Teams as Participants in a GroupChat by ekzhu · Pull Request #5863 · microsoft/autogen · GitHub

@ekzhu
Collaborator

@ekzhu ekzhu commented Mar 7, 2025

Nested teams pattern for group chat. This is a draft design for getting ideas. I have found this to be very useful.

The general idea is to forward every ChatMessage from the parent team to the nested team, and every ChatMessage from the nested team back to the parent team.

Difference between this and SocietyOfMindAgent: it doesn't perform any additional model client call or encapsulation of internal chat messages. The inner team is transparent to the parent team and vice versa. All agents share the same context as they all have access to the same set of ChatMessage. However, this pattern allows for flexibility in setting the order of speakers, as now you can have "inner loops".

See example:

# pip install -U autogen-agentchat autogen-ext[openai,web-surfer]
# playwright install
import asyncio

from autogen_agentchat.agents import UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination, TextMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    # The web surfer will open a Chromium browser window to perform web browsing tasks.
    web_surfer = MultimodalWebSurfer("web_surfer", model_client, headless=False, animate_actions=True)
    # NOTE: you can skip input by pressing Enter.
    user_proxy = UserProxyAgent("user_proxy")
    # The outer-loop termination condition that will terminate the team when the user types "exit".
    user_termination = TextMentionTermination("exit", sources=["user_proxy"])
    # The inner-loop termination condition that stops the web surfer when it has finished its actions.
    web_surfer_termination = TextMessageTermination(source="web_surfer")
    # The web surfer and the user proxy take turns in a round-robin fashion.
    team = RoundRobinGroupChat(
        [
            # For each turn, the web surfer performs multiple actions in a nested group chat.
            RoundRobinGroupChat([web_surfer], termination_condition=web_surfer_termination),
            # The user proxy gets user input once the web surfer has finished its actions.
            user_proxy,
        ],
        termination_condition=user_termination,
    )
    try:
        # Start the team and wait for it to terminate.
        await Console(team.run_stream(task="Find information about AutoGen and write a short summary."))
    finally:
        await web_surfer.close()


asyncio.run(main())
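
To make the "inner loop" speaker order concrete without a browser or a model client, here is a minimal, library-free sketch. `ToyAgent`, `ToyRoundRobin`, and the stop predicates are hypothetical stand-ins for illustration, not AutoGen classes. The sketch models only two things: every participant appends to one shared transcript, and a nested team keeps taking turns until its own stop condition fires.

```python
from typing import Callable, List, Sequence, Tuple, Union

Message = Tuple[str, str]  # (source, text)
Participant = Union["ToyAgent", "ToyRoundRobin"]


class ToyAgent:
    """Hypothetical agent that emits scripted replies, one per turn."""

    def __init__(self, name: str, replies: Sequence[str]) -> None:
        self.name = name
        self._replies = list(replies)

    def run(self, transcript: List[Message]) -> None:
        # A real agent would read the whole transcript; this one just replies.
        text = self._replies.pop(0) if self._replies else "(nothing to add)"
        transcript.append((self.name, text))


class ToyRoundRobin:
    """Hypothetical round-robin team whose participants are agents or teams."""

    def __init__(
        self,
        participants: Sequence[Participant],
        stop: Callable[[Message], bool],
        max_turns: int = 20,
    ) -> None:
        self.participants = list(participants)
        self.stop = stop
        self.max_turns = max_turns

    def run(self, transcript: List[Message]) -> None:
        for turn in range(self.max_turns):
            participant = self.participants[turn % len(self.participants)]
            seen = len(transcript)
            # Agents and nested teams both write straight into the shared transcript.
            participant.run(transcript)
            # Check only the messages produced during this turn.
            if any(self.stop(message) for message in transcript[seen:]):
                return


def demo() -> List[Message]:
    web_surfer = ToyAgent("web_surfer", ["click link", "scroll down", "DONE: draft", "DONE: final"])
    user_proxy = ToyAgent("user_proxy", ["keep going", "exit"])
    # Inner loop: the surfer keeps acting until it emits a "DONE" message.
    inner = ToyRoundRobin([web_surfer], stop=lambda m: m[0] == "web_surfer" and m[1].startswith("DONE"))
    # Outer loop: alternate between the inner team and the user until "exit".
    outer = ToyRoundRobin([inner, user_proxy], stop=lambda m: m[0] == "user_proxy" and "exit" in m[1])
    transcript: List[Message] = [("user", "Find information about AutoGen.")]
    outer.run(transcript)
    return transcript


for source, text in demo():
    print(f"{source}: {text}")
```

In this toy model the outer stop condition is checked only after the inner team returns; whether the real implementation behaves the same way is not something this sketch claims.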

@codecov

codecov bot commented Mar 7, 2025

Codecov Report

❌ Patch coverage is 89.39394% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.76%. Comparing base (2618496) to head (7be2399).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...entchat/teams/_group_chat/_chat_agent_container.py | 87.80% | 5 Missing ⚠️ |
| ...ogen-agentchat/src/autogen_agentchat/base/_team.py | 75.00% | 2 Missing ⚠️ |
| ...en_agentchat/teams/_group_chat/_base_group_chat.py | 87.50% | 2 Missing ⚠️ |
| ...p_chat/_magentic_one/_magentic_one_orchestrator.py | 71.42% | 2 Missing ⚠️ |
| ...at/teams/_group_chat/_graph/_digraph_group_chat.py | 85.71% | 1 Missing ⚠️ |
| ...oup_chat/_magentic_one/_magentic_one_group_chat.py | 85.71% | 1 Missing ⚠️ |
| ...gentchat/teams/_group_chat/_selector_group_chat.py | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5863      +/-   ##
==========================================
+ Coverage   80.74%   80.76%   +0.02%     
==========================================
  Files         234      234              
  Lines       17892    17985      +93     
==========================================
+ Hits        14447    14526      +79     
- Misses       3445     3459      +14     
Flag Coverage Δ
unittests 80.76% <89.39%> (+0.02%) ⬆️

@EItanya
Contributor

EItanya commented Mar 7, 2025

Just to play devil's advocate on the implementation here: this turns the entire BaseGroupChat into an if statement with respect to Team vs. Agent, whereas a thin wrapper agent around a team would let you maintain the same 1:1 nature of the GroupChat. I understand the SocietyOfMindAgent has other behavior which may not work in this case, but there could be an even simpler wrapper which just forwards the result along. In composable, declarative systems it can sometimes be better to keep the building blocks simple and allow for complex interactions, rather than creating complex building blocks.

Again, the above is a straw man and I think either way is great, just thinking out loud.

@ekzhu
Collaborator Author

ekzhu commented Mar 7, 2025

@EItanya For unifying both Team and ChatAgent, we have a TaskRunner protocol that both adhere to. I think it is possible to unify them and use the same code path to handle both.
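
As a rough illustration of what "the same code path" could look like, here is a sketch using a structural protocol. The `Runner` protocol, `EchoAgent`, `SequentialTeam`, and `drive` are hypothetical names invented for this example; the actual TaskRunner protocol in AutoGen has its own methods and signatures, which this sketch does not attempt to reproduce.

```python
import asyncio
from typing import List, Protocol


class Runner(Protocol):
    """Hypothetical shared interface for anything that can take a turn."""

    name: str

    async def run(self, messages: List[str]) -> List[str]: ...


class EchoAgent:
    """Hypothetical agent: produces one message per turn."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def run(self, messages: List[str]) -> List[str]:
        return [f"{self.name}: saw {len(messages)} message(s)"]


class SequentialTeam:
    """Hypothetical team: runs its members in order, sharing what was produced so far."""

    def __init__(self, name: str, members: List[Runner]) -> None:
        self.name = name
        self.members = members

    async def run(self, messages: List[str]) -> List[str]:
        produced: List[str] = []
        for member in self.members:
            produced.extend(await member.run(messages + produced))
        return produced


async def drive(participant: Runner, messages: List[str]) -> List[str]:
    # One code path for both kinds of participant: no isinstance() branch.
    return await participant.run(messages)


agent_output = asyncio.run(drive(EchoAgent("a"), ["task"]))
team_output = asyncio.run(drive(SequentialTeam("t", [EchoAgent("a"), EchoAgent("b")]), ["task"]))
print(agent_output)
print(team_output)
```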

One limiting factor of the custom agent wrapper approach is that the ChatMessage instances from the inner team are not bubbled up to other agents; only the final ChatMessage produced by the wrapper agent is. So it isn't exactly the same as a nested team. With nested teams, all participants share the same context, which consists of the ChatMessage instances.
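
The visibility difference can be sketched with plain functions. Everything here (`inner_team`, `via_wrapper_agent`, `via_nested_team`, and the message tuples) is a hypothetical toy, not the AutoGen API; it only shows which messages the parent team's other participants would get to see under each approach.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (source, text)


def inner_team(task: str) -> List[Message]:
    """Hypothetical inner team: one intermediate message, then a final one."""
    return [
        ("researcher", f"notes on {task}"),
        ("writer", f"summary of {task}"),
    ]


def via_wrapper_agent(task: str) -> List[Message]:
    # Wrapper approach: the inner messages stay private, and the parent
    # sees a single final message attributed to the wrapper agent.
    final_source, final_text = inner_team(task)[-1]
    return [("wrapper", final_text)]


def via_nested_team(task: str) -> List[Message]:
    # Nested team approach: every inner message is forwarded to the parent
    # with its original source, so all participants share the same context.
    return inner_team(task)


print(via_wrapper_agent("AutoGen"))
print(via_nested_team("AutoGen"))
```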

The benefit I see in the new nested team feature is simplicity. It reduces the lines of code and speeds up prototyping.

Regarding declarative workflow APIs, I believe they can eventually become the low-level API that we use to implement these high-level AgentChat patterns, especially after #5787.

@EItanya
Contributor

EItanya commented Mar 8, 2025

Got it, I think I understand and agree. The behavior I want is closer to the current approach, where I run a SocietyOfMindAgent-like agent to keep the internal team encapsulated and return only a final result. Having both options will absolutely unlock more powerful patterns!

@bhakimiy
Contributor

Hi @ekzhu.

When you have a moment, could you please provide an update on the current status of this pull request and, if possible, share an estimated timeframe for its merge?

Currently I'm using the wrapper agent to nest the teams, and this feature would greatly improve the experience of working with nested teams in autogen.

@ekzhu ekzhu force-pushed the ekzhu-nested-team branch from a5bb052 to c6dc1a8 Compare July 8, 2025 12:53
@ekzhu ekzhu marked this pull request as ready for review July 8, 2025 12:54
@EItanya
Contributor

EItanya commented Jul 8, 2025

Out of curiosity, wouldn't agents/teams as tools work for this use-case?

@ekzhu ekzhu changed the title Nested teams in group chats Supporting Teams as Participants in a GroupChat Jul 8, 2025
@ekzhu
Collaborator Author

ekzhu commented Jul 9, 2025

> Out of curiosity, wouldn't agents/teams as tools work for this use-case?

It would be different, because agent/team tools do not have access to the full context of the caller and are driven by the caller agent's model. Nested teams, by contrast, add to the topology of group chats. While both are nested and hierarchical, they differ considerably in how they are triggered and how context is shared.

@victordibia
Collaborator

Overall, I think this works well for GroupChat use cases (where full message transparency is an implicit requirement, i.e., the state of the task is the conversation, and for this to work all agents should see the conversation).

> it doesn't perform any additional inference or encapsulation.

It might be worth adding a line in the description to clarify what we mean by inference.

Overall, looks good.
One final thing: might this be a breaking change? It might be useful for us to outline any situations where this might be the case.

  • Any user code or plugin that subclasses Team, instantiates BaseGroupChat, or consumes/produces group chat events will likely break and need updates to be compatible with these changes. The good news is that this is likely done in few situations, so the impact is likely minimal.
    Either way, we'd need to rev version numbers in the next release.

@ekzhu ekzhu enabled auto-merge (squash) July 28, 2025 00:47
@ekzhu ekzhu disabled auto-merge July 28, 2025 00:58
@ekzhu
Collaborator Author

ekzhu commented Jul 28, 2025

> It might be worth adding a line in the description to clarify what we mean by inference.

Updated

> One final thing: might this be a breaking change?

It will break type hints of subclasses of the internal classes ChatAgentContainer and BaseGroupChat.

Will move to version 0.7.0 for the next release.

@ekzhu ekzhu merged commit 98e6bba into main Jul 28, 2025
65 checks passed
@ekzhu ekzhu deleted the ekzhu-nested-team branch July 28, 2025 05:59
@vvvvvvizard

vvvvvvizard commented Sep 19, 2025

<agentMemory>
  <agentID>AgentAlpha</agentID>
  <goals>
    <goal id="G1">Complete Task A</goal>
    <goal id="G2">Learn about Topic B</goal>
  </goals>
  <beliefs>
    <belief type="fact">Task A requires Skill C</belief>
    <belief type="preference">Prefers collaborative tasks</belief>
  </beliefs>
  <experiences>
    <experience id="E1" date="2025-09-18">Successfully completed Subtask X</experience>
    <experience id="E2" date="2025-09-17">Encountered error in Subtask Y</experience>
  </experiences>
</agentMemory>

Copied and pasted from Gemini:
XML can be utilized as a memory format for agents, particularly in systems where structured, self-describing data is beneficial for agent communication, state management, or knowledge representation.

How XML Functions as Agent Memory:

Structured Data Storage: 

XML's hierarchical structure allows for the organization of complex data, enabling agents to store and retrieve information in a well-defined format. This is particularly useful for representing diverse types of agent knowledge, such as beliefs, goals, past experiences, or environmental observations.

Self-Describing Nature: 

XML tags provide semantic meaning to the data, making the memory content more understandable and interpretable by the agent itself or by other agents in a multi-agent system. This facilitates interoperability and knowledge sharing.

Agent State and Context: 

An agent's current state, including its internal variables, parameters, and ongoing tasks, can be serialized into XML format and stored as part of its memory. This allows for persistent storage of agent state, enabling agents to resume operations after interruptions or migrate between different platforms.

Communication and Interoperability: 

In multi-agent systems, XML can be used as a standardized format for agent communication, where messages or knowledge transfers are encoded in XML. This allows agents developed on different platforms or using different programming languages to exchange information effectively, with the XML acting as a common language for memory representation during communication.

Knowledge Representation: 

XML can represent various forms of knowledge, from simple key-value pairs to more complex ontologies or rule sets, particularly when combined with technologies like RDF (Resource Description Framework) for semantic enrichment.

Considerations when using XML for Agent Memory:

Memory Overhead: 

Parsing and manipulating large XML documents can be memory-intensive, especially with DOM (Document Object Model) parsers that load the entire document into memory. For very large datasets, streaming XML parsers (like SAX) or alternative memory management techniques may be more efficient.

Performance: 

The performance of XML-based memory operations can depend on the complexity of the XML structure and the efficiency of the XML processing libraries used.

Schema Definition: 

Using XML schemas (XSD) or DTDs (Document Type Definitions) can help enforce the structure and validity of the agent's memory, ensuring data integrity.
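
For what it's worth, reading a memory document like the one in this comment takes only the Python standard library. The sketch below is an assumption-laden illustration: the `<agentMemory>` root element name is taken from the closing tag in the snippet, and `load_memory` and the dictionary layout it returns are hypothetical choices, not part of AutoGen or any agent framework.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

MEMORY_XML = """<agentMemory>
  <agentID>AgentAlpha</agentID>
  <goals>
    <goal id="G1">Complete Task A</goal>
    <goal id="G2">Learn about Topic B</goal>
  </goals>
  <beliefs>
    <belief type="fact">Task A requires Skill C</belief>
    <belief type="preference">Prefers collaborative tasks</belief>
  </beliefs>
  <experiences>
    <experience id="E1" date="2025-09-18">Successfully completed Subtask X</experience>
    <experience id="E2" date="2025-09-17">Encountered error in Subtask Y</experience>
  </experiences>
</agentMemory>"""


def load_memory(xml_text: str) -> Dict[str, Any]:
    """Parse the whole document into memory (DOM-style) and flatten it into a dict."""
    root = ET.fromstring(xml_text)
    return {
        "agent_id": root.findtext("agentID"),
        "goals": {g.get("id"): g.text for g in root.iter("goal")},
        "beliefs": [(b.get("type"), b.text) for b in root.iter("belief")],
        "experiences": {
            e.get("id"): {"date": e.get("date"), "text": e.text}
            for e in root.iter("experience")
        },
    }


memory = load_memory(MEMORY_XML)
print(memory["agent_id"])
print(memory["goals"])
```

`ET.fromstring` loads the full tree at once, which is the memory-overhead case described above; for very large documents, `xml.etree.ElementTree.iterparse` offers an incremental alternative.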



