Multi-Agent System Orchestration
How to string agents together with PydanticAI and OpenAI Agents SDK
Multi-agent systems are going to be big (if they aren't already; and if they are, they will get bigger still). All the major AI providers have implemented multi-agent frameworks that work with a variety of LLMs.
CrewAI, LangChain, LlamaIndex, OpenAI and PydanticAI, to name a few, all offer products that support multi-agent development. These are the ones that have caught my eye recently, but this is by no means an exhaustive list.
I’m rather taken by PydanticAI and the new OpenAI Agents SDK (which has kept to the simple but powerful approach demonstrated by their earlier educational offering, Swarm). Both employ easy-to-understand approaches to agent definition and provide methods to arrange flows from one to another. I’ll use them in this article.
We are going to look at how multi-agent systems can be designed. Whether they be simple linear flows, hierarchical ones or some combination of the two, we must consider how much an agent should know about its predecessor and what controls the workflow. Does an agent need to know every step its predecessor took or just the final result? Is it an agent that decides which other agents to call, or is the workflow defined in normal program execution?
We’ll consider these problems with a little example code.
Why multi-agents
So, why is it better to use multiple agents instead of just one?
Let’s say you wanted to produce an executive report from raw sales data and some customer feedback. There are several tasks to be executed:
Structure the customer feedback into a form that is easier to work with (JSON, for example);
Analyse the structured feedback to identify well- and poorly-performing products;
Create some informative charts from the sales data (which we can assume is already nicely formatted, perhaps as CSV);
Analyse the sales data to produce a sales report;
Finally, pull all of these things together into a single report document.
All of this could be rolled into a single prompt for an LLM which, if provided with the data, could make an attempt at performing all of the required steps.
There are at least two problems with this approach. The first is that long prompts are harder to write, and the LLM is more likely to get confused than when processing a simpler request; the chances of hallucination are likely to be higher, too.
Debugging the result is the second problem. Yes, you can look through all of the messages that a complex prompt will generate to find out where things have gone wrong. But, as with all development, simple tasks are easier to get right.
It is much easier to break the task down into the steps that I outlined above, give them to different agents and use the output of one to inform the next.
This approach allows you to home in on one fairly simple problem at a time, and only when you’ve got that right is it time to move on to the next step.
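To make this concrete, here is a minimal sketch of the report pipeline as a simple chain of PydanticAI agents. The agent names and prompts, and the feedback and sales_csv variables, are all hypothetical, and a real version would add the charting step and some error handling:

from pydantic_ai import Agent

# Hypothetical specialised agents, one per step of the pipeline
structurer = Agent(
    'openai:gpt-4o-mini',
    system_prompt='Convert raw customer feedback into a JSON list of {product, sentiment, comment} records.',
)
analyst = Agent(
    'openai:gpt-4o-mini',
    system_prompt='From structured feedback, identify well- and poorly-performing products.',
)
reporter = Agent(
    'openai:gpt-4o-mini',
    system_prompt='Combine the analyses you are given into a concise executive report.',
)

feedback = "..."   # raw customer feedback (elided)
sales_csv = "..."  # pre-formatted sales data (elided)

# Each agent's final output feeds the next: a simple linear flow
structured = structurer.run_sync(feedback).data
analysis = analyst.run_sync(structured).data
report = reporter.run_sync(
    f"Feedback analysis:\n{analysis}\n\nSales data:\n{sales_csv}"
).data
print(report)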
How do I design a multi-agent system?
There are a few questions that need answering when designing a multi-agent system, but possibly the most important is “Do I really need this?”. We’ve thought about the advantages, but if you can solve your problem with a simpler system, then you should probably do so.
Multi-agent systems might be the flavour of the month, and they are a great thing to learn about, but when it comes to practical solutions, simple is best, and there may be a better way.
It might, after all, be perfectly feasible to solve your problem and get consistent results with a single prompt.
But, let’s assume that you are convinced that multi-agents are the way to go. Here are some other questions to answer.
Loosely or tightly coupled
Do the agents need to be tightly connected? We could write a program that invokes an agent, takes the final output of that agent and feeds it into another. We might term this arrangement a loosely coupled agent system. It assumes that the downstream agent does not need to know much about what the first agent did. For example, say I wanted to find the population of the capital of France:
User: “Hi Agent1, can you tell me the capital city of France?”
Agent1: “Certainly, the capital of France is Paris”
# Some program logic to extract the answer and build a new request
User: “Hi Agent2, can you tell me the approximate population of Paris, France?”
Agent2: “Certainly, the population of Paris, France is approximately 2 million people.”
Diagrammatically, the programmatic flow runs from the user’s request, through Agent1, on through the extraction logic, and finally to Agent2.
This is a very trivial example, of course, and we are assuming that between the first and second requests, there is some program logic that will extract ‘Paris’ from the first response and formulate the new request that asks for its population.
Here is an implementation using PydanticAI. The processing of the first output is simply to extract the final response to the query and embed it into the next prompt. We use the same generic agent both times as no specialisation is required.
from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o-mini',
    system_prompt='Be concise, reply with one sentence.',
)

result1 = agent.run_sync('What is the capital city of France?')
result2 = agent.run_sync(f'{result1.data}. What is the population?')
print(result2.data)

# result1: The capital city of France is Paris.
# result2: As of 2023, the population of Paris is approximately
# 2.1 million in the city proper, with around 11 million in the
# larger metropolitan area.
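One practical benefit of this loose coupling is that each call carries only the distilled answer from the previous step rather than the whole message history, which keeps token usage (and cost) down as the chain grows. The trade-off is that any useful detail not captured in that answer is lost to the downstream agent.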
Alternatively, we could design a more tightly coupled system that shares the context from the first agent with the second one. Here’s some pseudo-code that represents this:
# Pseudo code
# Add a user request to the context
context.add("Hi Agent1, can you tell me the capital city of France?")
# Call the first agent with that context
response = agent1.call(context)
# Create a new context from the response
new_context = response.context
# The new agent knows about the context from the first agent,
# so it understands that the next request is about Paris
new_context.add("And what is the population?")
response = agent2.call(new_context)
To be clear, when we refer to the context, we mean the list of messages that is produced by the user and the LLM as a result of a user request — the content of the LLM’s context window.
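Concretely, in the common chat-completion format, that context is just a list of role-tagged messages; here is a generic illustration (not any particular framework’s internal representation):

context = [
    {"role": "system", "content": "Be concise, reply with one sentence."},
    {"role": "user", "content": "What is the capital city of France?"},
    {"role": "assistant", "content": "The capital city of France is Paris."},
]
# Sharing context means handing this whole list to the next agent
# and appending the new user request to it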
We start by setting the context to a request to find the capital of France. Calling the agent with this context will produce a response like before, and the context will be updated with messages produced by the agent, including the final response.
Now, instead of extracting the required information from the response, we copy the context and add a new request to it: “And what is the population?”. We call a new agent with this context, and because the context contains the history of the previous call to the LLM, the question is understood to refer to Paris and agent2 will respond accordingly.
The flow of events looks much like the one above, but the difference is that we pass on the whole of the previous context so that the new request is understood.
We can implement a loosely coupled system in pretty much any agent framework. It shouldn’t be impossible to implement a tightly coupled system by hand, either, but the technique is built directly into AI agent frameworks such as OpenAI’s Swarm and Agents SDK, PydanticAI and CrewAI, although the implementations vary.
Here is an implementation. Again, we are using PydanticAI.
from pydantic_ai import Agent
agent = Agent(
'openai:gpt-4o-mini',
system_prompt='Be concise, reply with one sentence.',
)
result1 = agent.run_sync('What is the capital city of France?')
result2 = agent.run_sync('What is the population', message_history=result1.new_messages())
print(result2.data)
# As of 2023, the population of Paris is approximately 2.1 million people.
PydanticAI allows us to pass the message history to an agent (it is a parameter of the run methods), and here we set it to the result of calling the new_messages() method on the first result, which gives us the entire list of messages from that run. PydanticAI assumes that if we set this parameter, then that is the only context required, and so it does not add the system prompt. I can imagine cases where this might not be appropriate, but there will be ways around it, I'm sure.
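As an aside, new_messages() returns only the messages generated by that particular run; results also provide an all_messages() method, which includes any history the run itself was given. That makes chaining a third step straightforward, as in this minimal sketch:

# Carry the whole conversation forward to a third request
result3 = agent.run_sync(
    'And what is its land area?',
    message_history=result2.all_messages(),
)
print(result3.data)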
Manager agents
Both techniques we have described so far are linear: one agent does some processing, then the next one is invoked to do something else, and so on until a final result is achieved.
Using standard programming techniques to order the linear execution of agents is straightforward, but it is not always that simple. Sometimes we need a hierarchy of agents, and in this case, we might prefer to use the intelligence of an LLM to decide just how to order agent execution.
Take a customer support triage system, for example. Let’s imagine that I want some advice on how to deal with file systems from the command prompt. I ask the customer support system to advise me on how to execute a file from the command line, say. Before it can answer the question, it needs to know what operating system I’m using. So, the first response will be to ask me if I’m using Windows, MacOS or Linux. Then, depending on my answer, it will pass me on to the correct agent, who will give me properly tailored answers.
That’s a hierarchical architecture where one agent passes control to one of three other agents.
We could implement this by getting a response from the Triage agent, processing it programmatically, and then calling the appropriate OS agent.
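That programmatic version might look something like the following sketch, which uses the OS agents defined in the listing further down; here it is ordinary Python, rather than an LLM, that makes the routing decision:

# Hypothetical programmatic triage: plain code picks the agent
os_agents = {
    "windows": win_agent,
    "macos": macos_agent,
    "linux": linux_agent,
}

os_name = input("Which operating system are you using? ").strip().lower()
query = input("What would you like to know? ")

chosen = os_agents.get(os_name)
if chosen is None:
    print("Sorry, I don't know that operating system.")
else:
    print(chosen.run_sync(query).data)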
However, there is a neater solution: using agents as tools.
Agents as tools
We know that agents can use tools, i.e. functions that are implemented locally that do things that an LLM cannot. A tool is a locally defined function, so why not define a function that runs an agent?
This technique gives us a great way of designing the triage agent. Define agents for the three operating systems and let a triage agent make the decision about which of them to use.
Here, again, is a PydanticAI implementation. PydanticAI cannot use a call to an agent directly as a tool, so we need to wrap it in a function. In the code below, we first create a specialised agent class that will serve to create the three OS agents (this is just a code-saving device). When an agent object is instantiated from this class, it customises the system prompt for the OS in question.
Following this definition, we create the three OS agents and, for each one, define a function that can be used as a tool to run the agent. We also define a user input tool that the triage agent will use to prompt the user.
from pydantic_ai import Agent

class OS_Agent(Agent):
    def __init__(self, name="windows"):
        self.name = name
        super().__init__(
            model='openai:gpt-4o-mini',
            system_prompt=f"You are an expert in the {self.name} operating system",
        )

win_agent = OS_Agent("windows")
def call_win_agent(prompt: str) -> str:
    """Answer a query about the Windows operating system."""
    # .data extracts the text of the agent's final response
    return win_agent.run_sync(prompt).data

macos_agent = OS_Agent("macos")
def call_macos_agent(prompt: str) -> str:
    """Answer a query about the macOS operating system."""
    return macos_agent.run_sync(prompt).data

linux_agent = OS_Agent("linux")
def call_linux_agent(prompt: str) -> str:
    """Answer a query about the Linux operating system."""
    return linux_agent.run_sync(prompt).data

def user_input(prompt: str) -> str:
    """Prompt the user and return their reply."""
    return input(prompt + " ")

agent = Agent(
    "openai:gpt-4o-mini",
    tools=[call_win_agent, call_macos_agent, call_linux_agent, user_input],
    system_prompt="""
    Use the 'user_input' tool to get data from the user.
    First, ask for the operating system and then the user query.
    Select the correct tool from
    'call_win_agent',
    'call_macos_agent' and
    'call_linux_agent',
    depending on the user's operating system
    and record its response""",
)

result1 = agent.run_sync("I am the OS query agent, please input the following:")
print(result1.data)
This works by defining a set of tools that run the appropriate agents and instructing the Triage agent to use the correct one depending on the operating system that the user mentions.
I’m mainly using PydanticAI in these examples because it is model-agnostic; although I use OpenAI here, PydanticAI supports many different models.
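Swapping models is largely a matter of changing the model string. For example, something like this should point the same agent at an Anthropic model instead (a sketch; check PydanticAI's documentation for the current list of supported model strings, and note that the relevant API key must be set):

# The same agent definition, pointed at a different provider's model
agent = Agent(
    'anthropic:claude-3-5-sonnet-latest',
    system_prompt='Be concise, reply with one sentence.',
)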
However, it is worth noting that the Agents SDK from OpenAI simplifies this process by eliminating the need to define functions for the handoffs. Here is the implementation of a similar set of agents using that API.
# Using the OpenAI Agents SDK
from agents import Agent, Runner, function_tool

class OS_Agent(Agent):
    def __init__(self, name="Windows"):
        self.name = name
        super().__init__(
            name=self.name,
            model='gpt-4o-mini',
            instructions=f"""You are an expert in the {self.name} operating system.
            Always precede your response with '{self.name} Agent: '.""",
        )

linux_agent = OS_Agent("Linux")
macos_agent = OS_Agent("MacOS")
win_agent = OS_Agent("Windows")

@function_tool
def user_input(prompt: str) -> str:
    """Prompt the user and return their reply."""
    return input(prompt + " ")

triage_agent = Agent(
    name="Triage agent",
    handoffs=[win_agent, macos_agent, linux_agent],
    tools=[user_input],
    instructions="""
    Use the 'user_input' tool to get data from the user.
    First, ask for the operating system and then the user query.
    Select the correct handoff from 'win_agent', 'macos_agent' and
    'linux_agent', depending on the user's operating system
    and record its response""",
)

# 'await' at the top level works in a notebook; in a plain script,
# wrap this in asyncio.run() or use Runner.run_sync() instead
result = await Runner.run(triage_agent, "")
print(result.final_output)
As you can see, the idea is much the same, but the implementation is a little cleaner and shorter than with PydanticAI. No wrapping functions are necessary, so the agents are defined in just three lines of code — neat!
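It is worth being aware of the conceptual difference between the two patterns, though. When an agent is wrapped as a tool, the calling agent stays in charge: it receives the specialist's answer and decides what to do with it. With a handoff, control of the conversation is transferred to the specialist agent, which then responds to the user directly.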
Finite state machines
Some providers (LangChain, LlamaIndex and PydanticAI, for example) have products which claim to support the construction of complex multi-agent systems through the use of Finite State Machines (FSM).
Although they appear popular with framework providers, I have reservations about using FSMs to design multi-agent flows. State machines are great for describing event-driven or real-time software, but for multi-agent systems? I have grave doubts.
Most documentation that supports these products gives examples of simple systems and depicts them as graphs. However, these graphs are not, in fact, FSMs. They’re often a mishmash of different ideas but are generally more like Data Flow Diagrams than State Diagrams.
PydanticAI uses a building analogy in its documentation: if using agents is, they say, like using a hammer, then multi-agents are more like having a sledgehammer. But graphs are the equivalent of a nail gun.
I guess they are suggesting that the graph-based approach is very powerful but doesn’t really do anything more than a hammer. I would put it differently. Graphs are like using a power drill to hammer in a nail; the drill is a powerful tool that can be usefully put to work in a number of circumstances and while it is heavy enough to drive in a nail it is not really made for that job.
As I am not convinced of their utility, I won’t cover the FSM approach here, but I may have more to say on this subject later.
Conclusion
We have seen how it is fairly straightforward to implement simple multi-agent systems in linear and hierarchical flows. I’ve stuck pretty much to PydanticAI examples, as it is a simple but powerful framework. Other frameworks have different takes on multi-agent flows, although OpenAI and PydanticAI seem to share a similarly simple approach.
It’s an approach that I appreciate. Both downplay the graph method of defining flows: OpenAI does not seem to have such an offering at the moment, and while PydanticAI does have such a product, they warn that it is not for beginners.
By combining linear and hierarchical flows, using handoffs to other agents and giving a managing agent responsibility for invoking those other agents and tools, you are implementing the essential programming concepts of sequence, selection and iteration. With these primitives you can solve any programming problem, so you should be able to implement any multi-agent system. Good luck!
Thanks for taking the time to read this. I hope that it has been worthwhile and that you have found it useful.
Find the code here: GitHub repository for the code.
To read more of my stuff, subscribe to my Substack or follow me on Medium.