Emerging Patterns For Building LLM-Based AI Agents
This research note is restricted to the personal use of Duy Le Dinh (duyld10@fpt.com).
Initiatives: Software Architecture and Integration for Technical Professionals and 2 more
Agentic AI presents the ability to create automated solutions that use LLMs to drive the execution of tasks and processes. Software architects must learn an emerging set of patterns for AI agent architecture, design and operations to enable the delivery, and ensure the robustness, of these solutions.
Overview
Key Findings
AI agents and the technologies that support their implementation and operation span a broad range of capability and maturity.
AI agents that use large language models (LLMs) for planning, reasoning and processing
provide a useful and accessible alternative to more established types of AI agents. LLM-
based agents use both programmed and prompted behaviors that require careful design,
evaluation and monitoring to ensure they are constrained to the desired outcomes and levels
of quality.
Many of the more advanced agentic AI patterns are unproven in production enterprise environments. This increases the risk that prototypes and proof-of-concept solutions will not transition reliably into production.
Recommendations
Software architects responsible for designing and delivering LLM-based AI agents should:
https://www.gartner.com/document/6142159?ref=hp-top-trending&reqid=0ffa297b-0afa-48a9-a756-0557d53d587e 1/59
3/29/25, 9:58 AM Emerging Patterns for Building LLM-Based AI Agents
Start simple by experimenting with a few patterns at a time. Begin with learning the
functional patterns before moving on to the operational patterns needed for robust,
production-ready AI agents.
Invest in evaluation so you can establish production-grade trust in agent behavior; the inability to validate the behavior of an agent is a barrier to production deployment.
Use “agent architecture” and “agent action” patterns to provide clear modularity and flexibility within your solutions. Modular evaluation and testing of agent components is an important part of delivering robust agents.

Use existing integration and development tools to leverage existing investments, skills and assets, where possible. Many of the agent patterns are implemented by combining structured LLM prompts with API-based integration.
Analysis
In 2024, AI agents and agentic AI emerged as the latest must-have capability in the market for AI-enabled software. AI agents are differentiated from AI assistants and chatbots by their ability to autonomously plan and act to meet goals.
The patterns explained in this research enable the creation of AI agents using LLMs. LLM-based agents use generative AI models to implement this type of behavior. The patterns captured in this research are for implementing AI agents that use LLMs for some or all of the following:
Processing goals
Planning actions
Interpreting data
Evaluating progress
The software architecture of LLM-based AI agents orchestrates the interactions between input
interfaces (including users or other system components), one or more LLMs, “memory”
(persisted state) and external interfaces (e.g., tools and other agents), as shown in Figure 1. The
owner is shown separately in the diagram, as it may be someone other than the end user who defines the agent’s goals and scope.
There are many other types of AI Agent architecture, not covered in this research, including
autonomous vehicles, robotic swarms, game-playing agents (e.g., Google [DeepMind] and its
products like AlphaGo, AlphaStar and Scalable Instructable Multiworld Agent [SIMA]) and
reinforcement-learning agents.[2]
Beware of Agent-Washing
In the latter half of 2024, many technology vendors (e.g., Salesforce,[3] Microsoft,[4] Google,[5] ServiceNow[6]) started using the term “AI agents” to describe a broad spectrum of capabilities, including renamed AI assistants and chatbots. This dilution of the term is primarily marketing-driven.
You need to be wary of “agent-washing” when evaluating tools and technologies to help your organization deliver business solutions by building AI agents. You must understand the set of key patterns that enable the implementation of key agent characteristics and capabilities, so that you can judge which tools support the level of agentic AI your requirements need. Of course, the same patterns also provide a framework for designing and delivering AI agents more effectively. “Agent-washing” will continue, but the level will vary as vendors add new agentic features, new products enter the market, and the hype around agents ebbs and flows.
There is no single capability that makes something an AI agent. You should also be mindful that agents are an emerging technology, and definitions will continue to evolve.
Characteristics of AI Agency
The word “agent” has multiple meanings in both business and technology, some of which have
nothing to do with AI. To help us focus on what we mean by an AI agent today, we have
identified a set of core characteristics that help describe what it means for software to have or support “agency.” These characteristics are themselves ranges, and lacking one or more of them entirely does not mean an AI agent is any less useful, provided it is well-designed and fit for purpose.
We can plot these characteristics on a spider/radar plot, as shown in Figure 2. The larger the
area of the plot for an AI agent platform, tool or an implementation of an AI agent, the more
“agentic” it is.
Role generalization: Agents tend to have functional behavior described in the context of a specific personality, role or persona. A logistics agent expected to accept inputs on any and all topics or activities would have a highly generalized persona, while a sales lead validation agent would have a narrow, specialized one.
Proactiveness: More advanced AI agents are proactive, seeking additional information from
the user or tools as required to meet their goal, rather than being only reactive to user input
and guidance.
Planning: Agents can have the ability to “reason” about how best to achieve the goal within
constraints defined by the developer, platform and environment to form a plan, breaking the
problem into tasks. In LLM-based agents, this planning activity is typically achieved by
prompting the LLM with the goal and appropriate context. Advanced agents will assess their
progress toward a goal and reevaluate their plan (see “Goal seeking”). Simpler agents often
follow a plan prescribed at creation time by the developer, which leads to more predictable behavior.
Autonomy: Agents can take some or all actions required to achieve the goal without human
guidance — note that human review and approval of actions does not preclude autonomy.
The more human input and guidance required to meet a goal, the less autonomous the agent
is.
Goal seeking: Agents are able to accept direction in the form of a goal or desired outcome,
rather than explicit instructions. Planning and acting are then used to meet the goal.
Acting (sensing): Agents can use tools (in the form of functions, APIs, actions and code
evaluation) to retrieve information about the environment they operate in. This sensing is
side-effect free, and can help an agent to provide accurate, timely and relevant information to
the user, or to assess its own behavior or progress toward a goal. Sensing may require
authentication and authorization of the agent, and of the user or system that the agent is acting on behalf of.
Acting (effecting): Agents can use tools (in the form of functions, APIs, actions and code
evaluation) to take actions that have an effect on the environment they operate in. The
effects that an agent can have define its behavior or ability to progress toward a goal. Acting
always requires authentication and authorization of the agent, and of the user or system that it is acting on behalf of.
Learning (behavioral memory): Agents can learn from their past activities by using memory
to record actions that had both positive and negative outcomes (e.g., plan structures that
lead to successful outcomes for certain goals, and those that failed).
Memory (facts/context): Agents can retain information that influences the planning and
action taking behaviors, including short-term memory about the current task, and long-term
memory that spans multiple tasks and may be shared across users or within a domain.
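The short-term, per-user and global memory scopes above can be sketched in code. This is a minimal illustration under assumed names (`AgentMemory` and its methods are hypothetical, not any framework's API):

```python
from collections import defaultdict

class AgentMemory:
    """Minimal sketch of the three memory scopes; not a real framework API."""

    def __init__(self):
        self.short_term = []                       # current task only
        self.user_long_term = defaultdict(list)    # per user, across tasks
        self.global_facts = {}                     # shared across all users

    def remember_turn(self, text: str):
        # Short-term memory: record activity within the current task.
        self.short_term.append(text)

    def remember_preference(self, user: str, fact: str):
        # Long-term memory: retained across tasks for one user.
        self.user_long_term[user].append(fact)

    def context_for(self, user: str) -> str:
        # Consolidate context that would be added to the next LLM prompt;
        # only the most recent short-term entries are kept to bound size.
        parts = self.user_long_term[user] + self.short_term[-5:]
        return "\n".join(parts)

mem = AgentMemory()
mem.remember_preference("anna", "prefers metric units")
mem.remember_turn("User asked for a weather report.")
print(mem.context_for("anna"))
```

A production design would typically back the long-term scopes with a database or vector store rather than in-process structures.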
When designing agentic AI solutions, it is not essential to maximize, or even include, all of these characteristics. You should focus on the requirements and implement those characteristics that deliver the optimal solution in terms of capability, complexity and adaptability. The best solution is often the one with the least agency that still meets the requirements.
Don’t aim for high agency unnecessarily, aim to solve the problem you
have identified.
Building agentic software using LLMs requires a modular and composable approach to the software architecture. Many of the agent characteristics above are integrated into the solution as discrete technical components or services (e.g., memory, sensing, effecting and access to the LLM(s) are commonly implemented as discrete, distributed component services). The orchestration of an LLM-based agent typically follows these steps:
1. Inputs, context retrieved from memory and tool definitions are consolidated into a prompt

2. The prompt is processed by an LLM, which generates output, including tool or function calling requests

3. The requested tools or functions are executed, and their results are consolidated into a new prompt

4. The new prompt is processed by an LLM, which then generates a new set of outputs that may continue the process by returning to Step 2 or returning a result (e.g., generated output for the client process, or confirmation of successful or failed processing to meet a goal)
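The steps above can be sketched as a simple loop. This is a minimal illustration, not any specific framework's API: `call_llm` is a hypothetical stand-in that returns canned responses, and `get_weather` is a hypothetical tool.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical side-effect-free "sensing" tool.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_llm(prompt: str) -> dict:
    # Stand-in for a real LLM call. A real model would decide whether to
    # request a tool call or return a final answer based on the prompt.
    if "TOOL_RESULT" not in prompt:
        return {"tool": "get_weather", "args": {"city": "Oslo"}}
    return {"final": "It is sunny in Oslo."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    # Step 1: consolidate inputs, memory context and tool definitions.
    prompt = f"GOAL: {goal}\nTOOLS: {list(TOOLS)}"
    for _ in range(max_steps):
        # Step 2 (and 4): the LLM returns a tool request or a final result.
        output = call_llm(prompt)
        if "final" in output:
            return output["final"]
        # Step 3: execute the requested tool and build a new prompt.
        result = TOOLS[output["tool"]](**output["args"])
        prompt += f"\nTOOL_RESULT: {json.dumps(result)}"
    return "Step budget exhausted"

print(run_agent("What is the weather in Oslo?"))  # → It is sunny in Oslo.
```

The `max_steps` bound is one of the guardrails such a loop needs; without it, a model that keeps requesting tools would loop indefinitely and consume tokens.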
This orchestration process is commonly managed by the tools and frameworks used to develop LLM-based agents, simplifying the development effort and complexity. However, there are still many problems to solve in creating an impactful solution. The software architecture must address the management of prompts and responses, security and privacy concerns, and delivering on business goals. These challenges are described in Note 1: The Key Challenges of LLM-Based AI Agents. The patterns described in this research will help you design LLM-based AI agents that address these challenges.
The emerging patterns described in this research are grouped into the following domains:
Agent architecture patterns: This domain encompasses the structural design patterns for AI
agents, including both single- and multiagent modularity. The architecture determines how
agents interact, communicate and fulfill their designated roles within a system.
Agent process patterns: This domain describes patterns that ensure agents have a well-defined flow of activity, which can be linear, looping, dynamic, single- or multithreaded. This flow governs how an agent completes a task or process.
LLM interaction patterns: The patterns in this domain describe a variety of prompting and
processing patterns that use the capabilities of an LLM to collect information, refine a plan
and validate status. This group includes prompting and LLM API calling patterns.
Agent action patterns: This domain includes patterns that allow agents to take actions based
on their interpretation of inputs, sensing of their system environment and decisions. These
patterns include function calling, tool selection and integration with the environment.
Agent memory patterns: These patterns for defining and managing memory are crucial for
agent performance, and include short-term memory (within task), shared memory (across
tasks per user), and global memory (across all use of the agent). Proper memory
management keeps the agent grounded and enhances personalization and accuracy.
Memory related to decisions, actions and activity can be used to help the agent learn and
adapt.
Agent evaluation patterns: These patterns can be used to implement continuous evaluation, validation and improvement for maintaining high-quality agent performance. This includes evaluation of agent outputs, plans and actions.
Security and identity patterns: This domain focuses on patterns for ensuring security and identity management in agent systems. This includes LLM guardrails and identity management for agents and the users and systems they act for.
In the following sections for each pattern domain, the patterns are described in detail and their suitability, pros and cons are provided, along with examples of the tools or technologies that implement them.
Agent Architecture Patterns

These patterns are used to guide the overall architecture and design of LLM-based agents. Your choice of agent architecture patterns will be driven by factors including the scope of the capabilities required for your agent and the frameworks or platforms you intend to use to develop it. They also affect agent performance (speed of response), maintainability and adaptability of the agent, and the viable approaches for agent evaluation and quality assurance. The agent architecture patterns are described below.
Solo Agent

The solo agent pattern describes agents that are atomic in structure, composing all of their technical components (including the use of other patterns) into a single monolithic implementation. A solo agent can be built with any framework or platform that supports the creation of the necessary agent interfaces for your use case (e.g., conversational UI, APIs or integrations with other platforms such as enterprise collaboration tools).
Suitability: Simple automation of specific tasks (e.g., tasks with a known or expected planning structure that can be defined at design time).
Pros:
Solo agents can be implemented using a variety of platforms, tools and frameworks
They do not require specialist tooling or complex frameworks for agent-to-agent handoffs
Solo agents are simple to monitor and make it easier to trace behavior of a process or task
Cons:
Lack of modularity creates maintenance challenges as the required scope and capability of the agent increases

LLM-based tool selection becomes less reliable when many tool options are offered to the underlying LLM (see Function Calling [OpenAI])[7]

Limits adaptability and extensibility of the agent to new tasks and domains
Examples:
LangChain (LangGraph)
Snaplogic AgentCreator
Agent Roles

Sometimes referred to as “agent personas,” agent roles are a design pattern for defining the scope and behavior of agents. Agent personas apply the anthropomorphism associated with LLMs and generative AI (GenAI) to deliberately describe the behavior of an agent as perceived by its users.
To use this pattern, your design process should describe the behavior (and limitations) of the agent as if it were a person taking on a specific role or task. This description becomes part of
the prompt context passed to the LLM at inference time to shape and guide its responses. The
agent role pattern alone is not enough to enforce the behavior of the agent, and it must be
augmented with other validations and protection to ensure consistent and expected behavior
(i.e., protecting against task deviation, prompt injection or hallucination through ambiguity in the
prompt).
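Passing the role description as prompt context can be sketched as follows. The role text and helper names here are hypothetical illustrations; the message structure mirrors the common system/user chat format, but this is not tied to any specific provider's API.

```python
# A hypothetical role definition, written as if describing a person in a role.
ROLE = """You are a sales lead validation agent.
Scope: assess inbound leads for completeness and fit.
Limits: do not discuss topics outside lead validation.
Tone: concise and factual."""

def build_prompt(role: str, user_input: str) -> list:
    # The role/persona becomes the system message sent with every request,
    # conditioning the model's responses. Note: the role text alone does not
    # enforce behavior; it must be paired with guardrails and validation.
    return [
        {"role": "system", "content": role},
        {"role": "user", "content": user_input},
    ]

messages = build_prompt(ROLE, "Validate this lead: ACME Corp, no contact email.")
print(messages[0]["content"].splitlines()[0])
```

In a multiagent design, each subagent would carry its own `ROLE` string, which also serves as documentation of that agent's responsibilities.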
This pattern is also commonly applied in multiagent architectures, where having clearly defined roles, responsibilities, capabilities and personalities (e.g., brevity, tone, assertiveness) for each agent helps with overall problem decomposition, and can be used to guide agent-to-agent handoffs.
Suitability: This pattern is widely suitable for LLM-based AI agent design; it provides a repeatable and easy-to-apply structure to the design process and supports the design of AI agents that have a manageable and testable scope of capability. The alternatives to this pattern include instruction- or task-based designs that define the agent by describing the work to be done.
Pros:
This pattern embraces, rather than fights against, the anthropomorphism of LLMs. This
simplifies the process of working with users, or translating user requirements, into a
definition that can be used for both documentation and (when embellished with refined
prompts) deployment.
The pattern is consistent with describing the characteristics of an AI agent (or subagent) and
aligns well with many of the other pattern categories, including action-taking patterns and
memory patterns.
Different role descriptions (via the role prompt) “condition” the LLM to respond differently. LLMs are trained on vast amounts of diverse data, and this conditioning prioritizes parts of the distribution that are relevant to the tasks. For example, “you are a PhD in computer science and you are looking to improve the system design” will influence the response toward that expertise.
Agent roles can be used for nonfunctional behaviors to improve the quality of the agent
system. For example, the judge role can evaluate the results of other agents, ensuring task
completion accuracy.
Cons:
The “role” of an agent may not map directly to a specific or equivalent human role. This makes description harder and risks model confusion and hallucination if the roles are not defined in the model training data (e.g., if imagined or fantasy roles are used).
Designers must understand the limits of each role’s responsibilities, and ensure that the role definition is rich and explicit enough to direct the correct behavior. Preventing an agent defined for a specific role from going outside its intended purpose can be challenging.
Examples:
Multiagent Modularity

The multiagent modularity design pattern decomposes the capabilities of an agentic AI solution into a set of cooperating and coordinated subagents. Each of these agents has a specific agent role and set of capabilities that is known to one or more of the other agents in the system, as shown in Figure 4. The flow of interactions between the agents is guided by one or more of the agent process patterns.

Agents use agent-to-agent handoff to identify and delegate tasks to each other in pursuit of the main goal of the system. Note that some agents in a multiagent design may be implemented without an LLM.
Suitability: This pattern introduces modularity that allows greater flexibility, scalability and robustness, making it suitable for more complex, dynamic environments. Multiagent modularity is suitable for requirements that must offer the user or client of the agent a unified interface that spans a number of discrete domains and capabilities. Additionally, multiagent modularity supports incremental extension of a solution. For example, a blogging solution could initially comprise a “copywriting agent” and a “posting agent.” At a later date, a “postpromotion agent” could be added to the system to automatically promote new blog posts by posting to other channels. A more autonomous implementation of this would plan the best promotional activities based on the content of each post.
Pros:

Complex problems can be decomposed into smaller, more manageable steps. This simplifies the creation, testing, monitoring and maintenance of each subagent.
Individual subagents can be simpler and more precisely defined, increasing reliability and simplifying the interaction with other resources. For example, limiting the number of function definitions offered to each subagent improves the reliability of tool selection.
The pattern supports composability and extensibility of agentic systems while limiting the impact of changes. New functionality isolated in one subagent reduces regression risk and the testing burden on other subagents.
As agentic systems grow, modularity will allow for greater flexibility in team structures, allowing different teams to focus on, and own the delivery of, specific subagents.
Cons:
Performance (time to respond) of multiagent systems may be reduced, as each subagent will have its own planning, LLM prompting and action processing, and interactions between subagents add further latency.
While individual subagents can be simplified, a multiagent system requires more complex development, monitoring and management tooling. For example, rather than implement and manage a single set of guardrails for one agent, each subagent will have its own guardrails to define and maintain.
When using a more free-form autonomy pattern such as collaborating agents, you must guard against looping and race conditions that could result in slow performance, costly LLM token usage or unresponsive processing. Using a more prescriptive process pattern such as orchestrated agents mitigates these risks.
LLM token usage and associated costs may be higher, since each subagent will need to prompt and process its own context, system prompts and inputs.
Examples:
Crews (CrewAI)
Agent-to-Agent Handoff

Multiagent systems (MAS) consist of multiple interacting agents. This modularity allows for greater flexibility, scalability and robustness, making it suitable for complex, dynamic environments. When the tasks of an agent require capabilities beyond its scope, the agent can
decide to hand off (delegate) further processing to another agent. This delegate agent may hand off control back to the original agent when its processing is complete, or decide to hand off further processing to yet another agent. At each handoff, the delegating agent should pass the necessary context to the delegate.
Agent-to-agent handoffs require a clear definition of the interface between agents, and a means of defining the existence of other relevant agents and their capabilities to each agent that may use a handoff. The capabilities of the agent can be defined as part of the agent role and made discoverable to the other agents.
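A minimal sketch of a constrained handoff follows. The agent names, the `HANDOFFS` map and the `route` helper are hypothetical; the point is that the allowed handoff paths are declared statically, and context travels with each delegation.

```python
# Each agent may only delegate to the peers declared for it. Statically
# defining this map prevents inappropriate flows of control.
HANDOFFS = {
    "triage": ["billing", "support"],
    "billing": ["triage"],
    "support": ["triage"],
}

def route(current: str, requested: str, context: dict) -> str:
    # Reject any handoff the developer did not declare for this agent.
    if requested not in HANDOFFS.get(current, []):
        raise ValueError(f"{current} may not hand off to {requested}")
    # The delegating agent passes its accumulated context to the delegate.
    context["handoff_trail"] = context.get("handoff_trail", []) + [requested]
    return requested

ctx = {"question": "Why was I charged twice?"}
active = route("triage", "billing", ctx)
print(active, ctx["handoff_trail"])  # → billing ['billing']
```

In LLM-based implementations, each entry in `HANDOFFS` would typically be exposed to the model as a function-calling tool (e.g., "transfer_to_billing"), so the handoff decision reuses the existing tool-use machinery.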
Suitability: Agent-to-agent handoffs are an essential pattern in multiagent system designs that do not use a prescribed order of processing for the required task — the result of a set of handoffs can be viewed as a “choreography” between agents. The pattern is also suitable when a more dynamic flow of control is required (e.g., based on additional user inputs or the result of earlier processing).
Pros:
The handoff pattern can be implemented using existing function-calling or tool-use patterns
The available handoff paths for an agent can be statically defined by the developer to ensure inappropriate flow of control is prevented (e.g., allow a “writer agent” to communicate only with a “reviewer agent”).
Cons:

Multiagent handoffs introduce distributed-system complexity. This includes:

Reliable communication of context and intent, for example, using message broker infrastructure

Consistent monitoring and logging (e.g., using distributed tracing and log aggregation tools)
Ensuring agents hand off to the correct agents, in the correct circumstances, with the
necessary context
Natural language conversation between agents may lead to behavior and flow that is difficult to predict or reproduce
Error and exception handling for the overall goal must be coordinated such that each agent is able to recover or fail gracefully
Examples:
Agent Process Patterns

The patterns in this domain are used to define agents’ flow of activity to complete a task or process. This flow can be linear, looping or dynamic, and the patterns chosen should ensure systematic and efficient task performance. The more dynamic the processing pattern, the more “agentic” the solution can be. However, an explicit prescribed process structure is often the simplest and best approach to meeting well-defined requirements. Design, evaluation and security implementation are significantly simplified by using static process definitions. The process patterns are described below.
Prescribed Plan

In many automation use cases, the steps required to complete a task are well-known at design time based on user requirements. When this is the case, prefer using a prescribed plan to ensure a repeatable and more easily testable and observable flow of control for an agent. This pattern can be implemented with established approaches, such as integration and automation platforms. LLM interactions can be included in
the flow definition to support features such as extraction of user intent, decision logic and content generation.
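A prescribed plan can be sketched as a fixed list of steps executed in order. The step functions here are hypothetical stand-ins: `extract_intent` and `draft_reply` represent LLM-backed steps (canned here so the sketch runs), while `lookup_order` represents a plain API call.

```python
# Each step takes and returns a state dict; the sequence is fixed at
# design time, so behavior is repeatable and easy to test.
def extract_intent(state):
    state["intent"] = "order_status"  # a real LLM would infer this
    return state

def lookup_order(state):
    # Stand-in for a deterministic API/tool step.
    state["order"] = {"id": 42, "status": "shipped"}
    return state

def draft_reply(state):
    state["reply"] = f"Order {state['order']['id']} is {state['order']['status']}."
    return state

PLAN = [extract_intent, lookup_order, draft_reply]  # declared, not generated

def run_plan(user_input):
    state = {"input": user_input}
    for step in PLAN:          # linear, prescribed flow of control
        state = step(state)
    return state

print(run_plan("Where is my order?")["reply"])  # → Order 42 is shipped.
```

Because `PLAN` is a static artifact, it can be reviewed, versioned and tested like any other code, which is the core advantage this pattern trades agency for.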
Suitability: Prescribed plans are most suitable for highly repeatable autonomous processing. This pattern reduces the level of agency of the solution, which in turn makes the behavior of an agent easier to constrain, test and validate. This pattern is also commonly used in nonagentic automation solutions.
Pros:
Design processes are highly prescriptive, making the development and debugging process simpler
Since the “plan” used by the AI agent is declaratively defined at design time, this pattern increases confidence that agent behavior will be predictable and reduces the complexity of evaluation. However, any LLM-based information processing will still need suitable guardrails and evaluation to ensure desired behavior.
Cons:
Reduces agency, and all but eliminates the dynamic adaptability of an agent to unexpected inputs or conditions

Requires deeper analysis of all requirements the agent is expected to support, and explicit definition of the process at design time
Process definitions are typically implemented using platform-specific process definition tools
and configuration. This typically locks the resulting solution to the development platform.
Examples:
Workflows (SmythOS)
Workflows (Restack)
Multihop Question Answering (MHQA)

MHQA is a pattern for reasoning that can help LLM-based solutions provide correct answers to questions with complex logical structure.[8] For example, questions where the answer can only
be derived once the necessary contextual information has been retrieved and used as context.
In AI agents, this pattern can be used within the processing of a single agent. For example, this
includes using context retrieved from agent memory or via actions that retrieve context from
external sources before deriving the final answer by passing this context and the question to the
LLM.
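The multihop retrieval loop can be sketched as follows. Everything here is a hypothetical illustration: the `FACTS` store stands in for agent memory or external retrieval, and the sub-question decomposition (which an LLM would normally produce) is hard-coded.

```python
from typing import Optional

# Toy knowledge store; each lookup represents one "hop" of context retrieval
# (from agent memory, RAG or an external tool).
FACTS = {
    "capital of norway": "Oslo",
    "population of oslo": "about 700,000",
}

def retrieve(query: str) -> Optional[str]:
    return FACTS.get(query.lower())

def answer_multihop(question: str, sub_questions: list, max_hops: int = 4) -> str:
    # Gather context over multiple hops, then derive the final answer.
    context = []
    for sub_question in sub_questions[:max_hops]:  # bound hop count
        fact = retrieve(sub_question)
        if fact:
            context.append(f"{sub_question}: {fact}")
    # A real agent would now pass `context` plus the original question to an
    # LLM; here we just return the accumulated context.
    return " | ".join(context)

# An LLM would decompose the question into these sub-questions at runtime.
hops = ["capital of Norway", "population of Oslo"]
print(answer_multihop("What is the population of Norway's capital?", hops))
```

Note that the second hop depends on the result of the first; that dependency is exactly what makes the question "multihop" rather than answerable by a single retrieval.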
This pattern can also be used in multiagent implementations by breaking down complex requests into simpler subtasks. These subtasks can then be delegated to different agent roles before the aggregated results are used to form a comprehensive response. Note that “multihop” refers to the possibility that an answer can require more than one cycle of context retrieval and processing.
Suitability: The MHQA pattern is most suitable when your agent is expected to respond
accurately to complex questions that can only be answered by combining information from
multiple sources.
Pros:
Enables reasoning that can answer complex questions more effectively, even using smaller or less capable models

Can be used with additional context sourced from RAG, function calling or agent memory
Cons:
Can increase processing time and token consumption over one-shot prompting.

Can cause unexpectedly long chains of hops that slow response performance and increase costs.
Examples:
Dynamic Plan Generation

Agents that need to adapt to a diverse set of goals require the ability to formulate a plan of action based on known information and available tools (see Agent Action Patterns). Dynamic
plan generation allows agents to infer a plan for how to meet a goal at runtime, and to adapt the plan as execution progresses.
The dynamic plan is commonly generated by prompting an LLM with the goal that needs to be met, along with any known or relevant context and definitions of the tools available for use as part of the plan. The plan may include steps to sense additional context from the operating environment before taking action to complete the goal. The generated plan is a set of steps formatted in a way that the orchestrating software process can execute. The plan may include tasks that are delegated to specific agent roles, or be handled as a set of steps by a single agent.
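The generate-then-execute flow can be sketched as follows. `call_llm_for_plan` is a hypothetical stand-in that returns a canned JSON plan; a real implementation would prompt a model with the goal and tool definitions. The validation in `execute_plan` illustrates the guardrails the pattern requires.

```python
import json

def call_llm_for_plan(goal: str, tools: list) -> str:
    # Stand-in for a planning prompt such as:
    #   "Given GOAL and TOOLS, return a JSON list of steps."
    # A real LLM would generate this; here we return a canned plan.
    return json.dumps([
        {"tool": "search_logs", "args": {"query": goal}},
        {"tool": "summarize", "args": {}},
    ])

def execute_plan(plan_json: str, registry: dict) -> list:
    plan = json.loads(plan_json)  # reject plans that are not well-formed
    results = []
    for step in plan:
        if step["tool"] not in registry:   # guardrail: only known tools
            raise ValueError(f"unknown tool {step['tool']}")
        results.append(registry[step["tool"]](**step["args"]))
    return results

# Hypothetical tool registry for a log-triage goal.
registry = {
    "search_logs": lambda query: f"3 errors matching '{query}'",
    "summarize": lambda: "summary of findings",
}
plan = call_llm_for_plan("resolve new error", list(registry))
print(execute_plan(plan, registry))
```

Separating generation from execution also enables the pre-execution assessment mentioned below: the JSON plan can be inspected, logged or sent for approval before any tool runs.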
Suitability: The dynamic plan generation pattern is best suited for requirements where the user or client process goals cannot be defined at design time. This is relevant where the agent is intended to support tasks that combine information and actions from multiple domains based on the needs communicated by the user. For example, a software engineering agent given the task of resolving a new error could formulate a plan to use source code search, change history from a code repository, error logs and documentation of various services. When dynamic planning is combined with multiple agent action patterns, including interactions with APIs, UI and code evaluation, agents can complete very complex tasks with very little prescriptive code or configuration.[9] However, any implementation must account for the risk of side effects or unintended actions.
Pros:
Dynamic planning allows for the creation of agents that can process unique goals provided at runtime, within the scope of capabilities defined by the agent developer. This gives you the flexibility to support diverse goals without prescribing each process.

Generated plans can be assessed, monitored and analyzed both pre- and postexecution to validate and improve agent behavior.

Plans that prove successful for meeting specific kinds of goals can be memorized by the agent and reused (see Agent Memory Patterns).
Cons:
The planning capabilities of LLMs remain limited for structured and constrained environments.[10] Examples today focus on planning more subjective or creative tasks. In any case, the quality and relevance of an LLM-generated plan is critical to successful agent behavior.
Dynamic planning results in process execution that is not prescribed in any software or configuration code, making preproduction audit and assessment of the complete capabilities of the agent difficult.
Dynamic plans cannot be optimized for efficiency at design time, and may consume
significantly higher token counts than a prescriptive process for the same task.
Guardrails are critical and must be thoroughly engineered and verified to ensure agent actions stay within acceptable bounds.

Hallucinations and model overreach can result in the plan containing actions that are invalid or undesirable.
Examples:
Planning (CrewAI)
Orchestrated Agents

Orchestrated agents is a pattern for multiagent modularity in which the interactions that can take place between discrete agents are modeled by the developer as a directed graph structure. The directed graph can be cyclic, allowing iterative looping behavior, or acyclic (a directed acyclic graph [DAG]), where looping is not possible. In this pattern, each agent performs a specific task, using its own role, tools and actions. Note that each individual agent can use dynamic plan generation to guide its own process, but does not dynamically discover or delegate tasks to other agents.
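The directed-graph structure can be sketched generically (the idea is similar to graph-based frameworks such as LangGraph, but the API below is a hypothetical illustration, not that library). Here a cyclic two-node graph lets a reviewer send work back to a writer until it passes.

```python
# The allowed transitions form a directed graph. This one is cyclic:
# reviewer may loop back to writer until the draft is long enough.
GRAPH = {
    "writer": ["reviewer"],
    "reviewer": ["writer", "END"],
}

def writer(state):
    # Stand-in for an LLM-backed drafting agent.
    state["draft"] = state.get("draft", "") + "text "
    return "reviewer"                # requested next node

def reviewer(state):
    # Stand-in for an LLM-backed review agent with a simple pass criterion.
    return "END" if len(state["draft"]) >= 10 else "writer"

NODES = {"writer": writer, "reviewer": reviewer}

def run_graph(start: str, state: dict, max_steps: int = 20) -> dict:
    node = start
    for _ in range(max_steps):       # bound iterations in cyclic graphs
        nxt = NODES[node](state)
        if nxt not in GRAPH[node]:   # enforce the declared transitions
            raise ValueError(f"illegal transition {node} -> {nxt}")
        if nxt == "END":
            return state
        node = nxt
    raise RuntimeError("step budget exhausted")

print(run_graph("writer", {})["draft"])
```

Even though each node could internally use an LLM to decide what to do, it can only ever move along edges the developer declared, which is what distinguishes this pattern from free-form collaboration.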
Suitability: The orchestrated agents pattern is suitable for building agents that implement semiautonomous behavior — each subagent can implement planning and execution logic using actions available to it. These subagents operate within an orchestrated flow, defined by the developer to ensure that agents follow an expected set of handoffs to meet their goal and cannot deviate from that flow. This is most suitable for situations where the flow of activity is consistent across usage and well-understood by the developer. In situations where there must be more dynamic flows of control, consider the collaborating agents pattern.
Pros:
Allows the agent developer to define allowed and disallowed transitions between agents in a multiagent environment. This provides the designer with more dependable control and predictability.
With a known set of agent-to-agent handoffs, developers can more easily optimize the
structure and content passed between agents to ensure desired and consistent behavior.
Cons:
Constrains the flexibility of the agent system and limits emergent capabilities that may be desirable in some use cases.
The challenges of process orchestration technologies apply. For example, the orchestration
logic can be a processing bottleneck under load, or changes to the process require new development and deployment.
Examples:
LangChain (LangGraph)
In the collaborating agents pattern, multiple agents or subagents with different skills or
strengths and capabilities are combined to achieve common goals. The pattern is commonly
combined with the dynamic plan generation and agent-to-agent handoff patterns. This dynamic
planning of task distribution or delegation to other agents differentiates this approach from the orchestrated agents pattern.
By using a dynamic plan rather than an orchestrated plan, the participating agents can
collaborate based on their knowledge of each other’s capabilities. The developer can constrain
the scope of collaboration by specifying which handoffs are possible between agents. Each
collaborating agent may use a different LLM that is the optimal choice for its capabilities.
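As a rough sketch of capability-based delegation (the team registry, skills and planner are illustrative assumptions, with the planning LLM stubbed out by a simple skill lookup):

```python
# Illustrative sketch of collaborating agents: a planner (an LLM in
# practice, a skill lookup here) delegates subtasks to teammates based
# on a declared capability registry, while developer-specified handoffs
# constrain which delegations are possible.
TEAM = {
    "analyst": {"skills": {"analyze"}, "handoffs": {"writer"}},
    "writer": {"skills": {"summarize"}, "handoffs": set()},
}

def pick_agent(skill, from_agent=None):
    for name, spec in TEAM.items():
        if skill in spec["skills"]:
            # respect developer-constrained handoffs between agents
            if from_agent and name not in TEAM[from_agent]["handoffs"]:
                continue
            return name
    raise LookupError(f"no agent for skill {skill!r}")

def run_plan(plan):
    log, prev = [], None
    for skill in plan:  # plan would come from dynamic plan generation
        agent = pick_agent(skill, prev)
        log.append((agent, skill))
        prev = agent
    return log
```

Unlike the orchestrated agents sketch, the sequence of agents is not fixed by the developer; only the capability registry and allowed handoffs are.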
Suitability: The collaborating agents pattern is best used in combination with the dynamic plan
generation pattern to implement requirements that must meet diverse goals (commonly user
defined via chat interfaces) that span the capabilities of a “team” of agents. These subagents
provide modularity to the system, simplifying the definition, enhancement and monitoring of
each. Due to the collaborative nature of the processing, it is common for the agent-to-agent handoff pattern to be used to pass context between the collaborating agents.
Pros:
Supports modular implementation and operations of complex and adaptable agent systems
Cons:
Reduces developer control over the flow of processing and can result in unanticipated behavior.
Testing and optimization requires detailed analysis of plan definition and interagent
communications.
Requires strong guardrails implementation to ensure collaboration remains within the scope
of allowable outcomes.
Examples:
Crews (CrewAI)
LLM-based agent processing is nondeterministic. When user or client inputs are processed by
an LLM, even with the best prompt engineering and guardrails, your use case may require human involvement, for example:
1. To confirm or approve planned actions before the agent executes them
2. To provide additional input or clarification needed to continue processing
3. To take over response to the user from the agent in the case of an exit condition
Note that in examples one and two, the human is the primary user interacting with the agent
directly or indirectly. In example three, the human is not the primary user, but someone able to intervene and respond on the agent's behalf.
The user experience and design is a critical element when implementing the human-in-the-loop
pattern. The value delivered by an agent can be compromised or enhanced by how the handoff
between AI-based processing and human involvement is handled. See How to Develop Effective
User Journey Maps and How Generative AI Will Change User Experience for details.
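A minimal sketch of the approval gate at the heart of this pattern, with the human approver reduced to a callback (in practice a UI prompt or review queue); action names and the `sensitive` flag are illustrative:

```python
# Sketch of the human-in-the-loop pattern: before executing a
# consequential action, the agent pauses and asks a human approver.
# The approver callback stands in for a real UI prompt or review queue.
def run_agent(actions, approve):
    executed, skipped = [], []
    for action in actions:
        if action["sensitive"] and not approve(action):
            skipped.append(action["name"])  # declined: human takes over
            continue
        executed.append(action["name"])  # normal automated path
    return executed, skipped
```

The same gate shape supports all three involvement styles: approval before execution, input mid-task, or a full takeover when the approver declines.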
Suitability: The human-in-the-loop pattern is suitable for requirements where trust in the
behavior (and effect) of the agent is essential to meeting user expectations or other
obligations. It is also a valuable way to evaluate and improve the effectiveness of the AI-driven elements of the agent process.
Pros:
Enables collection of additional data for feedback into the improvement of the agent
Cons:
Requires user attention, reducing autonomy, slowing task completion to “human time” and adding interaction overhead.
Examples:
Human-in-the-Loop (LangGraph)
Return to top
The patterns in this domain describe a variety of prompting and processing patterns that use
the capabilities of an LLM to collect information, determine and refine a plan of action, and
validate status or progress. This group includes prompting techniques and patterns for managing calls to LLMs.
Pattern: ReAct
Pattern: Reflexion
Pattern: ReAct
The ReAct pattern derives its name from using an LLM to support both reasoning and actions. 11
Reasoning prompts such as chain of thought are used to generate a plan of actions
to meet a goal defined by the user or client process. The application prompt typically provides
few-shot examples of how the model should structure its output (e.g., as a sequence of
“thoughts” and “actions,” with the actions being specific prompts or external function calling or
API tool use). As the plan is executed and actions are completed, the plan description of
thoughts and actions is augmented with the results of the actions and passed back to the LLM to determine the next step or confirm that the goal has been met.
Suitability: The ReAct pattern is suitable for agents that must process inputs that require
deeper reasoning and multistep planning to deliver satisfactory results. The multiturn nature of
the process increases the processing overheads and token consumption of the approach
because the planning prompt must be updated with the results of each completed action. Due
to its iterative nature, processing will take longer than simple LLM prompting. Take this into
account for your user experience, as it may be best suited to back-end headless use cases or
scenarios where users are not under time pressure for a complete or immediate result.
Pros:
The pattern has been shown to improve the performance of LLM-based processing on tasks that require multistep reasoning.
ReAct builds on other simpler patterns and doesn’t require any additional tools or infrastructure.
Cons:
ReAct requires prompt engineering and optimization to ensure the structure and examples in the prompt produce reliably parseable plans.
The pattern typically relies on the LLM to decide when the plan has been successfully
processed, which is subject to the usual cautions relating to LLM hallucination and
nondeterministic behavior.
Processing can fail when the LLM fails to structure the actions correctly (e.g., initial plan
generation using ReAct returns a plan as badly formed JSON and the agent cannot process
the actions in it). See the structured response pattern as a way to mitigate this issue.
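The loop described above can be sketched with a stubbed model; the transcript format, the tool name and the mock script are illustrative assumptions, not any framework's API:

```python
# Sketch of the ReAct loop: the LLM (mocked below) emits "Thought:" and
# "Action: tool[input]" lines; the agent executes the action, appends an
# "Observation:" to the transcript and re-prompts until the model
# answers.
import re

TOOLS = {"lookup_capital": lambda country: {"France": "Paris"}[country]}

def mock_llm(transcript):
    # stand-in for a real LLM call; scripted for illustration
    if "Observation:" not in transcript:
        return "Thought: I need the capital.\nAction: lookup_capital[France]"
    return "Final Answer: Paris"

def react(question, llm, max_turns=5):
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        out = llm(transcript)
        if out.startswith("Final Answer:"):
            return out.removeprefix("Final Answer:").strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", out)
        result = TOOLS[match.group(1)](match.group(2))
        transcript += f"\n{out}\nObservation: {result}"
    raise RuntimeError("no answer within turn limit")
```

Note how the transcript grows with every turn, which is the source of the token-consumption overhead discussed under Suitability.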
Examples:
Chain of thought (CoT) is a prompting technique that instructs LLMs to “think step-by-step” to
improve their performance on complex reasoning tasks. Instead of asking for a final answer, the
prompt induces the model to “think out loud” by generating tokens to show its thought process
(or rationale) as it works out the final answer. This prompting technique has been shown to
improve the model performance in terms of accuracy, transparency and interpretability across a range of complex reasoning tasks.
CoT has its limitations. The LLM is susceptible to mistakes that occur in its intermediate
thought process or reasoning steps. When the LLM makes a mistake, it is unable to backtrack
to correct the mistake because it’s a forward-pass-only generative process. There are several
attempts to improve upon the original CoT at the expense of inference-time compute. The most common variants include:
Self-consistency with CoT (aka majority vote): Ask the LLM to generate multiple reasoning
trajectories and select the most common answer at the end. This is built on the assumption
that there are multiple ways to arrive at the right answer and the probability of getting the correct answer increases when multiple trajectories converge on it.
Tree of thoughts (ToT): Explore multiple reasoning paths recursively in the form of a tree
traversal to find a path that leads to the best answer. ToT performs an evaluation at each
intermediate step to gauge its progress and determine its next course of action to move
forward or backtrack as it traverses the reasoning space. Again, ToT trades even more
inference-time compute and latency for a more thorough search of the reasoning space to arrive at a better answer. 12,13
Recently published reasoning LLMs such as OpenAI GPT-o1, DeepSeek’s R1 and Alibaba’s
QwQ (Qwen with Questions) 14 have embedded CoT capabilities in the training of the model to
improve their reasoning for complex tasks. These types of models, while typically slower to
respond and with higher inference costs, could prove more effective for agentic use cases in the
future.
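Self-consistency reduces to a majority vote over sampled answers; a minimal sketch, with LLM sampling stubbed by a callable (in practice each sample is a full CoT generation at nonzero temperature):

```python
# Sketch of self-consistency with CoT: sample several reasoning
# trajectories and take the most common final answer. The sampler is a
# callable so a stub can stand in for a real LLM call.
from collections import Counter

def self_consistent_answer(sample_fn, n=5):
    answers = [sample_fn() for _ in range(n)]
    # majority vote over the final answers of the n trajectories
    return Counter(answers).most_common(1)[0][0]
```

The cost trade-off is explicit in the signature: `n` times the inference cost of a single CoT run, for a more reliable answer.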
Suitability: Suitable for tasks that involve a sequence of mathematical, commonsense or symbolic reasoning steps.
Pros:
CoT prompting improves the quality, accuracy and interpretability of LLM responses without additional model training, by exposing the intermediate reasoning steps
Cons:
CoT results can be sensitive to prompt variations and the nature of the task
Prompts need to be custom-crafted to specific language models and may not be transferable across models
Increased latency and inference costs due to additional tokens generated to explain the
answer
Lengthy subtasks are more susceptible to small reasoning mistakes in the early steps
CoT may elicit verbose responses that make it hard to evaluate and verify more complex
tasks as a whole
Examples:
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (arXiv)
Pattern: Reflexion
The “reflexion” pattern uses the LLM to evaluate alignment and completion and is valuable for
ensuring agents work toward desired outcomes. 15 This pattern builds on the ReAct pattern but
incorporates the use of long-term shared memory or global memory that is accumulated by
using an LLM to reflect on the effectiveness of the plan and actions taken at supporting the agent’s
goal (see Figure 5). See memory scope and memory longevity patterns for more guidance. The
information from long-term memory is used as context during the plan generation for future tasks.
Some sources describe a basic interpretation of this pattern that uses the reflection or
introspection prompt to assess and refine the agent’s plan based on progress so far. 16,17,18 This
helps ensure the agent stays “on task,” but does not provide any long-term performance
improvement.
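A rough sketch of the reflexion loop under these assumptions: the reflection step, normally an LLM critique of the completed episode, is a stub, and long-term memory is a plain list rather than a persistent store:

```python
# Sketch of the "reflexion" pattern: after each task attempt, a
# reflection step (mocked via a callable) distills a lesson into
# long-term memory, and that memory is injected into the next planning
# prompt. All content is illustrative.
class ReflexionAgent:
    def __init__(self, reflect):
        self.long_term_memory = []  # survives across tasks
        self.reflect = reflect      # LLM critique in practice

    def plan_prompt(self, goal):
        lessons = "\n".join(self.long_term_memory)
        return f"Lessons so far:\n{lessons}\nGoal: {goal}"

    def run(self, goal, succeeded):
        prompt = self.plan_prompt(goal)  # planning sees prior lessons
        lesson = self.reflect(goal, succeeded)
        self.long_term_memory.append(lesson)
        return prompt
```

Because the lesson store only helps on the second and later attempts, this sketch also illustrates why initial performance is low until memory accumulates.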
Suitability: The “reflexion” pattern is suitable for use cases where learning and adapting
behavior over time are desirable (e.g., agents tasked with providing user support in complex and
evolving environments). Performance improves as memory of successful or unsuccessful activity accumulates. Thus, you must either be able to
tolerate low initial performance or allocate time and resources during a prerelease or pilot phase to train the agent.
Pros:
Builds on the ReAct pattern, so provides a path to improve performance of ReAct-based agents
Basic reflexion can be used for session-level goal or task alignment and replanning
Cons:
Long-term memory introduces data management requirements and operational requirements (e.g., you will need to plan for backup, recovery
and maintenance of the memory store)
Training requires interaction with the agent, and optimization of performance requires an iterative evaluation process
If long-term memory is allowed to change or grow over time, the behavior of the agent will
change, requiring ongoing monitoring of its performance
Examples:
Plan-and-Execute (LangGraph)
Structured response is a form of prompting that ensures that the response from the LLM (or the
API used to interact with the LLM) meets strict formatting requirements.
LLMs are generally very capable of generating text-based data structures in formats such as
JSON, XML and YAML. However, occasionally, the generated results are misformatted. For example, they can be:
Syntactically incorrect and have a broken structure with missing or incorrect formatting (e.g., unbalanced brackets or missing quotes in JSON).
Semantically incorrect and have missing or unexpected data format or content (e.g.,
unexpected or incorrectly named fields, fields with values of the wrong type or missing
required data).
Structured responses can be implemented by the LLM API provider, using frameworks like
Instructor, or through prompting techniques such as providing few-shot examples of the desired
format.
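Client-side enforcement can be sketched as parse, validate, retry; the required field names and the retry budget are illustrative assumptions:

```python
# Sketch of client-side structured response enforcement: parse the
# model output as JSON, validate required fields, and re-sample (a stub
# here; a re-prompt in practice) up to a retry limit.
import json

def get_structured(generate, required=("name", "quantity"), retries=3):
    for _ in range(retries):
        raw = generate()
        try:
            data = json.loads(raw)  # syntactic check
        except json.JSONDecodeError:
            continue
        if all(key in data for key in required):  # semantic check
            return data
    raise ValueError("no valid structured response within retry limit")
```

Provider-enforced structured output (where available) removes the syntactic failure mode, but the semantic check on field content generally remains the client's job.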
Suitability: Most suitable for generating output that will be processed directly by agent code or downstream systems.
Pros:
Cons:
Examples:
Generate Structured Output With the Gemini API (Google AI for Developers)
Many of the patterns for implementing LLM-based agents rely on incremental, iterative or
recursive processing. When the processing of the agent is driven by LLM-generated plans and
tasks that are nondeterministic in nature, it is advisable to implement retry limits in your agent
code to prevent unacceptably long iteration on a task. The limit may be set to ensure a
reasonable user experience, and to help manage LLM and other processing costs.
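A minimal sketch of the limit itself, bounding both iterations and wall-clock time (the thresholds are illustrative):

```python
# Sketch of the retry limits pattern: bound both the number of
# iterations and the wall-clock time of a nondeterministic agent loop,
# returning a negative-but-timely result instead of looping forever.
import time

def run_with_limits(step, max_iters=5, max_seconds=2.0):
    deadline = time.monotonic() + max_seconds
    for _ in range(max_iters):
        if time.monotonic() > deadline:
            break
        result = step()
        if result is not None:  # task completed
            return result
    return None  # caller reports a graceful failure to the user
```

Framework-level equivalents (such as the CrewAI parameters cited below) serve the same purpose; a code-level guard remains useful when composing your own loops.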
Suitability: Suitable in all situations, especially where user experience is dependent on getting a
response, whether positive or negative, within a reasonable time frame and where cost of processing must be controlled.
Pros:
Cons:
Increased rework or frustration for users if they do not receive useful results within the
defined limits
Examples:
Agents (CrewAI) — “Max Retry Limit,” “Max Iter” and “Max Execution Time” parameters
Return to top
This domain includes patterns that allow agents to take actions based on their interpretation of
inputs and their assessment of the steps necessary to complete a task or meet a goal. These
actions include:
Sensing their system environment (e.g., querying data and context sources)
The function calling pattern is widely supported by LLM API services, and is sometimes called
“tool use.” These APIs allow a developer to provide a list of functions that can be used by the
LLM to complete a request. The LLM does not call the tools directly. Instead, it provides an
intermediate response, which defines the functions to be called and the parameter values to be passed to each.
The developer must implement logic to execute the requested functions, capture the results and
pass this information back to the LLM to complete or continue processing. The general flow of this interaction is shown in Figure 6.
The functions available to the LLM are typically defined as “tools” in a JSON data structure that specifies:
Function name: The programmatic name of the function (e.g., a Python function or Java
method name).
Function description: A natural language description of the function’s capabilities. This should be precise and differentiated from other functions passed in the same request.
Function signature: The name, data type and descriptions of each input parameter. These descriptions guide the LLM in mapping request content to parameter values.
Function calling is a highly flexible pattern, as the functions can encapsulate any behavior
you choose, including access to external services and data stores. This code also executes in a
context controlled by the agent developers, allowing for control over security context and
identity management. This code can also be debugged and monitored like any other late-bound
function call in your preferred programming language (i.e., runtime rather than compile-time
binding).
Stateless: Completes processing based only on the inputs to the function and any persistent data sources it accesses.
Synchronous: Provides the result expected by the LLM as the return value of the function
call. The function may spawn other asynchronous tasks or processes, but the results of these
must not be relevant to the LLM processing. For example “submit_order(id: 87254583)” may
return “true” to indicate the order was submitted, while order processing continues asynchronously.
Function calling is widely supported by LLM APIs, including both commercial and open models,
and is further supported by SDKs and LLM frameworks in various programming languages (see
examples below).
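The round trip can be sketched end to end with the model stubbed out; the schema shape below follows the common JSON tool-definition style, though exact formats vary by provider, and all names are illustrative:

```python
# Sketch of the function calling flow: tool schemas are sent with the
# prompt, the model returns a tool-call request (mocked here), and the
# client process validates and executes it, feeding the result back
# into the final answer.
import json

def get_weather(city: str) -> str:
    return {"Paris": "sunny"}.get(city, "unknown")  # stand-in for a real API

TOOL_SCHEMAS = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]
REGISTRY = {"get_weather": get_weather}

def mock_llm(prompt, tools):
    # a real LLM would choose a tool based on the schemas and the prompt
    return {"tool": "get_weather", "arguments": json.dumps({"city": "Paris"})}

def answer(prompt):
    call = mock_llm(prompt, TOOL_SCHEMAS)
    args = json.loads(call["arguments"])  # validate before executing
    result = REGISTRY[call["tool"]](**args)
    return f"The weather in {args['city']} is {result}."
```

Keeping execution in a registry owned by the client process is what gives the developer control over security context and identity, as noted above.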
One emerging use of function calling is computer control, such as Anthropic Computer Use. 19
This approach uses a multimodal model that can interpret screenshots of a user interface and
use function calling to provide a set of instructions for the client process to interact with the
user interface (e.g., key inputs, mouse clicks and mouse movement). Anthropic also recently
announced its Model Context Protocol, which defines a standard interface for a service to
support function discovery and invocation. 20
Suitability: Suitable for encapsulating any deterministic program logic that needs to be made available to the LLM during agent processing.
Pros:
Inputs to functions are generated by the model and must be validated. Outputs are also interpreted nondeterministically by the LLM
and subject to the usual prompt engineering patterns, practices and constraints.
LLM API “tool definition” abstractions support tool calling in most programming languages.
Code execution takes place in the client process context and can follow existing identity,
Some LLMs and their APIs (including OpenAI GPT-4 Omni [GPT-4o] and Anthropic Claude 3)
can support parallel function calling. In this case, an LLM prompt provided with multiple tool
definitions can result in a request for multiple function calls that can take place in parallel.
Cons:
LLMs may only handle a limited set of tools in each prompt. For example, OpenAI’s Chat API 21
supports a maximum of 128 functions and recommends you use no more than 20
functions. 22
Maximum context length or token limits must be handled by the function calling code. SDKs and LLM frameworks can help manage these limits.
Tool selection is nondeterministic and highly dependent on the name and description of the
tools and functions. There is always a possibility that the LLM will select the wrong function,
or choose not to call a function at all and infer or hallucinate the data.
Examples:
Text Generation (OpenAI), Assistants API Overview (OpenAI), and the Batch API (OpenAI)
Intro to Function Calling With the Gemini API (Google AI for Developers [Google Gemini])
The generated code execution pattern provides agents with the powerful, and potentially
dangerous, ability to generate arbitrary programmatic code that is executed by the orchestrating
process. The output or results of this code are typically passed back to the LLM as context for a
subsequent prompt.
Command evaluation: The generated code is a shell or command prompt command that can
be executed to interact with local or remote (e.g., via curl HTTP requests) resources. For
example, the generated code may be a shell command to get more information, or to affect a
change.
Remote execution: The platform, LLM API or other managed service includes the capability to execute the generated code remotely on the developer’s behalf.
Out-of-process code execution: The generated code is compiled (if required) and packaged for execution in a separate process, commonly in an isolated sandbox or
container.
In-process evaluation: For programming languages with dynamic code evaluation, the
generated code can be executed in the context of the client process that is invoking the LLM.
While a low-latency approach, this has significant security implications, as the code will have
access to all resources accessible to the core process and may affect the state or processing of the host application.
Suitability: Generated code execution can be used to create highly adaptable and autonomous
LLM-based agents. However, unlike the function calling pattern, the dynamically generated code
presents a series of risks that must be evaluated and addressed. Agent requirements can be
met by operating in very constrained sandbox environments (e.g., a limited VM or container with
very constrained access to external resources such as APIs or data storage). Within such constraints,
the generated code execution pattern can be used to create agents that can plan and execute novel tasks autonomously.
The main challenge when using this pattern is the ability to test and validate the behavior and
performance of the agent, since much of its function will be dynamically generated by the LLM at runtime.
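As a sketch of the out-of-process variant only (a real deployment would add container-level isolation, resource limits and network restrictions on top of this):

```python
# Sketch of out-of-process generated code execution: run LLM-generated
# Python in a separate interpreter process with a timeout, capturing
# stdout as the result. This is NOT a security sandbox on its own; it
# only isolates the generated code from the agent's own process state.
import subprocess
import sys

def run_generated_code(code: str, timeout: float = 5.0) -> str:
    proc = subprocess.run(
        [sys.executable, "-c", code],  # out-of-process, not in-process eval
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())  # surface errors to the agent
    return proc.stdout.strip()
```

The timeout and the error path matter as much as the happy path: as noted below, generated code may be syntactically or semantically wrong, and the agent must be able to detect and handle that.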
Pros:
Significantly expands the scope of capabilities for LLM-based agents, allowing them to
effectively create and execute new tools (e.g., to analyze data or automate tasks)
The pattern is more flexible than function calling, which requires the developer to develop, test and maintain each function in advance
Cons:
Significantly expands the security risks and the attack surface for prompt injection or manipulation — either directly or via indirect
influence.
Functional, security and performance testing and validation is very challenging due to the
dynamic and potentially unpredictable nature of the generated code. For example, the
generated code may be syntactically or semantically incorrect and fail to execute or produce
invalid results. The agent must be able to identify and handle these errors using evaluation
patterns or human-in-the-loop.
Observability of the agent is difficult, requiring monitoring of inputs, outputs and interactions with the runtime environment. There is potential
for prompting the LLM to generate appropriately instrumented code; however, we have not seen this proven in practice.
Examples:
API tool use is a specialization of function calling (aka tool use) for invoking external services
via published APIs. Whereas function calling relies on the LLM client process executing
functions in local code, API tool use constrains the behavior of function calling to invoking
functionality for retrieving data or triggering transaction processing via remote APIs. APIs with
an interface contract (commonly defined in OpenAPI Specification format) can be declared as tools for the LLM.
Suitability: API tool use is suitable for implementing integration between agents and existing
services with defined API contracts. These services can include enterprise or application-
defined services, SaaS applications or custom services defined specifically to support agent use
cases (e.g., a backend-for-frontend [BFF] API design pattern could be used). Using APIs as
tools reduces the amount of function or tool-based code that is embedded with an agent
implementation. This also helps to decouple and ensure controlled access to extended
functionality (i.e., API access should be controlled by policies, typically enforced by an API gateway).
Pros:
API tool use simplifies the integration of AI agents with existing API-based services, including SaaS applications and enterprise services.
APIs should already enforce policy controls and security that protects them from abuse. This
reduces but does not eliminate the risk of LLM-generated tool invocation or misuse.
Using APIs as tools (even when dedicated to the AI agent use case) provides an established, well-understood approach to integration governance and management.
APIs may encapsulate any other functionality, including other AI agents. This pattern can be used to compose multiagent solutions behind standard API contracts.
Cons:
Invoking APIs has higher latency and performance overheads than function calling; however,
functions may still need to make API calls to sense or act on external systems.
Direct invocation of APIs within an agent framework or platform limits the agent developer’s control over invocation behavior (e.g., error handling, retries and caching).
APIs must validate all their inputs and defend against common threats, such as SQL
injection.
API calls depend on well-formatted request payloads (e.g., conforming to JSON schema) and
hence are susceptible to LLM formatting errors. If implementing the API tool use pattern with custom code, combine it with the structured response pattern to mitigate these errors.
Examples:
Define OpenAPI Schemas for Your Agent’s Action Groups in Amazon Bedrock (AWS Bedrock)
API plugins for Microsoft 365 Copilot (Microsoft Learn [Microsoft 365])
Use Power Platform Connectors (Preview) in Copilot Studio (Microsoft Learn [Microsoft
Copilot Studio])
Agents (LlamaIndex)
Return to top
These patterns for defining and managing memory are crucial for agent performance and
include support for short-term memory (within a task), shared memory (across tasks per user)
and global memory (across all use of the agent). Proper memory design and management
enhances the agent’s ability to learn and adapt, but also ensures privacy and control of sensitive data.
Pattern: RAG
Pattern: RAG
Retrieval-augmented generation (RAG) is a well-established pattern for making enterprise data and information available for LLM-based agent processing. RAG
introduces an automated data retrieval step to inform LLM generation. The data stores used for
RAG are commonly updated by external data pipelines (e.g., updating a vector database or
search index as content is added to a repository). RAG practices are covered extensively in other research.
Suitability:
The RAG pattern can be applied to both short-term and long-term memory. It’s mostly used
for long-term memory, since it is suited to selecting relevant data from a larger corpus.
However, it can also be applied to short-term memory, for example, to allow the use of snippets of earlier conversation as context.
RAG is most suitable for providing an agent with access to enterprise specific context that can
be injected into its prompts to the LLM. This enterprise specific context is commonly managed
by out-of-band data pipelines that ensure the repository has high-quality and timely data.
The core RAG pattern can also be applied to any of the variants of the memory scope pattern.
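The retrieval-then-augment flow can be sketched with a toy word-overlap score standing in for embeddings and a vector index; the corpus content is illustrative:

```python
# Toy sketch of the RAG pattern: retrieve the corpus snippet with the
# highest word overlap to the query and inject it into the prompt.
# Real systems use embeddings and a vector index; overlap scoring here
# just makes the retrieve-then-augment shape visible.
CORPUS = [
    "Refund requests must be filed within 30 days of purchase.",
    "Shipping to EU countries takes 3-5 business days.",
]

def retrieve(query: str) -> str:
    words = set(query.lower().split())
    # score each document by shared words with the query
    return max(CORPUS, key=lambda doc: len(words & set(doc.lower().split())))

def build_prompt(query: str) -> str:
    return f"Context: {retrieve(query)}\nQuestion: {query}\nAnswer:"
```

Note that the corpus is maintained outside the agent (by the out-of-band pipelines described above); the agent only retrieves and injects at inference time.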
Pros:
Allows use of general purpose LLMs with context provided from enterprise or agent-specific
data stores.
Avoids the need for fine-tuning of LLMs and data can be updated on any suitable schedule.
Cons:
RAG increases inference time, as it introduces an information retrieval step, and the retrieved content increases prompt size and token consumption.
Prompt engineering is still required to ensure that the retrieved information is used by the LLM as intended.
Examples:
Current language models are inherently stateless because their weights don’t change at
inference time. As a result, they don’t retain any memory between inferences. To retain
continuity of context, or memory, from call to call, an LLM relies on the supporting systems
(e.g., API wrapper, AI gateway, client code or agents system) to populate its context window (in
the form of prompt augmentation) to generate its response. Effectively, the agent system is
responsible for managing memory (including memory persistence, update and retrieval) to provide continuity across interactions.
The memory longevity pattern is a spectrum of methods to manage memory manipulation and
retention period to support the requirements and constraints of the agent implementation.
Memory retention period can range from a single agent task or workflow to multiple concurrent or consecutive sessions over the lifetime of the agent.
Short-term memory (STM), aka working memory, is typically retained for the duration of a
task or single-agent workflow execution. It’s typically kept in RAM and does not survive
beyond the immediate agent runtime session. Short-term memory is best used for
maintaining relevant context between LLM calls and tool usage as the agent executes multiple
intermediate steps in the workflow. Short-term memory also serves as working memory to
hold data from long-term memory for agent reasoning, LLM grounding or planning.
Long-term memory (LTM) represents information that survives beyond the lifetime of an
individual agent runtime instance. The agent system is responsible for identifying entities and
context that should be added to long-term memory. The memorized entities may be
structured to optimize for easy update and retrieval (e.g., using a vector database for semantic search, or a knowledge graph for
entities and concepts). Borrowing from psychology, the CoALA framework further subdivides
LTM into subcategories of procedural, episodic and semantic memory, as shown in Figure 7. 23
Procedural memory: Includes skills or implicit knowledge to execute a task. For humans, it’s
sometimes referred to as “muscle memory” and is used to drive a car or play basketball. In
the context of an AI agent, it may be represented implicitly as model weights, or explicitly as a
collection of tools that LLMs can select and use via function calling or generated code
execution.
Episodic memory: Includes experiences attained under distinct episodes that can be used to
guide future actions. Episodic memory is usually captured and later retrieved with associated
context, situation, location or time frame. With respect to an AI agent, it may represent
decisions or choices made by a specific user or user persona from prior interaction sessions
(e.g., making a travel itinerary, or requesting a product return) to augment decision preferences in future sessions.
Semantic memory: Represents the world knowledge available to the agent with broad
applicability, as opposed to the more narrowly scoped episodic memory. For AI agents,
semantic memory may include general or domain-specific facts, information about the
(physical or digital) environment the agents operate in, and conceptual knowledge about
people, objects or events along with their respective characteristics and properties. Such knowledge is commonly retrieved from long-term memory for LLM
grounding purposes (as in a RAG use case). AI agents may also write to semantic memory as
part of an in-line agentic transaction (e.g., updating a user profile), or as the result of an out-of-band data pipeline.
Suitability:
Short-term memory is essential during agent execution for managing runtime context and states. It also facilitates communication of relevant data between LLM
invocations and tool use. It’s also critical to support agent process patterns such as agent-to-agent handoff.
Short-term memory is well-suited for protecting any private or sensitive user information
during agent execution, because the working memory is not persisted or retained beyond
agent runtime.
Long-term memory can improve performance, accuracy and explainability of the agent. For example, user-in-the-loop feedback can be accumulated to refine future behavior. Long-term memory should be designed in conjunction with the memory
scope pattern. You must ensure that memory data retained and shared between users or agents does not create data leakage or compliance issues.
Pros:
Short-term memory keeps track of runtime context and intermediate states of agent
workflow execution.
Long-term (semantic) memory helps ground the agent and LLMs with relevant and up-to-date
knowledge.
Long-term episodic memory can improve alignment and user experience by enabling personalization based on prior interactions
Cons:
Tools and frameworks available to support agent runtime learning (i.e., dynamic updates to long-term memory) are immature.
Each long-term memory implementation requires some level of tuning, optimization and
customization.
Keeping long-term memory up-to-date and accurate requires external DataOps workflow.
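The short-term/long-term split described above can be sketched as follows; the dict store and the summary promotion rule are illustrative assumptions (production systems use durable stores and LLM-based summarization):

```python
# Sketch of the memory longevity pattern: short-term memory lives only
# for one agent session, while long-term memory persists across
# sessions via a shared store (a dict here; a database in practice).
class AgentSession:
    def __init__(self, long_term: dict):
        self.short_term = []        # working memory, discarded at end
        self.long_term = long_term  # survives across sessions

    def observe(self, fact: str):
        self.short_term.append(fact)

    def end(self):
        # promote a distilled summary into long-term memory, then drop
        # the working memory so nothing sensitive is retained
        self.long_term["last_session_summary"] = "; ".join(self.short_term)
        self.short_term.clear()
```

The explicit promotion step at session end is where an agent system decides what (if anything) outlives the runtime, which is also the natural control point for privacy constraints.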
Examples:
Letta agent framework’s memory: based on Letta (MemGPT) research — MemGPT: Towards LLMs as Operating Systems (arXiv)
Mem0 offers a hybrid long-term memory framework that utilizes vector, key-value pairs and
graph databases.
Agents commonly run in the context of a chat interface. In the simplest use cases, each user
and their interactions can be viewed as isolated from each other — so the interactions of one
user cannot affect others. However, in some use cases, it makes sense for the memory
accumulated by an agent to be shared across related users (e.g., those associated with a common team or organization).
The memory scope pattern is a way of assessing the design and implementation of a memory
repository to best fit the needs and constraints of the agent implementation. The scope of the
memory is independent of whether it is used for short-term or long-term memory. The common scopes are:
Local scope: Memory data access is limited to the specific agent, and user or client identity
so interactions from one source do not affect the behavior of the agent for other identities.
Shared user scope: Memory data access is shared across all users of a single agent, such as an agent that accumulates shared knowledge from the interactions of a team.
Shared agent scope: Memory data access is shared across multiple agents, but is still
limited to the interactions of a single user or client identity. Accumulated memory is used as shared context across those agents.
Global scope: Memory data is shared across all users and all agent roles.
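One common way to realize these scopes is as key-prefix strategies in a shared memory store. The sketch below is illustrative only; the `memory_key` function and its wildcard convention are hypothetical, not taken from any product.

```python
# Illustrative mapping of the four memory scopes onto storage-key prefixes.
# "*" marks the dimension that is shared; a real store would enforce
# access control rather than rely on key shape alone.

def memory_key(scope: str, agent_id: str, user_id: str, item: str) -> str:
    """Build a storage key that reflects the chosen memory scope."""
    if scope == "local":          # one agent, one user
        return f"{agent_id}/{user_id}/{item}"
    if scope == "shared_user":    # one agent, all users
        return f"{agent_id}/*/{item}"
    if scope == "shared_agent":   # all agents, one user
        return f"*/{user_id}/{item}"
    if scope == "global":         # all agents, all users
        return f"*/*/{item}"
    raise ValueError(f"unknown scope: {scope}")
```

Encoding the scope in the key makes the sharing decision explicit and auditable, but fine-grained authorization still has to be enforced by the store itself.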
Suitability: The memory scope pattern should be applied to all variants of agent memory
longevity implementations to ensure the appropriate scoping is implemented. You must ensure
that the sharing of memory between users or agents does not create any data leakage or
compliance issues.
Pros:
Provides an approach for evaluating the level of sharing of memory data between users and
agents.
Memory scope helps ensure only relevant and authorized information is available to the
relevant agents. Limiting memory scope can help prevent different agents in a multiagent system from interfering with one another or leaking data across boundaries.
Cons:
Does not define the implementation patterns required to deliver each type of scoping.
Mechanisms to manage and enforce fine-grained access control and isolation between memory scopes can be challenging to implement and audit — particularly when sensitive data is involved.
Examples:
These patterns can be used to implement continuous evaluation, validation and improvement
for maintaining high-quality agent performance. The agent evaluation patterns are:
Pattern: User-in-the-Loop
User-in-the-loop (UITL) pattern describes a workflow that requires users to be looped into any stage of the AI system development pipeline — from concept design and initial training conditions through to live and even in-live training.²⁴ The users’ input can range from simple mechanics such as like/share, rate or thumbs up to richer feedback (e.g., correcting the agent’s output).
The feedback provided by the user-in-the-loop may be integrated with agent memory patterns to
provide context that affects the behavior of a deployed agent. It can also be collected and used
in future iterations of an agent design to support refinement of prompts, model parameters and
agent processing. Having UITL solutions integrated into the agent ensures sustained improvement of agent quality over time.
We distinguish UITL from the human-in-the-loop (HITL) pattern. UITL is focused on gathering
user feedback to improve subsequent agent performance. HITL is a functional pattern for
interacting with a human within a process; it is not a pattern for evaluating or improving the
performance of an AI agent. Note that some resources use these terms interchangeably.
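UITL feedback is most useful when each rating is linked to the interaction it judges, so it can later feed evaluation or memory. The sketch below illustrates that linkage; all field names and the `record_feedback` helper are hypothetical.

```python
# Hedged sketch of UITL feedback capture: each user rating is tied to the
# interaction it evaluates, with an optional free-text correction.
import datetime

feedback_log: list[dict] = []

def record_feedback(interaction_id, rating, correction=None):
    """rating: +1 (thumbs up) or -1 (thumbs down)."""
    if rating not in (-1, 1):
        raise ValueError("rating must be +1 or -1")
    entry = {
        "interaction_id": interaction_id,   # links feedback to the logged interaction
        "rating": rating,
        "correction": correction,           # e.g., the user's corrected output
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    feedback_log.append(entry)
    return entry
```

Tying feedback to an interaction ID is what later allows it to be joined with interaction logs for offline evaluation.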
Pros:
Having UITL as part of an agent system guides sustained effectiveness and helps ensure the agent remains aligned with user needs and preferences.
Feedback collection can be selected to optimize the user experience, balancing the quality and quantity of feedback against the burden placed on users.
UITL data can be used both to identify opportunities for improvement and to validate the impact of changes to the agent.
Cons:
Requires extension or adaptation of the user interface and human workflow to include feedback steps. This can put a burden on human users, and any overload or fatigue with the feedback mechanism can degrade the quality of the data collected.
Interpreting the feedback data to determine how to improve agent performance is not well-defined in most cases, particularly in the dynamic plan generation and generated code execution patterns.
Feeding user input back into agent behavior creates a new influence channel, which you must carefully evaluate. For example, you must ensure the feedback data does not degrade performance and cannot be used to manipulate agent behavior in undesirable ways.
Examples:
Add Feedback for Every Response (Microsoft Learn [Microsoft Copilot Studio])
Pattern: LLM-as-a-Judge
The LLM-as-a-Judge (LLMaaJ) pattern uses an LLM to evaluate an agent’s intermediate or final output based on criteria and metrics expressed in the prompt to the LLM.²⁵ LLMaaJ typically utilizes three evaluation methods:
Direct scoring with reference: The LLM judge grades the candidate agent response by
comparing to some “ground truth” data in the form of a known answer or reference
document. It produces a numeric score or grade along with an optional rationale supporting
the score.
Direct scoring without reference: When no reference is available, the LLM judge is given a
rubric as criteria (in the prompt) to grade the agent’s response.
Pairwise comparison: The LLM judge is given two candidate agent responses and asked to select the better one.
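The three methods differ mainly in what is packed into the judge prompt. The sketch below illustrates that mapping; `build_judge_prompt`, its method names and the prompt wording are hypothetical, and a real implementation would send the result to an actual model client.

```python
# Illustrative prompt builders for the three LLMaaJ evaluation methods.
# The returned string would be sent to a judge LLM (client call elided).

def build_judge_prompt(method: str, response: str, reference: str = "",
                       rubric: str = "", candidate_b: str = "") -> str:
    if method == "score_with_reference":
        # Direct scoring against ground-truth data.
        return (f"Grade the response from 1-5 against the reference answer.\n"
                f"Reference: {reference}\nResponse: {response}\n"
                f"Return a score and a short rationale.")
    if method == "score_without_reference":
        # Direct scoring against a rubric when no reference exists.
        return (f"Grade the response from 1-5 using this rubric.\n"
                f"Rubric: {rubric}\nResponse: {response}\n"
                f"Return a score and a short rationale.")
    if method == "pairwise":
        # Pairwise comparison of two candidate responses.
        return (f"Which response better answers the task? Reply A or B.\n"
                f"A: {response}\nB: {candidate_b}")
    raise ValueError(f"unknown method: {method}")
```

Asking for a rationale alongside the score is a common design choice, since the rationale makes judge errors easier to detect during calibration.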
Suitability: LLMaaJ can perform online or offline evaluations of LLM/agent intermediate steps or final responses. In the online (in-band) scenario, LLMaaJ is similar to the use of an LLM to critique or assess the quality of intermediate output in the reflexion pattern. In offline scenarios, LLMaaJ can consume more time and computational resources to perform a more thorough evaluation of interaction logs to
assess, for instance, the overall effectiveness or helpfulness of an LLM-based customer service agent.
Pros:
When compared to human judges, this pattern enables scalability, faster iterations and lower cost.
Can be more effective than developing complex business rules, particularly for evaluating natural language output.
May be used with a user-in-the-loop pattern to improve alignment of the LLM judge with
human preference.
Cons:
An LLM judge is susceptible to mistakes in its judgment because of its stochastic nature.
Not suitable in low-latency use cases where the agent needs to render a response in a timely
manner.
The alignment of an LLM judge can be sensitive to the underlying LLM model, making it
challenging and labor-intensive to realign a judge when a new LLM model is introduced.
Challenging to detect and mitigate the inherent bias of the LLM judges.
Hard to align with human preferences, particularly when it involves domain-specific tasks — mitigating this can require a “calibration” period where fine-tuning or prompt engineering is applied to the judge.
The LLM inference cost of LLM judges can be hard to control, and the judging process may add significant latency.
Examples:
LLM-as-a-Judge (Langfuse)
Pattern: Deterministic Evaluation
Deterministic evaluation is typically implemented using code or a rule engine to measure the
validity, quality or accuracy of the agent output. For example, it may take the form of a regular expression check to validate a data type, or a linter to detect syntax errors and perform static analysis on agent-generated code. It may also be a call to an existing rule engine to validate
conformance to policies.
Suitability: Ideal for evaluation of agent capabilities that involve direct calculation of metrics or
validation of rules or policies (e.g., to verify compliance with customer refund policy). This
pattern is most suitable for validation of structured information and data. For example, use
deterministic evaluation to verify that a travel plan is within budget and date ranges, and includes valid
locations. Deterministic evaluation also pairs well with the structured response pattern, where the LLM or API output can be quickly checked for syntactic or semantic errors. Use deterministic evaluation when the validation rules are simple to define, implement and maintain.
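The travel-plan example above can be sketched as plain code checks. This is an illustrative sketch under assumed plan fields (`total_cost`, `start`, `end`, `destination`); a real system would validate against its own schema and policy sources.

```python
# Deterministic evaluation sketch: code-based checks on a structured agent
# output (a travel plan), returning a list of rule violations.
import datetime

VALID_LOCATIONS = {"Berlin", "Paris", "Rome"}  # stand-in for a real lookup

def evaluate_plan(plan: dict, budget: float) -> list[str]:
    """Return a list of violations; an empty list means the plan passes."""
    errors = []
    if plan["total_cost"] > budget:
        errors.append("over budget")
    start = datetime.date.fromisoformat(plan["start"])
    end = datetime.date.fromisoformat(plan["end"])
    if end < start:
        errors.append("end date before start date")
    if plan["destination"] not in VALID_LOCATIONS:
        errors.append("unknown destination")
    return errors
```

Because the checks are deterministic, the same plan always produces the same verdict, which makes this layer a useful anchor alongside stochastic LLM-based evaluation.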
Pros:
Helps mitigate cumulative error from multiple stochastic systems working together
Cons:
Formal and accurate deterministic evaluation may not be easy or practical to implement, particularly for unstructured or free-form output.
The rules and metrics selected for deterministic evaluation need to be defined, validated and
managed.
Deterministic evaluation often relies on heuristic rules or selected indicator metrics that may inherit hidden human bias from the developers or be insufficient to handle a diverse range of relevant cases.
Examples:
Programming language linters such as JSLint for JavaScript and Pylint for Python
Pattern: Interaction Logging
Interaction logging collects and aggregates all the relevant interactions so agent processing
can be analyzed to support diagnostics, optimization, analysis and audit. This will include these
interaction types:
User-to-agent or client-to-agent
Agent-to-user or agent-to-client
Agent-to-tool
Agent-to-LLM
LLM-to-agent
Agent-to-agent
This pattern applies distributed tracing patterns to agentic systems. Each process instantiation
(e.g., a conversation initiated by a user) should be given a unique ID (a “trace ID” in distributed
tracing terminology). The process ID must be propagated between each component of the
agent so any logs generated can include it. Within a process instance, each interaction should be assigned its own unique interaction ID.
The logged interactions must include both the process ID and the interaction ID, along with other relevant metadata such as timestamps, interaction type and interaction response time.
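The ID structure described above can be sketched as follows. This is an illustrative in-memory logger; the class and field names are hypothetical, and a production system would emit these records to a tracing backend rather than a list.

```python
# Sketch of interaction logging: one process (trace) ID per conversation,
# one interaction ID per step, both attached to every record.
import time
import uuid

class InteractionLogger:
    def __init__(self):
        self.process_id = str(uuid.uuid4())  # "trace ID" for this process instance
        self.records: list[dict] = []

    def log(self, interaction_type: str, payload: str, response_ms: float) -> dict:
        record = {
            "process_id": self.process_id,
            "interaction_id": str(uuid.uuid4()),  # unique per interaction
            "type": interaction_type,             # e.g., "agent-to-llm"
            "payload": payload,
            "response_ms": response_ms,
            "ts": time.time(),
        }
        self.records.append(record)
        return record
```

Sharing one `process_id` across every record is what later lets all interactions of a single conversation be reassembled for diagnostics or audit.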
Suitability: Interaction logging is an essential pattern for all LLM-based agents. It provides
developers with insights into agent and user behavior during development, supports evaluation
of agent performance over time, and can support audit or reporting requirements.
Pros:
Supports tracking of user behavior and interactions across distributed components of agent
systems that can be used for agent evaluation, monitoring and analytics
Supports measurement against business SLAs, process SLAs and other operational metrics
Cons:
Interaction data may contain sensitive information that must be redacted or tokenized
Even with redaction or tokenization, production interaction log data can be an attack surface
Sensitive log data may prevent or limit some use cases for the data
Interaction logs contain natural language prompts and responses that require further use of AI and non-AI techniques to analyze at scale
Examples:
Langfuse tracing
OpenLLMetry (Traceloop)
This domain focuses on patterns for ensuring security and identity management in agent
systems. This includes LLM guardrails and identity propagation to prevent misbehavior and misuse of agent capabilities.
Pattern: Guardrails
Guardrails are validation and verification rules that are applied to the prompts and responses
within an agent system to ensure that they contain only information within the scope or
behavior of the solution. Many AI solutions already embed basic guardrails to prevent users
from submitting unsafe or inappropriate requests and to prevent them from receiving inappropriate
responses. However, in LLM-based systems, including agents, there are typically solution-specific requirements to control LLM interactions to ensure safe and predictable behavior.
Figure 8 shows the basic flow of guardrails applied to LLM inputs and outputs.
There are various types of guardrails rules or policies that can be applied to inputs or outputs:
Regular Expression (RegEx) guardrails: Used to identify well-defined string formats within
the input or output — for example account numbers, emails, dates, addresses or financial
data.
LLM-based guardrails: Use an LLM to classify and allow, deny or modify undesirable requests or content not easily matched using RegEx or other structured rules. Examples include allowing or denying prompts that relate to a specific topic, or improving the safety of responses.
Structured data guardrails: Ensure that input or output data fits an expected data format
(e.g., validating that JSON or XML formatted data is valid and fits an expected schema
definition).
Code validation guardrails: Ensure that code (typically generated code) is of the required or expected form and is safe to execute.
Prompt expansion guardrails: Ensure that every LLM prompt includes required system
prompt content, which may add agent-level or organization-level behavior, tone or response
expectations.
When implementing guardrails, pay close attention to the user experience and error handling when a guardrail rule is triggered.
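The RegEx guardrail type above can be sketched as a pre-LLM input filter. The patterns below are illustrative only and are far from production-grade PII detection; the function name and redaction format are hypothetical.

```python
# Minimal RegEx input guardrail sketch: redact well-defined sensitive
# string formats before the prompt reaches the LLM, and report which
# rules fired so the caller can decide how to respond.
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def apply_input_guardrail(prompt: str) -> tuple[str, list[str]]:
    """Return the (possibly redacted) prompt and the list of rules that fired."""
    fired = []
    for name, pattern in PATTERNS.items():
        if pattern.search(prompt):
            fired.append(name)
            prompt = pattern.sub(f"[REDACTED:{name}]", prompt)
    return prompt, fired
```

Returning the fired rules, rather than silently rewriting the prompt, supports the user-experience concern above: the agent can explain why part of a request was blocked or altered.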
Suitability: The guardrails pattern is widely applicable to GenAI applications and can be applied
to LLM-based agents to improve predictability and protect against common attacks and errors.
Some guardrails policy types depend on LLMs to evaluate content and cannot completely
eliminate risk. Due to the overheads that guardrails introduce, you must evaluate the risk to your users or organization and assess the impact of guardrails on the agent response time, behavior and cost.
Pros:
The guardrails pattern can be implemented in your agent code using a combination of custom logic, a guardrails framework or toolkit, or an AI gateway proxy that applies guardrail policies to LLM API requests
Guardrail solutions provide modules or plugins for a variety of guardrail types, and can often be extended with custom policies
Guardrails can be applied to any LLM prompt in an agent implementation, whether from the user, the agent logic or another agent
Cons:
Guardrails solutions and frameworks (OSS and commercial) are nascent and sometimes
beta
For LLM-based evaluations, a different LLM is used under the hood, which introduces additional cost, latency and its own risk of error
Guardrails are primarily focused on general LLM interactions and lack specific policy
definitions (e.g., verifying generated plans or agent-to-agent handoffs that meet system
expectations)
Examples:
Stop Harmful Content in Models Using Amazon Bedrock Guardrails (AWS Bedrock)
Kong AI Gateway
Pattern: Identity Propagation
Identity propagation is an identity and access management (IAM) pattern adapted to agentic AI
use cases. It preserves and maintains the user’s identity and entitlements as the agentic system
executes its tasks on behalf of the user. As an agentic system executes its workflow, it may
utilize multiple (local or remote) LLM inference sessions, call upon internal or external
applications or services via function calling or API tool use. It may also access public, private or proprietary data sources (e.g., for RAG).
At every step, you must ensure that both users and agents are properly authenticated and
authorized to execute the task at hand. For instance, when an agent makes an API call, it needs
the right API credentials and entitlements delegated by the user or client process. When the
RAG system accesses the knowledge base, it must retrieve only information that the represented user is authorized to access. This is typically accomplished by propagation of the end-user or
client security context, along with machine identities, at every step of the agent execution flow, using:
Human and machine identities, authenticated and authorized based on identity tokens that
can be propagated between components of the agent system (e.g., JSON Web Token [JWT]
can be passed between agents, subagents, API tools and data stores used for RAG)
Fine-grained authorization, implemented at the component level, which uses the propagated identity tokens to identify the security principal (e.g., human or agent) and its entitlements
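The propagation step can be sketched as carrying both identities on every downstream hop. The header names, `build_downstream_headers` and the `call_tool_api` stub are all hypothetical; a real agent would use an actual HTTP client and a token format such as JWT.

```python
# Hedged sketch of identity propagation: the end-user token travels with
# every downstream call so each component can authorize the real principal.

def build_downstream_headers(user_token: str, agent_token: str) -> dict:
    """Carry both the human and the machine identity on each hop."""
    return {
        "Authorization": f"Bearer {user_token}",       # end-user identity (e.g., a JWT)
        "X-Agent-Identity": f"Bearer {agent_token}",   # machine identity of the agent
    }

def call_tool_api(url: str, user_token: str, agent_token: str) -> dict:
    """Stub for a tool API call; the real HTTP request is elided."""
    headers = build_downstream_headers(user_token, agent_token)
    # http_client.post(url, headers=headers, ...)  # real call would go here
    return {"url": url, "headers": headers}
```

Separating the user and agent tokens keeps actions attributable to both principals, which also supports the nonrepudiation requirement discussed below.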
Pros:
Can use existing identity management techniques and infrastructure widely used for web and API applications
Ensures agent actions are attributable and limited to the authorization level of the user or client the agent represents
Cons:
Complex agent implementations may require multiple identity tokens and complex token
mapping to support authorization across disparate resources, such as APIs and databases.
Identity tokens are sensitive information and need to be protected from leakage and misuse
(e.g., ensuring they are handled appropriately in any caching, data storage or logging).
Long-running agent processes may require a token life span (time-to-live) beyond what is permitted by security policy, requiring token refresh or re-authentication flows.
When the identity token used to authenticate a user to the AI agent itself is not sufficient to access a third-party API used in tool calling, an OpenID Connect or similar user consent and token exchange flow may be required.
Where nonrepudiation is required, the agent orchestration logic may need to log the user and agent identity associated with any actions taken. The target system may capture only a machine identity for the agent.
Examples:
Using API identity with CrewAI: CrewAI and Criteo API — Part 1 (Medium)
Recommendations
The hype surrounding LLM-based AI agents means that software architects are very likely to
see demand from business and technology stakeholders to apply them to business
requirements, or to explore their feasibility. By learning the emerging patterns captured in this research, you can better understand the capabilities that can be supported by agents and the effort required to deliver them robustly.
Begin by experimenting with a few patterns at a time in your development environments. Use
the open-source tools and frameworks (many of which are referenced in each pattern) to help
explore these patterns and understand their purpose. Start with the functional patterns that are used to implement agent capabilities and behaviors before moving on to the operational patterns needed for robust, production-ready AI agents.
The inability to validate the behavior of AI-based solutions is the most common blocker to
deployment. To address this, when you have a verified production scenario for AI agents,
build agent evaluation capabilities and gather the real-world data to ground them as an
integral part of your LLM-based agent implementation. This is critical to ensure that you can establish production-grade trust in the agent’s behavior.
Modularity is critical in AI agent delivery, much like traditional software. Whether the modules
are software components or are deployed as distributed services, modularity will help you
unit test and monitor their behavior, simplifying fault-finding, optimization and change
management. Use the “agent architecture” and “agent action” patterns (as shown in Figure 3)
to provide clear modularity and flexibility within your solutions. Modular evaluation and testing of agent components is an important development practice.
The core components of LLM-based agents are software-defined logic, structured LLM prompts, API-based interactions, datastores and tools/services. This means you can implement AI agent patterns using existing development tools, and automation, orchestration or integration platforms. Do this to leverage existing investments, skills and assets, where possible.
Conclusion
LLM-based AI agents represent a new class of software that, while it has huge potential, also brings new risks and challenges. As organizations explore the potential of agents, experimenting and piloting them, patterns and practices are constantly emerging. Be mindful that not every problem is best solved by an AI agent as you evaluate candidate use cases.
In this research, we have described some of the most common, well-established and important
patterns we think are necessary for software and AI engineers to understand when presented
with the opportunity and challenge of building LLM-based AI agents. The patterns presented are not the “endgame” and will continue to evolve, while others emerge. These patterns have also been selected to be composable, and less-agentic assistants and chatbots can still benefit from many of them.
We encourage you to be pragmatic in your approach to exploring the potential of AI agents for
your organization. Learn the characteristics of an AI agent (see Figure 2) and these patterns,
and apply them only to business problems or requirements where they are clearly the best
choice.
Evidence
1. Innovation Insight: AI Agents
2. Reinforcement Learning Agents, Springer.
3. Dreamforce 2024: Key Announcements and a New AI Era With Agentforce, Salesforce.
4. Introducing New Agents in Microsoft 365, Microsoft.
5. Customers Are Putting Gemini to Work, Google Cloud.
6. ServiceNow to Unlock 24/7 Productivity at Massive Scale With AI Agents for IT, Customer
7. Function Calling, OpenAI.
8. Multi-Hop Question Answering, arXiv.
9. Build With Claude, Anthropic.
10. LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench, arXiv.
11. ReAct: Synergizing Reasoning and Acting in Language Models, arXiv.
12. Learning to Reason With LLMs, OpenAI.
13. R1-Lite-Preview Is Now Live: Unleashing Supercharged Reasoning Power!, DeepSeek.
14. QwQ: Reflect Deeply on the Boundaries of the Unknown, Qwen.
15. Reflexion: Language Agents With Verbal Reinforcement Learning, arXiv.
16. Reflexion, LangGraph.
17. Reflection Agents With LangGraph | Agentic LLM Based Applications, Medium.
18. A Guide to Reflection Agents Using LlamaIndex, Analytics Vidhya.
19. Computer Use (Beta), Anthropic.
20. Introducing the Model Context Protocol, Anthropic.
21. Create Chat Completion, OpenAI.
22. Function Calling, OpenAI.
23. Cognitive Architectures for Language Agents, arXiv.
24. User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance, arXiv.
25. Judging LLM-as-a-Judge With MT-Bench and Chatbot Arena, arXiv.
AI engineering skills: The expertise needed to implement AI agents into production use cases can span many technical and nontechnical disciplines, including:
Software engineering
Product management
Legal
LLM limitations: While powerful, general-purpose LLMs have well-documented limitations that impact the architecture and capabilities of LLM-based agents. These include hallucination, constrained context windows, limited reasoning and planning capability, and sensitivity to prompt wording.
Process “design,” refinement and constraint: At their core, AI agents are automation tools —
solving problems or completing tasks that might otherwise have been done manually, or not
done at all. Agents are not the only solution and the software architect must not only ensure
that an AI agent is suitable, but is implemented such that it “does the right things, the best
way.”
Error detection and handling: Many of the interactions within an LLM-based agent involve unstructured or semistructured data. This makes detecting and handling errors (or omissions) in these interactions more challenging than in typical structured
data. Software architects may need to rely on statistical methods, natural language
processing, deterministic evaluation, or further LLM prompts to identify and handle these
errors.
Security and identity: Robust authentication and authorization for access to external
resources is critical — AI agents must not rely on LLM-generated information
to protect access to information. Beyond the patterns defined in this research, software
architects should also take measures to prevent LLM or agent “jailbreaking” and prompt
injection/manipulation to make the tool diverge from its defined purpose and guardrails.
Trust through evaluation: The nondeterministic nature of AI and the flexibility of natural language inputs and outputs mean that a significant part of building trust in the agent comes from evaluating its functional capabilities. This means that the platform and tooling you select must include the evaluation capabilities these patterns require.
Model/prompt coupling: LLMs vary widely in their behavior, between providers, model types,
model sizes and model versions. This means the response for a given prompt varies between
models and model versions and it is easy for your agent solution to become coupled to a
specific model type, size and version. This is because the effort to evaluate and build trust in
a different model is too great. Prepare for this by automating as much of this evaluation as
possible. Where possible, routinely test your agent with different models to understand their
impact on its behavior and prepare for the moment when you have to change or upgrade.
Model inference cost: LLM pricing is typically token-based or resource-based. In simple chat
scenarios, the tokens that represent the user’s input and the model response are mostly
visible and transparent to the user. In agentic systems, the LLM is used extensively for
intermediate steps, and using patterns that can iterate or loop, using dynamically retrieved
information as context. All of this will consume significantly more tokens (perhaps orders of magnitude more) than the simple LLM request-response chat scenario. These costs apply
both to the development and evaluation processes and the production deployment, and
estimating them can be challenging. Ensure that you monitor your agent’s token consumption in both development and production.
Model inference latency and throughput: LLM inference is computationally intensive and
relatively slow when compared to traditional computation and calculation. When LLMs are
used in “human time,” as in an interactive chat directly with an LLM, this may be tolerable.
However, in an agent scenario, the user may provide only a simple prompt, yet an agent using multiple LLM prompts for dynamic plan generation, reflexion, chain of thought, function calling and agent handoffs is likely to take considerably longer to complete a process than a traditional workflow automation engine. Your user experience design needs to
take this into account. Additionally, if your agent is intended to provide back-end processing of incoming event notifications, consider the interplay of event frequency and average processing time, along with a resource capacity plan to cope with peak demand (e.g., determine how many parallel instances of the agent are required to process peak demand within the maximum time allowed).
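The capacity question above can be approximated with Little's law: the number of concurrent agent instances needed is roughly the event arrival rate times the average processing time. The function name, the headroom factor and the example numbers below are illustrative assumptions, not figures from this research.

```python
# Back-of-envelope capacity sketch for the event-driven scenario:
# instances needed = arrival rate x average processing time, plus headroom
# for bursts and retries (Little's law applied to agent runs).
import math

def agents_needed(events_per_second: float, avg_processing_seconds: float,
                  headroom: float = 1.5) -> int:
    """Parallel agent instances required to keep up at peak, with headroom."""
    return math.ceil(events_per_second * avg_processing_seconds * headroom)

# e.g., 2 events/s at peak, 30 s per agentic run (many LLM calls), 50% headroom:
peak_instances = agents_needed(2.0, 30.0)
```

The long per-run time driven by multiple LLM calls is what dominates this estimate, which is why agentic workloads can need far more parallel capacity than a traditional workflow engine handling the same event rate.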
Supporting Initiatives
Software Architecture and Integration for Technical Professionals
© 2025 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc.
and its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior
written permission. It consists of the opinions of Gartner's research organization, which should not be construed
as statements of fact. While the information contained in this publication has been obtained from sources
believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such
information. Although Gartner research may address legal and financial issues, Gartner does not provide legal
or investment advice and its research should not be construed or used as such. Your access and use of this
publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and
objectivity. Its research is produced independently by its research organization without input or influence from
any third party. For further information, see "Guiding Principles on Independence and Objectivity." Gartner
research may not be used as input into or for the training or development of generative artificial intelligence, machine learning algorithms, software, or related technologies.