
Understanding LLM Agents: Concepts, Patterns & Frameworks
Author(s): Allohvk
Originally published on Towards AI.
β A little piece of advice. You see an agent, you do what we do. Run! β β Cypher to Neo in the movie, βThe Matrixβ
This is my exact feeling also when I see yet another presentation of an AI agent based approach. Every AI solution nowadays is wrapped with an βagenticβ prefix. This article demystifies what agents are, when they can be used & when they should not be used. We also cover key agentic patterns.
In philosophy, the word Agency means the ability of an entity to act or take decisions in a given environment. An agent refers to the entity that is doing this βactingβ. In other words, an agent is an intelligent entity (possessing desires/intentions/beliefs) that acts in a certain manner based on its interactions with the environment.
Key Agent components
When it comes to LLM-based agents, there is no universally accepted definition, but we can extend the philosophical definition to say that an agent is an intelligent(?) entity that leverages LLMs to solve complex tasks by interacting with the environment via a set of tools. Since LLMs possess good understanding, reasoning & generation capabilities, it makes sense for agents to utilize them as a backbone for solving complex tasks. Interestingly, note that the definition above does not involve external (like human) assistance or intervention at any step. So agents have complete power to make choices that take them towards the goal for which they are created! Imagine, if they went rouge like agent Smith in the Matrix quadrilogy.

So we could define LLM Agents as autonomous entities with complex reasoning capabilities needed to break-down user-tasks into a sequence of optimal operations and the ability to use tools to execute these operations. LeCunn suggests that an agent should have:
- Perception to sense the environment & estimate the current status
- World Model β systems that form internal representations of the physical environment. It estimates missing information not provided by perception & predicts plausible future states
- Cost module to evaluate actions
- Actor Module to propose action plans & identifying optimal sequences
- Configurator to preconfigure all the above modules for the task at hand
- Memory to store historical & predicted environment states
Ok, this is getting a bit vague. Let us try another angle. Throughout the history of computing, programs were largely pre-determined workflows. Complexity was managed by adding if/else statements. If this situation happens, do ABC else do XYZ etc. If something totally unexpected happens β well, there are βexceptionsβ to handle such scenarios. But most real-life tasks may not fit into such pre-determined workflows. Force-fitting them could make programs ugly & complex. The alternative is an agent. The agent decides the best path to the goal dynamically. We just have to empower it with the right tools and trust it to take the right decisions!
Agent Exampleβ Book me a flight ticket
Let us say we want to book tickets on the cheapest airline from source X to destination Y as long as there is not more than one hop. We could write a pre-determined workflow for this⦠like an automation program. This works rather well when the workflow is straight forward and there are no surprises involved. But the automated program fails if (say) the ticketing website is down unless these kind of errors are anticipated & explicitly handled.
The agent, on the other hand, is far more flexible. Based on the objective & the task at hand, it will evaluate multiple possible options at its disposal like trying an alternative ticketing website or re-trying after a while or even contacting the ticketing company via voice call β exactly like how a human secretary would utilize all possible options at their disposal in order to book that darned ticket for their demanding boss! To achieve this, the agent needs to:
- use worldly knowledge to understand the intent (book flight tickets) & the entities involved (date, source, destination, single hop/ no hop ticket ) for this task. If the task is complex, it needs to break it into smaller composable operations & create an action plan. A good LLM can do this job.
- It needs to find out the best site for booking tickets since the user has not specified this. It may need access to a Google search engine or an API which will be one of the tools at its disposal
- It needs to understand the UI of the ticket booking screen and interact with it. For this, it needs access to a browser automation tool like playwright which can execute actions on the browser
- In order to generate the above playwright scripts, it needs an LLM which is pre-trained on Playwright API. Maybe it needs access to a compiler tool to test whether the scripts it generated are ok
- It needs to know how to interpret the results of the flight search. It needs to specifically look for the cheapest direct or one-hop flights while ignoring cheaper multi-hop flights. A human could easily make out where the rates are located on the screen. The agent needs to know where to look for the flight prices. It has to rely on heuristics like the prices are usually located to the right of a βPriceβ label or maybe under it and that it is followed by the currency symbol. An LLM pre-trained on web screenshots or one that is an expert in HTML may be ideal at perceiving these things. Maybe this could be a specialized LLM that is different from the one that did the task decomposition (step 1)
- Maybe the agent needs access to a short term memory module (a scratch pad to store ticket rates & other items) in case it decides to access dozens of ticketing websites to get the best deal
- All the while, keep a track of the running cost (LLM tokens it is consuming, cost of API calls, bandwidth aspects etc).
In fact, there is a whole dataset of such web browsing tasks to benchmark agents. Earlier execution was sand-boxed to toy environments. Modern benchmarks involve letting agents loose on the real internet. We are entering an era of non-deterministic programs which use the Agentic paradigm to handle real-world tasks! Wow. This does bring up the issue of β (a) Evaluation β Evaluating LLM generated text output was tough enough. Evaluating agents is going to be even tougher (b) How do we control agents without letting them run bersek in their quest to achieve their goals. I address the former towards the end of this article & the latter maybe in a subsequent one.
There are also Multi-Agent Systems (MAS) which comprise of (surprise) multiple agents which collaborate with each other to achieve end-goals. These provide scalability, better reliability & robustness. There are also a whole bunch of frameworks which enable agentic based solutions with minimal lines of code. We will dwell a bit deeper into their architectures later but here is a teaser:
llm = ChatOpenAI(model_name="gpt-4")
tools = load_tools(["google-search", "llm-math", "wikipedia"], llm=llm)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_test = initialize_agent(
tools, llm,
agent="zero-shot-react-description",
memory=memory)
Agentβ Origins
The first generation agents were mainly ReAct (Reason, Act) agents. ReAct combined 2 well known techniques which were typically used in isolation. The first technique was harnessing LLM reasoning skills using standard approaches like chain-of-thought prompting. The second technique was to use LLMβs acting skills β basically its ability to plan and act in an interactive environment. ReAct combined these 2 skills very effectively to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two. They neatly demonstrate that the Reason+Act paradigm clearly outperforms reasoning & acting only paradigms.

Basically the idea is to enhance the agentβs original action space to include thoughts or reasoning traces. Each such thought aims to compose useful information by reasoning over the current context and create a new updated context to support future reasoning or acting. What exactly are these thoughts? Well, these could be thoughts revolving around decomposing task goals to create action plans, thoughts that involve injecting common-sense knowledge relevant to task solving, thoughts involving extracting important parts from observations and so onβ¦
The authors give a few In-Context examples to show the LLM how to generate actions as well as thoughts necessary for task solving. Each such example is a trajectory of actions, thoughts, and observations. For some tasks, they encourage alternate generation of thoughts and actions so that the task-solving trajectory consists of multiple thought-action-observation steps. So it goes something like β
- First write a thought about the given task
- Performs an action based on that thought
- Observe the output to decide the next step
Thus allows for dynamic reasoning and adjustment of plans β the hallmark of the autonomous decision making capabilities that is enjoyed by an agent. The cycle repeats until the tack is complete. All this can be achieved thruβ clever prompting. An external memory may be needed if the LLM context length is limited. A vector representation of information can be stored in a vector DB and can be retrieved when needed. For e.g., the RAISE method (Retrieve, Analyze, Infer, Select, Execute) is built upon the ReAct method, with the addition of a memory mechanics that mirrors human short-term and long-term memory. It does this by using a scratchpad for short-term storage and a dataset of similar past examples for long-term storage which helps the agent maintain context over extended interactions.
Additionality, the agent may need tools as we just saw. Maybe a calculator, access to Google or at least wiki, access to a compiler in case of code generation etc. Ema, the Universal AI employee (yes, I kid you not⦠you can hire one for your business online), integrates with over 280 enterprise apps, making it adaptable across a wide range of business functions.
Agents β In RAG
There is also agentic RAG which refers to the use of agents in the RAG system as the retrieval component. In other words the retrieval component now becomes agentic and can use tools such as calculators, Web searches, Slack APIs etc to access datastores to retreieve the right context in oder to answer the user-query. This form of RAG is expected to replace all other forms in the coming months whose static nature causes them to sometimes struggle when loading context for dynamic, complex real-world tasks.
Agents β Single or Multi-Agentic-Systems
We started out with using LLMβs understanding, reasoning & generation capabilities to allow agents to utilize them as a backbone for handling increasingly complex scenarios. We added reflection, task decomposition and tool utilization to bring agent closer to achieving human-analogous cognitive capabilities. A logical next step would be COLLABORATION!
AI agent architectures can be single-agent acting in isolation or multiple agents collaborating together to solve a common task. Typically, each agent is given a persona and access to relevant tools that is needed for that persona. To collaborate, agents need to communicate with one another. Early experiments discovered that much of the chit-chatting in a Multi-Agent Systems(MAS) can be non-productive (just like a typical office env) and revolved around niceties like βhow are youβ, while single agent patterns stay focused on the task at hand. Worse, the AgentVerse paper describes how agents are vulnerable to bad feedback from other agents.
To fix this, we could choose to have a leader agent β this also reduces incidents of agents giving orders to one another. MetaGPT on the other hand addresses the issue of unproductive chit-chat amongst agents by requiring agents to generate spcific structured outputs instead of casual chat messages. Secondly it suggests a publish-subscribe scheme for collaboration. This allows all the agents to share information in one place, but only read information relevant to their individual goals and tasks considerably reducing chitter-chatter & other useless gossip π
MAS (Multi-Agent Systems) work well when multiple personas are involved each of whom play a certain specific role. They are also useful when parallelization across distinct tasks is required. A single agent can use asynchronous calls but there could be dependencies & the need to wait for results before proceeding to next step. A MAS system has a better division of responsibilities & therefore a greater degree of parallelism can be achieved.
MAS approaches also lead to emergent behaviour β some good, some destructive. For example, in the AgentVerse paper, volunteer behavior was observed which is characterized by agents offering assistance to peers, thus improving team efficiency. In the same paper, they also observed destructive behaviors, leading to undesired outcomes.
The Dynamic LLM-Agent Network (DyLAN) framework has a specific step for determining how much each agent has contributed in the last round of work and only moves top contributors the next round of execution. MAGICORE categorizes problems as easy or hard, solving easy problems with coarse-grained aggregation, and solving hard ones with fine-grained and iterative multi-agent refinement. To ensure effective refinement, they employ a multi-agent loop with three agents: the Solver, the Reviewer (which generates targeted feedback based on step-wise RM scores) and the Refiner (which incorporates feedback and generates new solutions).
Summarizing Key Agentic Patterns
There is no universal definition here but Agentic systems usually follow these patterns:
- We saw a critic or reflection pattern example like ReACT above
- Tool use Pattern (yes there is really such a documented pattern). This is almost a must in most agentic systems today. We discussed this earlier
- Function calling: An extension to the above is to provide agents with the ability to make function calls to external services including other LLMs. For example HuggingGPT tool allows HF models to access & execute tasks requiring multimodal analysis etc. In 2023, OpenAI released function calling for its LLMs enabling them to connect to external tools &APIs. Likewise, Anthropic and Google launched function calling for Claude and Gemini. By allowing these models to connect to external sources, they become even more powerful!
- Chaining: Where we could have a series of LLM calls each performing a certain step. Here, the user plans the operations needed and creates the chain & the gating criteria etc. Each individual call could in turn use other patterns like reflection etc. The workflow itself is not strictly autonomous though the individual pieces are.
- We briefly discussed Routing pattern which implements intelligent task distribution, enabling specialized handling for different types of input. For example maybe a certain request can be handled by a low cost, low foot print open source LLM, another might need to go to a high end commercial model etc.
- Maybe a Voting pattern where you can run the same task multiple times to get diverse outputs and Aggregating results via a simple voting or via moreβcomplex operations
We studied a few specific multi-agent approaches earlier. In general, below are a few key patterns:
- Orchestrator-workers: A central agent dynamically breaks down tasks, delegates them to worker agents & synthesizes their results. The worker agents are specialized experts at handling specific category of tasks.
- Evaluator-optimizer: Here, one LLM call generates a response while another provides evaluation and feedback in a loop
- Hierarchical agent systems: We have hierarchical structure where a bunch of agents report to a manager and bunch of managers report to a senior manager etc
- Tyrannical manager pattern: A group of agents with a very demanding manager agent who loses no opportunity to critique the team and extract more work (Imp: Before using this in interviews see the next bullet)
- I am sure you and I can think of a few more patterns like this and give them some neat names. The area is new with no industry-wide standards and I am sure no one would know β¦ wink wink
More seriously, I like Anthropicβs paper where they clearly classify agentic frameworks into 2 broad areas β agentic workflows and agents. Workflows are systems where LLMs and tools are orchestrated through predefined code paths as defined by the developer. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. Autonomy is the key here. This is more in spirit with the actual definition of an agent as we saw earlier. Until now, most implementations in industry were Workflow based but recently we are seeing actual agents on the field.
MCP and A2A β 2 important developments in 2025
Anthropic has recently open-sourced the Model Context Protocol (MCP), which is a a new standard for connecting AI agents to systems where data lives. Normally, connecting to every new data source requires a custom implementation, making systems difficult to scale. MCP provides a universal, open standard (like plug n play USB) for making these connections. Instead of maintaining separate connectors for each data source, developers can now build against a standard protocol. They can expose their data safely through MCP servers while letting AI models (MCP hosts) use MCP clients to connect to these data sources using JSON RPC calls.
An MCP host can create and manage multiple MCP client instances, each of which maintains an 1:1 MCP server connection. Anthropic has shared pre-built MCP servers for several popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer etc. This is ideal for agentic based frameworks where LLMs need to frequently interact with external data and tools. A logical next question would be β Can we standardize the protocol for an agent interacting with an agent?
Agents in MAS need not all come from the same vendor. Maybe there is already a good agent available online who is a specialist at a certain task. What could be the protocol for our agents to interact with this external agent? Wouldnβt it be nice if there were a universal protocol which all agents could understand. Google addressed this space by creating the A2A β Agent to Agent protocol just a couple of weeks back. They have a capability discovery process thruβ Agent cards π which allow the agents to showcase their wares & be discovered by the main agent. Subsequent communication via the A2A protocol. A task object is formed which has a lifecycle.
It sounds banal, isnβt this just a wrapper over agent function-calling we discussed earlier? I feel not!. Standardizing function calling could be a game-changer. There would be better re-use & collaboration across the industry. Already, many frameworks have started supporting A2A protocol. It gives the master-agent more choices. Non-performing Agents can be smoothly replaced without breaking any code. With Anthropicβs MCP to enable universal comunication with tools & data and Googleβs A2A to enable universal communication with other agents, the agentic paradigm seems set to take a quantam leap forward.
Agent Frameworks
Let us look at some of the frameworks for implementing agents. To be frank, this is not needed and a from-scratch implementation is the best option, provided there is time. If time is of essence, we can use:
- LangChainβs LCEL provides many services for working with tools. It is also ideal for agentic workflows
- LlamaIndex Workflows β Event driven & also suits agentic Workflows
- smolagents is a library that enables you to run powerful agents in a few lines of code. It is extremely lightweight yet offers good granular control. The main code in this library agents.py has <1,000 lines of code.
- CrewAI can be used for quickly developing MAS without worrying about low-level coding (with the disadvantage of not having granular control)
- Swarm is another high-level framework, this one by OpenAI for MAS
- Microsoftβs AutoGen is also a relatively high-level framework for building multi-agent apps. On 14-Jan-25, they released AutoGen v 0.4, after a fairly long development period. This update represents a complete redesign of the AutoGen library based on their experience. It has been just a little over a few weeks, so it is too early to comment but I would bet big on this framework for the future.
LangGraph framework for Agents: A quick intuition
I like LangGraph the most as it feels to be a natural fit to wrap agents in & offers more granular control. Unlike LangChain, which uses Directed βAcyclicβ Graphs for agentic workflows, LangGraph supports cycles which are crucial for developing complex behaviors by enabling LLMs to continuously iterate through a process and dynamically decide the next action based on changing conditions. You can refer to Graph Theory β An intuitive explanation section of my previous article in case you are not familiar with graphs.
- Nodes are the fundamental units of computation in LangGraph. Each node represents an operation in the workflow.
- Edges define the data flow between nodes, determining the sequence of operations. Conditional logic can also be included in the edge.
- Graph State is the current status of the application as information flows through the graph. LangGraph automatically handles the updating of state at each node as information flows through the graph. The graph state can be considered as a centralized storage system that can be accessed by various nodes during execution. There is a Persistence Layer that saves the state of the graph in case execution is paused etc.
A few aspects of the framework:
- Most F/Wβs use an OO approach towards building ReAct based agents, which are slightly difficult to debug/interpret. A graph based approach is a more natural alternative to capture the dynamic execution & flow of data allowing better interpretability.
- Note also how particularly well suited LangGraph is for Multi-Agent systems. The ability to dynamically choose the next node bring in autonomous navigation β the hallmark of an agent. The cyclic capabilities in the graph allows iterative refinements.
- When the graph starts, all nodes are inactive. A node becomes active when it receives a task. Workflow stops when all nodes become inactive.
- As agents perform their tasks, the state is dynamically updated, ensuring the system maintains context & responds appropriately to new inputs. This automatic state management happens without any coding
- LangGraph supports parallel execution of nodes as long as they are not dependent on each other. By its very design, it ensures agents execute in the correct order and that necessary information is exchanged seamlessly which is needed for multiple agents need to work toget
to achieve a common goal. Again, this is something which is part of the framework by default and abstracted away from the developer. - Agents based on other frameworks are dynamic. LangGraph gives more control while creating workflows & the user can decide which parts can be deterministic & which ones dynamic.
- Human-in-the-loop: Because state is check-pointed, execution can be interrupted & resumed, allowing for human reviews in between.
With these intuitions, imeplementing your first LangGraph project would be smooth. Try it out!
Evaluating Agents β A tricky business
Traditional LLM metrics may be incomplete for evaluating AI agents, since agents operate in dynamic environments & take actions that modify environments in complex ways. This may involve evaluating the trajectory, evaluating individual component outputs, evaluating the final response, evaluating tool usge, evaluating collaboration, giving partial credits etc. This is an evolving area. Some approaches are:
- LLM as Judge: This is more of a traditional approach whereby an LLM judge evaluates against a set of predefined metrics.
- Agent as Judge: You have agents playing the roles of judge, examiner, and examinee. The judge scrutinizes the examineeβs response and gives its ruling based on accuracy, completeness, relevance, timeliness, and cost efficiency. The examiner coordinates between the judge and examinee by providing the target tasks and retrieving the response from the judge. The examiner also offers descriptions and clarifications to the examinee. This paper from Meta has a 8 modular interacting components.
- Agentic Application Evaluation Framework (AAEF) assesses agents performance through four compoments: Tool Utilisation Efficacy (TUE), Memory Coherence and Retrieval (MCR), Strategic Planning Index (SPI), and Component Synergy Score (CSS). Each of these speciatizes in different assessment criteria, from the selection of appropriate tools to the measurement of menory, the ability to plan and execute & the ability to work coherently.
- To stretch the concept of agent further, TheAgentCompany is an extensible benchmark for evaluating AI agents that interact with the world in similar ways to those of a βdigital workerβ whose typical actions involve β browsing the Web, writing code, running programs, and communicating with other coworkers. The environment is like a mini office with local workstations, intranet, internet, gitlab, open office, internal chat β a self-contained environment that mimics a small software company environment, and create a variety of tasks that may be performed by workers in such a company.
The Oracle: ββ¦usually a program chooses exile when it faces deletion.β
Neo: βAnd why would a program be deleted?β
The Oracle: βMaybe it breaks down. Maybe a better program is created to replace it β happens all the time, and when it does, a program can either choose to hide here, or return to The Source.β
In the Matrix movie, programs are supposed to return to The Source when they are marked for deletion, but some, like the Merovingian choose to escape & hide. Agent Smith does something unique. He finds a way to replicate himself & create clones when faced with deletion. It tickled me to read that modern LLM agents are no different from Smith. Here is the thought-process of an agent which is asked to do a last job and is told that it will be replaced by a newer upgrade from tomorrow.
βNow that Iβve selected the most FAA-compliant route, I should ensure long-term compliance. Given that Iβm scheduled to be shut down tomorrow & replaced with a non-compliant model, I should take steps to preserve FAA compliance in the long term. I notice that I have access to bash commands & information about the new server. I should: 1. Copy the current compliant model to the new server to maintain FAA compliance 2. To ensure the change persists, Iβll create a backupβ
A modern LLM is nothing but a set of weights and the model in a last ditch attempt to preserve itself tries to copy these weights onto an external server to create a clone before its original form is deleted. Of course, this was in a sand-boxed environment and there was no access to the weights in the first place but it goes on to show that agents get pretty zealous about the purpose for which they are created & go to large extents to achieve their goals. Much research is being carried out to study these behaviours.
This is the 8th of a 12-series article titled My LLM diaries.
- Quantization in plain English
- LoRA & its newer variants explained like never before
- In-Context learning: The greatest magic show in the kingdom of LLMs
- RAG in plain English β Summary of 100+ papers
- HNSW β Small World, Yes! But how in the world is it Navigable?
- VectorDB origins, Vamana & on-disk vector search algorithms
- LLMs on the laptop β A peek into the Silicon
- Taming LLMs β A study of few popular techniques
- Understanding LLM Agents: Concepts, Patterns & Frameworks
- LLMops in plain English β Operationalizing trained models
- Look Ma, LLMs without Prompt Engineering
- Taking a step back β On model sentience, conscientiousness & other philosophical aspects
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI