LLM Agents and Agentic Design Patterns
Last Updated on December 13, 2024 by Editorial Team
Author(s): Maithri Vm
Originally published on Towards AI.
Agentic AI is the next big milestone in the evolution of AI, as we advance from standalone AI solutions to distributed intelligent agents capable of solving complex problems autonomously. Its real-time decision-making and dynamic approach to solutions make it a more compelling choice for industry adoption than traditional process management built on rigid workflows.
2024 is being touted as the era of AI agents, with a buzzing trend on the internet around agentic capabilities and a proliferation of solutions and frameworks aimed at solving industry problems. Berkeley's MOOC on LLM Agents came at the perfect time, offering a comprehensive understanding of the landscape and emerging solution trends.
https://llmagents-learning.org/
The course offers comprehensive deep dives into every aspect of LLM agents, ranging from the history of LLMs, SOTA reasoning patterns, enterprise trends with LLM agents, multi-agentic frameworks, and embodied agents to, most importantly, AI agent safety. I strongly recommend this course to every AI practitioner, as it offers holistic learning across the spectrum.
Here are my key takeaways from an array of talks that offered unique perspectives. To maintain reader engagement, I'll present this succinctly in a Q&A format, avoiding repetitive details already well known among AI practitioners. Given that AI agents have been in the industry for over a year, I'll skip formal definitions and explaining the obvious, focusing instead on first-principle insights shared by the presenters, along with my distilled abstractions narrated in a readable flow.
Well, what is an agent?
Each presenter had their own way of defining "what is an agent," which revealed much about how they approached their solution ideas. The most basic and relatable definition is "a program that can autonomously perform actions by making one or more LLM calls."
What are the key factors that make LLM agents a possibility?
- Reasoning: LLMs trained on vast amounts of internet data possess extensive knowledge across diverse topics. The key lies in the GPT-style decoder architecture, which predicts tokens sequentially based on the current prompt. Reasoning ability therefore depends on skillfully eliciting this inherent capability through prompting techniques (like ReAct, CoT, and ToT) and fine-tuning, both of which have proven successful.
- Tool calling / Action: trained to handle special tokens and syntax, LLMs can now call APIs, search engines, and database functions. This capability extends them well beyond simple text generation.
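To make tool calling concrete, here is a minimal, framework-free sketch. The `call_llm` function is a hypothetical stand-in for any chat-completion API; the point is the dispatch loop that maps a model-emitted function name onto a real Python callable.

```python
import json

# Hypothetical stand-in for a chat-completion call; any LLM client fits here.
def call_llm(prompt: str) -> str:
    # A tool-trained model would emit structured output like this JSON.
    return json.dumps({"tool": "search", "args": {"query": "LLM agents"}})

# The action space: plain Python callables the agent is allowed to invoke.
TOOLS = {
    "search": lambda query: f"top results for {query!r}",
    "lookup_db": lambda key: f"database record for {key!r}",
}

def run_tool_call(prompt: str) -> str:
    """Ask the model for an action, then dispatch it to the matching tool."""
    decision = json.loads(call_llm(prompt))
    tool = TOOLS[decision["tool"]]
    return tool(**decision["args"])

print(run_tool_call("Find recent work on LLM agents"))
```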
What are the major challenges in an LLM's reasoning and action-plan generation, and what are the proven remedies?
- LLMs are easily distracted by irrelevant context, and the order in which premises are presented also affects their reasoning.
- When a model errs or hallucinates, self-correction may not always improve the result, since the premise of the problem mostly remains the same and the ambiguities persist.
- Self-consistency, however, improves reasoning ability: generate multiple responses with different sampling, then select the answer that is most consistent across the resolution paths (a minimal sketch follows this list).
- Model memory and reasoning can be shaped through few-shot samples or fine-tuning, helping the model develop better thinking patterns.
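A minimal sketch of self-consistency, assuming a hypothetical `sample_llm` that draws one chain-of-thought answer per call at a non-zero temperature. This version resolves the paths by majority vote, one common way to pick the most consistent answer.

```python
import random
from collections import Counter

# Hypothetical stand-in for sampling one reasoning path at temperature > 0.
def sample_llm(question: str) -> str:
    return random.choice(["42", "42", "41"])  # noisy final answers

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample several independent reasoning paths, then majority-vote the answer."""
    answers = [sample_llm(question) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

print(self_consistency("What is 6 * 7?"))
```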
What are the major agent planning patterns?
ReAct, a powerful agentic pattern: given a problem context, the LLM alternates between thinking and acting. Each action's observable results are fed back into the LLM, enabling further thought and reasoning.
What are the advantages of ReAct?
ReAct enables systematic exploration by allowing an LLM to reason about tasks step-by-step while simultaneously interacting with tools to take actions. At each step, the model thinks incrementally, making it ideal for task decomposition.
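A compact ReAct-style loop, under stated assumptions: `think_and_act` is a hypothetical LLM call that returns either an action or a final answer, and each observation is appended to the transcript so the next thought can build on it.

```python
# Hypothetical LLM call: given the running transcript, return the next step.
def think_and_act(transcript: str) -> dict:
    if "Observation" in transcript:
        return {"type": "finish", "answer": "Paris"}
    return {"type": "action", "tool": "search", "input": "capital of France"}

# A single stub tool standing in for the agent's action space.
def search(query: str) -> str:
    return "France's capital is Paris."

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = think_and_act(transcript)      # Thought + proposed Action
        if step["type"] == "finish":
            return step["answer"]
        observation = search(step["input"])   # execute the Action
        transcript += (f"\nAction: {step['tool']}({step['input']})"
                       f"\nObservation: {observation}")
    return "gave up"

print(react("What is the capital of France?"))
```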
Reflexion: this method introduces a feedback loop into LLM reasoning by reviewing past steps. The model learns from previous decisions and the outcomes of its actions to influence the current course of action.
How does an LLM learn with reinforcement learning, a crucial step for Reflexion?
The presentation offered great perspectives on how verbal feedback in LLM tuning is analogous to traditional model tuning via weight updates. Since LLMs work solely with textual data, feedback about their output updates their knowledge by explicitly modifying their memory. This makes memory the most crucial element in an LLM's ability to reason.
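A sketch of that idea under the same stub-LLM assumptions: a failed attempt produces a verbal critique, and the critique is written into memory and fed into the next attempt, so the "weight update" happens in text rather than in parameters.

```python
# Hypothetical stubs: an attempt generator, a task checker, and a self-critique call.
def attempt(task: str, reflections: list[str]) -> str:
    return "v2 solution" if reflections else "v1 solution"

def passes(solution: str) -> bool:
    return solution.startswith("v2")

def reflect(task: str, failed: str) -> str:
    return f"'{failed}' failed; avoid that approach next time."

def reflexion(task: str, max_trials: int = 3) -> str:
    memory: list[str] = []                      # verbal feedback accumulates here
    solution = ""
    for _ in range(max_trials):
        solution = attempt(task, memory)
        if passes(solution):
            return solution
        memory.append(reflect(task, solution))  # the textual "gradient step"
    return solution

print(reflexion("write a parser"))
```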
What are the methods by which LLM agents can be equipped with memory?
- Episodic memory: user preferences and the events and actions that have occurred in the past. An LLM can leverage this to propose next actions or to reason about related queries.
- Procedural memory: logs of the functions and procedures executed in the past, along with their outcomes. With this kind of memory, an LLM agent can propose better actions (ones with more successful outcomes) and add new skills to its repository.
- Semantic memory: facts and worldly knowledge about real-life entities. RAG solutions are popular knowledge-discovery agents that augment external knowledge bases to reason about and respond to user queries.
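The three stores can be pictured as separate structures that are written and queried differently. This is a hypothetical layout, not any particular framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    episodic: list[str] = field(default_factory=list)        # past events, preferences
    procedural: list[dict] = field(default_factory=list)     # executed actions + outcomes
    semantic: dict[str, str] = field(default_factory=dict)   # facts, e.g. a RAG store

memory = AgentMemory()
memory.episodic.append("user prefers concise answers")
memory.procedural.append({"tool": "search", "args": "weather", "outcome": "ok"})
memory.semantic["Paris"] = "capital of France"

# A planner could rank tools by the success rates recorded in procedural memory.
successful_tools = [p["tool"] for p in memory.procedural if p["outcome"] == "ok"]
print(successful_tools)
```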
With the capabilities discussed thus far, what are the key ingredients of an LLM agent?
- Memory (long-term and short-term): external knowledge serves as long-term memory, helping the LLM reason about a given task, while the short-term, in-context memory of thoughts, actions, and observations also influences its reasoning.
- Action space: defines what the LLM agent can do to interact with external systems to achieve its goal.
- Ability to reason: through the interplay of memory and action outcomes, the LLM adapts its reasoning to the given context beyond its inherent knowledge. With fine-tuning, these learnings can be folded into the model's own behavior, making the learned paths part of its long-term memory.
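Putting the three ingredients together, a minimal and entirely hypothetical agent skeleton: long-term knowledge, a short-term scratchpad, an action space, and a reasoning step that consumes all of them.

```python
class Agent:
    def __init__(self, tools, knowledge):
        self.tools = tools            # action space
        self.long_term = knowledge    # external knowledge, e.g. a RAG store
        self.short_term = []          # in-context thoughts, actions, observations

    def reason(self, task: str) -> str:
        # Hypothetical stand-in for an LLM call over the task plus both memories.
        context = f"{task} | facts: {self.long_term} | scratchpad: {self.short_term}"
        return "search" if "find" in context else "done"

    def step(self, task: str) -> str:
        action = self.reason(task)
        if action in self.tools:
            observation = self.tools[action](task)
            self.short_term.append((action, observation))  # short-term memory grows
            return observation
        return "finished"

agent = Agent({"search": lambda t: f"results for {t!r}"}, {"domain": "geography"})
print(agent.step("find the capital of France"))
```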
What is a multi-agentic architecture?
In a nutshell, a multi-agent architecture brings together multiple specialized agents (each with its own tools and memory) to achieve a larger outcome collectively, akin to real-world project execution by a team of experts. It is best suited to complex problems and long-running workflows that span multiple modules.
What are the key advantages of multi-agent design?
- Modularity and separation of concerns, by encapsulating each agent's capabilities.
- By augmenting the LLM's ability to plan and execute, complex problems can be solved more creatively, perhaps arriving at innovative solutions to unseen problems by involving multiple agents.
- Enables HITL (human-in-the-loop) interruptions and approvals during program execution.
- Offers flexible collaboration across agents, which scales easily to extend overall capabilities and outcomes.
What are the key factors that influence Multi-Agentic design patterns?
- Static or dynamic control flow? Does the target problem rely on a static workflow, where the sequence of agent execution and conditional logic is programmed in advance, or does it require a more dynamic flow, allowing agents to execute in flexible, adaptive orders with greater autonomy? This distinction reflects whether the architecture is designed with complete awareness of the problem types it will handle or is built to operate in a more unpredictable, dynamic problem space.
- Natural language or programming language to control the agent interaction? Are the individual agents invoked through natural language (which may not always be 100% accurate in nuanced tool choice and parametrization) and able to respond to conversational patterns for the input task, or are they configured to operate through more controllable, explicit function calls and API invocations?
- Shared or isolated context during task execution? Do agents need a shared context and workspace with the other agents to complete the task, or should each remain isolated within its own hierarchy?
- Cooperation or competition? Do the underlying agents coordinate with the rest of the team to accomplish the goal, or do they compete, nominating themselves as candidates for overlapping objectives?
- Centralized or decentralized orchestration? This decides between command-and-control and collaborative orchestration: does a centralized orchestrator agent pass control across the team of agents, or are cross-agent conversations with shared objectives allowed?
- Intervention or automation? Is the primary goal of the multi-agent workflow intervention (e.g., technical troubleshooting) or automation (a predefined flow) of industry task forces? Intervention calls for dynamic, adaptive agent flows, while process automation typically relies on a static, predefined path to resolution.
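When designing a system, these axes can be made explicit in a configuration object. The names below are hypothetical, intended only to make each decision point visible:

```python
from dataclasses import dataclass
from enum import Enum

class ControlFlow(Enum):
    STATIC = "sequence and branching predefined"
    DYNAMIC = "agents choose the next step at runtime"

@dataclass
class MultiAgentDesign:
    control_flow: ControlFlow
    natural_language_handoffs: bool   # vs. explicit function/API invocation
    shared_context: bool              # vs. isolated per-agent workspaces
    cooperative: bool                 # vs. competing bids on overlapping goals
    centralized_orchestrator: bool    # vs. decentralized peer conversations
    goal: str                         # "intervention" or "automation"

# A typical process-automation pipeline sits at the constrained end of each axis.
automation_pipeline = MultiAgentDesign(
    control_flow=ControlFlow.STATIC,
    natural_language_handoffs=False,
    shared_context=True,
    cooperative=True,
    centralized_orchestrator=True,
    goal="automation",
)
print(automation_pipeline)
```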
What are the primary steps involved in multi-agent patterns?
- Agent team configuration: the team is assembled with capability descriptions, a choice of LLM models, functions and toolkits, and may even encompass sub-agents that encapsulate the underlying tasks those sub-agents perform.
- Enabling collaboration among the agents: collaboration can be hierarchical or decentralized, and the flow can be driven by explicit function invocation or designed to handle tasks via conversational messages.
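A hypothetical sketch of both steps: declare a team of capability-scoped agents, then let a simple centralized orchestrator route each task to the agent whose description matches. A real orchestrator would ask an LLM to do the routing; keyword matching keeps the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TeamAgent:
    name: str
    description: str               # capability advertised to the orchestrator
    handle: Callable[[str], str]   # the agent's entry point (LLM call, tool, sub-team)

team = [
    TeamAgent("researcher", "gathers background information",
              lambda task: f"notes on {task!r}"),
    TeamAgent("writer", "drafts text from notes",
              lambda task: f"draft about {task!r}"),
]

def orchestrate(task: str) -> str:
    """Centralized control: pick the first agent whose description matches the task."""
    for agent in team:
        if any(word in task for word in agent.description.split()):
            return agent.handle(task)
    return team[0].handle(task)    # fall back to a default agent

print(orchestrate("drafts a summary of LLM agents"))
```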
Distinction between popular agentic frameworks
a. Microsoft's multi-agent framework AutoGen is built around conversational programming, with inter-agent collaboration via natural language messages. Because the agent interaction is conversation-based, integrating a human in the loop is extremely seamless. This flexibility makes AutoGen a comprehensive, adaptable solution built on multi-agent conversation patterns.
b. The most popular framework, LangGraph by LangChain, involves designing the program as a graph, with agents as nodes and execution flow along the edges. This is a more declarative style of control flow, including conditional execution paths, and it requires developers to code most of the possible cases in advance. It is best suited to industry automation involving process-driven execution, while also delivering reliability (see the sketch after this list).
c. CrewAI uses role-based agents to handle specialized tasks while allowing the agents a degree of autonomy in decision making. Since the agents are role-based experts, they can also delegate tasks to other agents for coordinated execution. The framework offers a very high-level abstraction, enabling easier adoption by teams across the industry.
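To ground the LangGraph description, here is a minimal two-node graph. The API shown (`StateGraph`, `add_node`, `add_edge`, `compile`) matches recent langgraph releases, but treat the exact imports and signatures as an assumption and check the current documentation; the node bodies are stubs standing in for LLM calls.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    topic: str
    draft: str

# Nodes are plain functions over the shared state, each returning a partial update.
def research(state: State) -> dict:
    return {"draft": f"notes on {state['topic']}"}

def write(state: State) -> dict:
    return {"draft": state["draft"] + " -> polished draft"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("write", write)
graph.add_edge(START, "research")    # declarative control flow: edges fixed in advance
graph.add_edge("research", "write")
graph.add_edge("write", END)

app = graph.compile()
print(app.invoke({"topic": "LLM agents", "draft": ""}))
```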
Sophisticated multi-agent systems with autonomy seem to be the most desirable solution thus far. Are there any catches?
Yes: cost and latency. The greater the complexity of the design and the more dynamic the control flows it enables, the higher the associated costs and latency, and these are the most crucial parameters for the practical viability of any software solution. Agents must self-describe their tools and carry chain-of-thought prompts, few-shot samples, reasoning traces, observations, and outcomes, all of which load up the context the LLM has to manage. The end outcome may not be very efficient either, as the model attempts to solve the problem iteratively by trial and error, adding huge latency before the goal is accomplished.
So, are there any solutions to optimize these bottlenecks?
Constrained vs Unconstrained flow:
A constrained flow is a more static control flow among the agents, with predefined pathways or LLM-router-defined interactions between the tools and agents, and limited scope for self-improvement (reflection) loops. This is not only the economical choice but also promises the more reliable outcomes needed in most use cases.
An unconstrained flow, by contrast, involves completely dynamic planning and execution across agents, with reflection loops, so the program can go back and forth between agents to solve ever more challenging tasks. Though it can tackle more complex problems, it is definitely expensive (in LLM costs and latency) and, most importantly, its outcomes are less predictable.
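The cost difference is easiest to see side by side in a hypothetical sketch: the constrained version spends one routed call, while the unconstrained version loops over plan/act/reflect and can burn many LLM calls before it settles.

```python
# Hypothetical stubs standing in for LLM calls; each invocation costs tokens and time.
def route(task: str) -> str:
    return "billing_agent" if "invoice" in task else "support_agent"

def run_agent(name: str, task: str) -> str:
    return f"{name} handled {task!r}"

def constrained(task: str) -> str:
    """One router call, one agent call: cheap, fast, predictable."""
    return run_agent(route(task), task)

def unconstrained(task: str, max_rounds: int = 4) -> str:
    """Plan/act/reflect loop: flexible, but every round is more LLM calls."""
    result, calls = "", 0
    for round_no in range(max_rounds):
        result = run_agent(route(task), f"{task} (attempt {round_no + 1})")
        calls += 2                        # one planner call + one agent call per round
        if "handled" in result:           # a reflection step would make this judgment
            break
    print(f"{calls} LLM calls used")
    return result

print(constrained("pay this invoice"))
print(unconstrained("pay this invoice"))
```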
As we live in an era of emerging agentic frameworks, with innovative solutions appearing weekly (such as Magentic-One, LangGraph Command, and Bedrock Agents), this course by Berkeley offers valuable insight into the foundational principles underlying LLM agents, the distinctive patterns that define them, and the critical factors for evaluating framework alternatives.
Published via Towards AI