Inference Wars: Agentic Flows vs. Large Context Windows
Last Updated on January 3, 2025 by Editorial Team
Author(s): Claudio Mazzoni
Originally published on Towards AI.
The two schools of thought are battling it out, and the outcome will define how we interact with AI for years to come.
More than a century ago, Thomas Edison and the Serbian-born Nikola Tesla each claimed that the current system he had pioneered was the best. Edison championed direct current (DC), while Tesla backed alternating current (AC). The debate over which was superior sparked a race not only to prove each side's point but to smear and delegitimize the other party, often in vicious ways. In the end, Tesla's alternating current system won, as it was capable of traveling longer distances with less energy loss. The conclusion was cemented when the organizers of the 1893 Chicago World's Fair chose AC to light the event.
We know this period as the Current Wars.
Today, in the groundbreaking world of AI and LLMs, we are experiencing something similar, albeit with less toxicity.
The main trends in AI research aim to address two fundamental factors:
Context size.
Logic and reasoning capabilities of LLMs.
These two metrics fundamentally shape how we use LLMs.
Do we use them as context-grounded question-and-answering tools? Or as assistants: automatons capable of using logic and reasoning to come up with insights based on our data and tasks?
Giants like Microsoft, Google, Nvidia, and OpenAI believe the future lies with large context windows. In their view, the way forward is models trained with billions of parameters on as much data as possible, fine-tuned with feedback from fleets of experts, and engineered to retain precision over larger and larger bodies of text, allowing them to recall even the most minute piece of information, like finding a needle in a haystack.
On the other hand, thought leaders like Andrew Ng, Harrison Chase (creator of "LangChain" and "LangGraph," the most popular LLM frameworks today), and João Moura (creator of the leading agentic framework "CrewAI") believe in the power of automated assistants. They advocate for an assembly-line-like approach, where tasks are broken down and handled using prompts and retrieved content through Retrieval-Augmented Generation (RAG). This agentic method, they argue, delivers superior results for complex tasks.
Understanding LLM Agents and How They Work
"Agent" is a term used to describe a role-conditioned LLM inference flow designed to perform specific tasks autonomously by leveraging large language models (LLMs). Agents work by breaking tasks down into smaller, manageable components and then executing each one step by step, sometimes checking their own work, and often using tools (for example, web search engines), predefined prompts, and external information-retrieval mechanisms.
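To make this concrete, here is a minimal, framework-agnostic sketch of that loop in Python. The `call_llm()` and `web_search()` helpers are hypothetical placeholders, not any particular provider's or framework's API; libraries like LangGraph or CrewAI wrap this same pattern in much richer abstractions.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion API call (assumption)."""
    raise NotImplementedError("wire this to your LLM provider")

def web_search(query: str) -> str:
    """Placeholder tool: return search results as text (assumption)."""
    raise NotImplementedError("wire this to a search API")

def run_agent(task: str, max_steps: int = 5) -> str:
    # 1. Ask the model to break the task into smaller steps.
    plan = call_llm(f"Break this task into numbered steps:\n{task}")
    notes: list[str] = []
    for step in plan.splitlines()[:max_steps]:
        if not step.strip():
            continue
        # 2. Let the model decide whether this step needs a tool.
        decision = call_llm(
            f"Step: {step}\nNotes so far: {notes}\n"
            "Reply SEARCH:<query> to use the web, or ANSWER:<text> otherwise."
        )
        if decision.startswith("SEARCH:"):
            notes.append(web_search(decision.removeprefix("SEARCH:")))
        else:
            notes.append(decision.removeprefix("ANSWER:"))
    # 3. Self-check and synthesize a final answer from the accumulated notes.
    return call_llm(f"Task: {task}\nNotes: {notes}\nGive the final answer.")
```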
Advocates of both agentic approaches and large context windows are investing millions of dollars in research and are constantly innovating in the field of artificial intelligence.
Development in AI currently moves far too fast to summarize in this article; others do a much better job, and even they struggle to cover everything as it is released. Having said that, I want to shift the attention elsewhere.
At the end of the day, most of us don't really think about these inference wars being fought to claim our attention. However, depending on your goals, one approach will suit your tasks better than the other. How do you go about figuring out which one it is?
In this article, we will take a deeper look at what type of LLM application architecture is right for you.
Choosing the Right LLM Application Architecture
When it comes to selecting the appropriate LLM application architecture, understanding your specific needs and goals is crucial. Here are some key considerations to help you make an informed decision:
Task Complexity:
- If your tasks require question-answering capabilities or straightforward information retrieval, a large context window LLM might be more suitable. Models with smaller context windows will require more engineering and will be more brittle for the task. For example, retrieving a small subset of information from a large body of documents can easily be done using models like Gemini, which offers a 1 million token context window, whereas an agentic approach might require RAG along with extra engineering to do the same, increasing complexity and latency.
- For complex tasks that demand logical reasoning, problem-solving, and multi-step processes, an agentic approach along with RAG might be more effective and less prone to errors. It allows for more dynamic and adaptable problem-solving strategies, which are sometimes required, especially when dealing with multi-hop tasks. For example, finding out how old the founder of McDonald's was when he opened his first restaurant would require multiple steps: first find out when he was born, then when he founded McDonald's, and finally calculate the difference. (Both paths are sketched in code after this list.)
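To make the contrast concrete, here is a minimal sketch of both paths, reusing the hypothetical `call_llm()` helper from the earlier sketch. The `retrieve()` function and the prompts are illustrative assumptions, not any specific library's API.

```python
def answer_with_large_context(question: str, documents: list[str]) -> str:
    # Large context window: stuff every document into a single prompt and
    # let the model find the needle in the haystack itself.
    context = "\n\n".join(documents)  # assumes everything fits in the window
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder RAG retrieval over a document store (assumption)."""
    raise NotImplementedError("wire this to your retriever")

def answer_with_agentic_rag(question: str) -> str:
    # Agentic multi-hop: decompose, retrieve per sub-question, then combine.
    subqs = call_llm(
        f"List, one per line, the sub-questions needed to answer: {question}"
    ).splitlines()
    facts: list[str] = []
    for subq in subqs:
        if subq.strip():
            ctx = "\n".join(retrieve(subq))
            facts.append(call_llm(f"Context:\n{ctx}\n\nAnswer briefly: {subq}"))
    # Final hop, e.g. subtracting the birth year from the founding year.
    return call_llm(f"Facts: {facts}\nNow answer: {question}")
```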
Scalability:
- Large context window LLMs are designed to handle vast amounts of data, making them ideal for applications that require extensive knowledge bases and high scalability. However, this can come at a steep price if their use is not carefully managed.
- Agentic frameworks, on the other hand, offer flexibility and modularity, making it easier to scale specific tasks or integrate new functionalities without overhauling the entire system.
Customization and Adaptability:
- If your application demands high levels of customization and adaptability, agentic frameworks like CrewAI or LangGraph provide the tools to create tailored solutions that can evolve with your needs. For example, you can create new "agents" to handle new tasks, as the sketch after this list illustrates.
- Large context window models, while powerful, may require significant effort to fine-tune and adapt to specific requirements, limiting their flexibility.
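As a rough illustration of that modularity (the `Agent` class and registry here are invented for this sketch, not CrewAI's or LangGraph's actual API), adding a capability can be as simple as registering a new role, with no change to the rest of the system:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    role: str            # e.g. "researcher", "writer"
    system_prompt: str   # the instructions that induce the role

# The existing pipeline: each agent handles one slice of the work.
AGENTS: dict[str, Agent] = {
    "researcher": Agent("researcher", "Find and cite relevant sources."),
    "writer": Agent("writer", "Draft clear prose from the researcher's notes."),
}

def add_agent(role: str, system_prompt: str) -> None:
    # Extending the system is a registration, not an overhaul.
    AGENTS[role] = Agent(role, system_prompt)

add_agent("fact_checker", "Verify every claim against the retrieved sources.")
```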
Resource Availability:
- Consider the resources at your disposal. Large context window models require substantial computational power, and the organizations providing these models typically charge based on the number of input and output tokens (units of text processed by the model). Heavy use can therefore incur significant costs, potentially leading to a large bill by the end of the day; see the back-of-the-envelope calculation after this list.
- Agentic frameworks, while still resource-intensive due to their multiple-generation, self-correcting nature, may offer more cost-effective solutions for certain applications, especially when engineered to reuse previous context and retrievals.
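To see how token-based pricing adds up, here is a back-of-the-envelope comparison. The per-token rates and call counts are hypothetical placeholders chosen for illustration, not any vendor's actual pricing:

```python
# Hypothetical rates: $5 per 1M input tokens, $15 per 1M output tokens.
INPUT_RATE = 5 / 1_000_000
OUTPUT_RATE = 15 / 1_000_000

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# One large-context call: ~900k tokens of documents stuffed into the window.
big_call = cost(900_000, 1_000)        # roughly $4.5 per question

# Agentic alternative: 8 small calls of ~3k input tokens each over RAG hits.
agentic = 8 * cost(3_000, 500)         # $0.18 per question

print(f"single large-context call: ${big_call:.2f}")
print(f"agentic multi-call flow:   ${agentic:.2f}")
```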
In the end, just as in 1893 with the Current Wars, there is no universal solution to our LLM needs: the answer is still defined by your own use case. Whether you opt for the extensive knowledge capabilities of large context window models or the dynamic adaptability of agentic frameworks, the key is to stay informed and agile in this rapidly advancing field.