A Hands-on Agentic RAG Design Example

Last Updated on August 26, 2025 by Editorial Team


Originally published on Towards AI.

GitHub: tomtang110/RAG_Agent_example (github.com) 🏀 A simple Agent RAG system for answering basketball player questions, using retrieval-augmented generation with Qwen…

This article walks through a simulated case to explain how a complete Agentic RAG pipeline operates.

Traditional RAG

I believe traditional RAG is quite easy to understand: retrieve content relevant to the query, then feed that content into a large language model and let the model answer the question. The primary advantages of this approach are reducing hallucinations and compensating for the model's lack of up-to-date knowledge.

The common implementation methods of RAG include:

  • Embedding retrieval: Utilizing the semantic similarity between the embeddings of queries and content for retrieval.
  • BM25 retrieval: Using keywords from both queries and content for retrieval.
  • Reranking: Further semantic sorting of the recalled content.
  • Generation: Generating responses based on the reranked content.
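
To make this concrete, here is a minimal sketch of that retrieve, rerank, and generate skeleton. The encoder and reranker model names are illustrative stand-ins (not the models used later in this article), and the toy documents are made up for the example.

```python
# Minimal sketch of the "retrieve + rerank + generate" skeleton described above.
# Model names are illustrative stand-ins, not the models used in the article's repo.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer, util

docs = [
    "Yao Ming is a 2.26m Chinese center and an 8-time NBA All-Star.",
    "Stephen Curry is an American point guard famous for three-point shooting.",
    "Yi Jianlian is the CBA all-time leading scorer and rebounder.",
]
query = "who is yaoming?"

# 1. Embedding retrieval: semantic similarity between query and document embeddings.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)
query_vec = embedder.encode(query, convert_to_tensor=True)
dense_hits = util.semantic_search(query_vec, doc_vecs, top_k=2)[0]

# 2. BM25 retrieval: keyword overlap between query and documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_best = int(bm25.get_scores(query.lower().split()).argmax())

# 3. Reranking: a cross-encoder re-sorts the merged candidates by semantic relevance.
candidates = list({hit["corpus_id"] for hit in dense_hits} | {bm25_best})
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, docs[i]) for i in candidates])
ranked = [i for _, i in sorted(zip(scores, candidates), reverse=True)]

# 4. Generation: hand the top-ranked context and the query to whatever LLM client you use.
context = "\n".join(docs[i] for i in ranked[:2])
prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```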

The above constitutes a basic retrieval-plus-generation framework. However, for slightly larger projects, additional modules are usually added to reduce retrieval pressure, such as:

  • Router: Classifying data sources and adding a router module to route queries to corresponding sources.
  • Meta-filtering: Extracting metadata from both queries and content to perform direct hard-rule filtering based on metadata.
  • Query rewriting: Sometimes a single query may still fail to recall relevant content, so query rewriting needs to be added to ensure the recall of relevant information.
Figure 1: The flowchart of a traditional RAG system. Image from author.

Agentic RAG

With the surging popularity of Agents, Agentic RAG has been elevated to a revered status. But what exactly sets it apart from traditional RAG? From my perspective, there is no significant change in the overall architecture — instead, a key element is added: control. The overall architecture is still designed by humans, but additional control nodes are introduced. At these nodes, the Agent independently decides whether to take path A or B, terminate the process, or continue the loop. Below, we will elaborate on the essence of Agentic RAG using a simple basketball information retrieval scenario.

Basic Information of the Scenario:

Currently, the dataset includes two data sources: one containing introductions to Chinese basketball players, and the other featuring introductions to American basketball players.

  • Router

A Qwen-8B model is used to determine whether to retrieve Chinese basketball data, American basketball data, or both.
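
A minimal sketch of this routing step is below. The DashScope OpenAI-compatible endpoint, the exact Qwen-8B model id, and the prompt wording are assumptions; adapt them to however the model is actually served.

```python
# Hedged sketch of the Router: a small Qwen model decides which data source(s) to query.
# base_url and model id are assumptions (DashScope OpenAI-compatible API).
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

ROUTER_PROMPT = (
    "You route basketball questions to data sources. "
    'Answer with a JSON list drawn from ["china", "america"], e.g. ["china"].\n'
    "Question: {query}"
)

def route(query: str) -> list[str]:
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption: whatever Qwen-8B deployment you use
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        return ["china", "america"]  # fallback: query every source

print(route("who is yaoming?"))  # the case below routes to ["china"]
```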

  • Embedding

The text-embedding-v4 model of Qwen is invoked for embedding generation.
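
Below is a hedged sketch of generating embeddings with text-embedding-v4 through DashScope's OpenAI-compatible embeddings endpoint; the endpoint URL is an assumption, and the repo may call the native DashScope SDK instead.

```python
# Hedged sketch: embed documents and the query with Qwen's text-embedding-v4.
# The base_url is an assumption; the model name matches the article.
import os
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-v4", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vecs = embed(["Yao Ming, 2.26m center", "Yi Jianlian, power forward"])
query_vec = embed(["who is yaoming?"])[0]

# Cosine similarity between the query and each document vector.
scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
print(scores)
```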

  • BM25

Implemented by calling spaCy for word segmentation and the rank_bm25 library for retrieval.
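
A minimal sketch of this retriever is below; the blank multilingual spaCy pipeline is an assumption (the repo may load a specific language model instead).

```python
# Minimal sketch of the BM25 retriever: spaCy for tokenization, rank_bm25 for scoring.
import spacy
from rank_bm25 import BM25Okapi

nlp = spacy.blank("xx")  # lightweight multilingual tokenizer, no model download needed

docs = [
    "Yao Ming, center, 8-time NBA All-Star",
    "Yi Jianlian, power forward, 5-time CBA MVP",
]
tokenized_docs = [[t.text.lower() for t in nlp(d)] for d in docs]
bm25 = BM25Okapi(tokenized_docs)

query_tokens = [t.text.lower() for t in nlp("who is yaoming?")]
print(bm25.get_top_n(query_tokens, docs, n=1))
```

Note that BM25 matches surface tokens, so the single token "yaoming" never matches "Yao Ming"; that is likely why the case log further down reports 0 BM25 hits for this query.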

  • Reranker

The gte-rerank-v2 model is used to sort the recalled content.

  • Filter_content

Since the content after reranking is not always relevant, a filter_content module is added to filter out irrelevant information—this helps reduce the workload on the large language model (LLM) and lower token costs.
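
A hedged sketch of such a filter is below; it reuses the OpenAI-compatible client from the Router sketch, and the prompt wording and model id are assumptions.

```python
# Hedged sketch of filter_content: keep only the retrieved records that actually help
# answer the query, before they reach the generator. `client` is the OpenAI-compatible
# client from the Router sketch; the prompt and model id are assumptions.
import json

FILTER_PROMPT = (
    "Query: {query}\n"
    "Candidate records (JSON list): {records}\n"
    "Return a JSON list containing only the records relevant to the query. "
    "Return [] if none are relevant."
)

def filter_content(client, query: str, records: list[dict]) -> list[dict]:
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption
        messages=[{
            "role": "user",
            "content": FILTER_PROMPT.format(query=query, records=json.dumps(records, ensure_ascii=False)),
        }],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        return records  # fallback: keep everything rather than drop evidence
```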

  • Valid Module

This module primarily judges whether the currently filtered content, combined with the query, is sufficient to support answering the question.

  • WebSearch Module

If the Valid Module returns “False”, relevant information is retrieved from the web as new external retrieval content. This new content is merged with the previously retrieved content, and the combined dataset is then fed back into the Valid Module for re-evaluation.
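
This validate-then-search loop is where the agentic control lives. A minimal sketch covering both the Valid Module and the WebSearch loop is below; the prompt, model id, `web_search` helper, and retry cap are illustrative assumptions, not code from the repo.

```python
# Hedged sketch of the Valid Module plus the WebSearch loop. `client` is the
# OpenAI-compatible client from the Router sketch.
import json

def validate(client, query: str, records: list[dict]) -> bool:
    """Ask the LLM whether the filtered records are sufficient to answer the query."""
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption
        messages=[{
            "role": "user",
            "content": f"Question: {query}\nRecords: {json.dumps(records, ensure_ascii=False)}\n"
                       "Can the question be answered from these records alone? Reply yes or no.",
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def web_search(query: str) -> list[dict]:
    """Stub for an external web-search API; replace with your search provider."""
    return []

def gather_context(client, query: str, records: list[dict], max_rounds: int = 2) -> list[dict]:
    for _ in range(max_rounds):
        if validate(client, query, records):    # Valid Module returns True: ready to generate
            break
        records = records + web_search(query)   # otherwise merge web results and re-check
    return records
```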

  • Generation

When the Valid Module returns “True”, the mixed retrieved content is directly fed into a Qwen-14B model for response generation, which serves as the final output.
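
A hedged sketch of that final call is below; the Qwen-14B model id and the prompt format are assumptions.

```python
# Hedged sketch of the Generation step: pass the validated context plus the query to a
# larger Qwen model. `client` is the OpenAI-compatible client from the Router sketch.
import json

def generate_answer(client, query: str, records: list[dict]) -> str:
    context = json.dumps(records, ensure_ascii=False, indent=2)
    resp = client.chat.completions.create(
        model="qwen2.5-14b-instruct",  # assumption: any Qwen-14B-class chat model
        messages=[{
            "role": "user",
            "content": f"Answer the question using only the retrieval content below.\n"
                       f"Retrieval content:\n{context}\n\nQuestion: {query}",
        }],
    )
    return resp.choices[0].message.content
```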

The specific flow diagram is as follows.

Figure 2: The flowchart of an agentic RAG system. Image from author.

Below is the execution information for a case:

Query

who is yaoming?

Index Construction of the Database

./data/america_basketball_player.xlsx has been read completely., there are 40 documents
./data/china_basketball_player.xlsx has been read completely., there are 41 documents
vector retriever has established.
bm25 retriever has established.

Router Result

["china"]

Retrieval Result

china domain: vector retrieve 5 doc.
"china": [
{
"name": "Yao Ming",
"gender": "Male",
"height": "2.26m",
"weight": "141kg",
"position": "Center",
"honor": "8-time NBA All-Star\nNBA All-Rookie First Team (2003)\nCBA Champion & MVP (2002)\nFIBA Hall of Fame inductee (2023)\nNaismith Memorial Basketball Hall of Fame inductee (2016)",
"source": "china"
},
{
"name": "Liu Yudong",
"gender": "Male",
"height": "2.00m",
"weight": "110kg",
"position": "Power Forward",
"honor": "7-time CBA Champion (Bayi Rockets)\nCBA Finals MVP (2002)\nNicknamed \"The God of War\" for his durability",
"source": "china"
},
{
"name": "Ding Yanyuhang",
"gender": "Male",
"height": "2.00m",
"weight": "99kg",
"position": "Small Forward",
"honor": "CBA Regular Season MVP for the 2016-17 and 2017-18 seasons; Gold medalist at the 2015 FIBA Asia Championship (Changsha); Gold medalist in men's basketball at the 2018 Jakarta Asian Games.",
"source": "china"
},
{
"name": "Yi Jianlian",
"gender": "Male",
"height": "2.13m",
"weight": "116kg",
"position": "Power Forward/Center",
"honor": "CBA All-Time Leading Scorer & Rebounder\n5-time CBA MVP\n4-time CBA Champion (Guangdong Southern Tigers)\nNBA Draft 6th overall pick (2007)",
"source": "china"
},
{
"name": "Yang Hansen",
"gender": "Male",
"height": "2.16m",
"weight": "112.9kg",
"position": "Center",
"honor": "CBA Blocks Leader and Rookie of the Year for the 2024-25 season. He was selected by the Portland Trail Blazers with the 16th overall pick in the first round of the 2025 NBA Draft, becoming the third Chinese men's basketball player to be picked in the first round of the NBA Draft after Yi Jianlian.",
"source": "china"
}
]
china domain: bm25 retrieve 0 doc.

Filter content

[{
"name": "Yao Ming",
"gender": "Male",
"height": "2.26m",
"weight": "141kg",
"position": "Center",
"honor": "8-time NBA All-Star\nNBA All-Rookie First Team (2003)\nCBA Champion & MVP (2002)\nFIBA Hall of Fame inductee (2023)\nNaismith Memorial Basketball Hall of Fame inductee (2016)",
"source": "china"
}]

Valid

is_correlated: ['yes']

Generation

Yao Ming is a male basketball player known for his significant contributions to the sport. Here are the key details about him based on the provided retrieval content:
- **Height**: 2.26 meters
- **Weight**: 141 kilograms
- **Position**: Center
### Honors and Achievements:
- **8-time NBA All-Star**
- **NBA All-Rookie First Team (2003)**
- **CBA Champion & MVP (2002)**
- **FIBA Hall of Fame inductee (2023)**
- **Naismith Memorial Basketball Hall of Fame inductee (2016)**
### Source:
The information is sourced from China.
The retrieval content does not provide further details such as Yao Ming's birthdate, nationality, or current status, so these aspects cannot be addressed based on the given information.

You can experiment with more cases on your own to experience the uniqueness of Agentic RAG.

Agentic RAG in Industry

Deploying Agentic RAG in industrial settings is a relatively more complex task. It involves addressing challenges such as more data sources, higher file complexity, stricter accuracy requirements for each module, latency constraints, and the integration of additional modules. To tackle these issues, this article proposes several considerations for discussion.

  • Module Accuracy

Currently, modules like Router, Validation, and Filter_content heavily rely on model accuracy. The existing Qwen-8B model may be far from meeting the required precision: using more advanced models could result in unacceptable latency, while smaller-parameter models may require a certain amount of SFT (Supervised Fine-Tuning) data. Here are several optional approaches:

  1. If latency permits, select models with 32B+ parameters.
  2. Distill large models into smaller ones (training methods can include SFT, RLHF (Reinforcement Learning from Human Feedback), or SFT + RLHF).
  3. Use synthetic data supplemented with manual screening.
  • File Parsing

Practical files in industrial scenarios are complex, covering formats such as Excel, PDF, image, TXT, and Word. Currently, LangChain already supports parsing for these formats through tools like langchain_unstructured, PyMuPDFLoader, Docx2txtLoader, and PillowLoader.
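
As a sketch of the loading side, the snippet below maps file extensions to loaders from langchain_community; whether the repo uses exactly these classes is an assumption.

```python
# Hedged sketch: load heterogeneous files into LangChain Documents before chunking.
# These loader classes come from langchain_community; pick loaders to match your formats.
from langchain_community.document_loaders import (
    Docx2txtLoader,
    PyMuPDFLoader,
    TextLoader,
    UnstructuredExcelLoader,
)

LOADERS = {
    ".pdf": PyMuPDFLoader,
    ".docx": Docx2txtLoader,
    ".txt": TextLoader,
    ".xlsx": UnstructuredExcelLoader,
}

def load_file(path: str):
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    loader_cls = LOADERS[suffix]
    return loader_cls(path).load()  # returns a list of Document objects

docs = load_file("./data/china_basketball_player.xlsx")
print(len(docs))
```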

In addition, chunk segmentation is also a challenging task. A dedicated article will be written later to elaborate on this topic; readers can make selections based on their specific needs.

  • Accuracy Degradation

Even if we resolve the accuracy issues of individual modules, end-to-end accuracy still degrades because the per-module accuracies multiply along the pipeline. For example, if the Router's accuracy is 0.9 and the Validation module's accuracy is also 0.9, the overall accuracy is at best 0.9 * 0.9 = 0.81. So how can we compensate for this accuracy degradation?

  1. Maximize the accuracy of each module to the highest possible level.
  2. Implement fallback strategies for each module. For instance, if the Router has low confidence in its predicted category, it can simply fall back to querying all data sources (see the sketch after this list).
  3. Add a self-reflection module. For example, if the Validation model determines that the current content is insufficient to answer the question, it can feed the current Agent workflow into a self-reflection module to check if any module has produced incorrect results, provide corrections, and then re-run the process from the faulty module. However, the addition of this self-reflection module will significantly increase the overall system latency.
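
As an illustration of point 2 above, a low-confidence fallback can be as simple as the decision logic below; the threshold and the way the confidence score is obtained are illustrative assumptions.

```python
# Hedged sketch of a Router fallback: if the classifier's confidence is low, query every
# data source instead of trusting the predicted one. Threshold and confidence source are
# illustrative assumptions.
ALL_SOURCES = ["china", "america"]

def route_with_fallback(predicted: list[str], confidence: float, threshold: float = 0.7) -> list[str]:
    if confidence < threshold:
        return ALL_SOURCES   # low confidence: pay the extra retrieval cost to protect recall
    return predicted

print(route_with_fallback(["china"], confidence=0.55))  # -> ["china", "america"]
```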

Summary

In summary, Agentic RAG is not a subversion of traditional RAG but an upgrade built on the same core "retrieval + generation" framework: it uses the Agent's control and decision-making capabilities, together with a modular design, to let the system adjust its process dynamically (e.g., route selection, content validation, web-based fallback retrieval). This makes it better suited to information acquisition in complex scenarios, especially tasks involving multiple data sources and strict accuracy requirements (such as the basketball player retrieval case in this article), because it reduces the cost of invalid retrieval and improves answer reliability.

Future optimization of Agentic RAG will focus more on the balance between efficiency and accuracy: developing lightweight self-reflection mechanisms, training multiple modules jointly to reduce error propagation, and integrating multimodal parsing capabilities to handle more complex industrial data scenarios. For developers, the key is to first clarify the core demands of the business scenario (e.g., whether latency or accuracy takes priority) and then choose technical solutions accordingly, rather than blindly pursuing "Agent-ization".


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.