A Hands-on Agentic RAG Design Example

Last Updated on August 26, 2025 by Editorial Team


Originally published on Towards AI.

GitHub: tomtang110/RAG_Agent_example (github.com) 🏀 A simple Agent RAG system for answering basketball player questions, using retrieval-augmented generation with Qwen…

This article walks through a simulated case to explain how a complete Agentic RAG pipeline operates.

Traditional RAG

I believe traditional RAG is quite easy to understand: retrieve content relevant to the query, then feed that content into a large language model and let the model answer the question. The primary advantages of this approach are reducing hallucinations and compensating for the model's lack of up-to-date knowledge.

The common implementation methods of RAG include:

  • Embedding retrieval: Utilizing the semantic similarity between the embeddings of queries and content for retrieval.
  • BM25 retrieval: Using keywords from both queries and content for retrieval.
  • Reranking: Further semantic sorting of the recalled content.
  • Generation: Generating responses based on the reranked content.
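
To make this concrete, here is a minimal sketch of that retrieve, rerank, and generate skeleton. The encoder and reranker model names are illustrative stand-ins (not the models used later in this article), and the toy documents are made up for the example.

```python
# Minimal sketch of the "retrieve + rerank + generate" skeleton described above.
# Model names are illustrative stand-ins, not the models used in the article's repo.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer, util

docs = [
    "Yao Ming is a 2.26m Chinese center and an 8-time NBA All-Star.",
    "Stephen Curry is an American point guard famous for three-point shooting.",
    "Yi Jianlian is the CBA all-time leading scorer and rebounder.",
]
query = "who is yaoming?"

# 1. Embedding retrieval: semantic similarity between query and document embeddings.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)
query_vec = embedder.encode(query, convert_to_tensor=True)
dense_hits = util.semantic_search(query_vec, doc_vecs, top_k=2)[0]

# 2. BM25 retrieval: keyword overlap between query and documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_best = int(bm25.get_scores(query.lower().split()).argmax())

# 3. Reranking: a cross-encoder re-sorts the merged candidates by semantic relevance.
candidates = list({hit["corpus_id"] for hit in dense_hits} | {bm25_best})
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, docs[i]) for i in candidates])
ranked = [i for _, i in sorted(zip(scores, candidates), reverse=True)]

# 4. Generation: hand the top-ranked context and the query to whatever LLM client you use.
context = "\n".join(docs[i] for i in ranked[:2])
prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```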

The above constitutes a basic retrieval-plus-generation framework. However, for slightly larger projects, additional modules are usually added to reduce retrieval pressure, such as:

  • Router: Classifying data sources and adding a router module to route queries to corresponding sources.
  • Meta-filtering: Extracting metadata from both queries and content to perform direct hard-rule filtering based on metadata.
  • Query rewriting: Sometimes a single query may still fail to recall relevant content, so query rewriting needs to be added to ensure the recall of relevant information.
Figure 1: The flowchart of a traditional RAG system. Image from author.

Agentic RAG

With the surging popularity of Agents, Agentic RAG has been elevated to a revered status. But what exactly sets it apart from traditional RAG? From my perspective, there is no significant change in the overall architecture — instead, a key element is added: control. The overall architecture is still designed by humans, but additional control nodes are introduced. At these nodes, the Agent independently decides whether to take path A or B, terminate the process, or continue the loop. Below, we will elaborate on the essence of Agentic RAG using a simple basketball information retrieval scenario.

Basic Information of the Scenario:

Currently, the dataset includes two data sources: one containing introductions to Chinese basketball players, and the other featuring introductions to American basketball players.

  • Router

A Qwen-8B model is used to determine whether to retrieve Chinese basketball data, American basketball data, or both.
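
A minimal sketch of this routing step is below. The DashScope OpenAI-compatible endpoint, the exact Qwen-8B model id, and the prompt wording are assumptions; adapt them to however the model is actually served.

```python
# Hedged sketch of the Router: a small Qwen model decides which data source(s) to query.
# base_url and model id are assumptions (DashScope OpenAI-compatible API).
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

ROUTER_PROMPT = (
    "You route basketball questions to data sources. "
    'Answer with a JSON list drawn from ["china", "america"], e.g. ["china"].\n'
    "Question: {query}"
)

def route(query: str) -> list[str]:
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption: whatever Qwen-8B deployment you use
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        return ["china", "america"]  # fallback: query every source

print(route("who is yaoming?"))  # the case below routes to ["china"]
```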

  • Embedding

The text-embedding-v4 model of Qwen is invoked for embedding generation.
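
Below is a hedged sketch of generating embeddings with text-embedding-v4 through DashScope's OpenAI-compatible embeddings endpoint; the endpoint URL is an assumption, and the repo may call the native DashScope SDK instead.

```python
# Hedged sketch: embed documents and the query with Qwen's text-embedding-v4.
# The base_url is an assumption; the model name matches the article.
import os
import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-v4", input=texts)
    return np.array([item.embedding for item in resp.data])

doc_vecs = embed(["Yao Ming, 2.26m center", "Yi Jianlian, power forward"])
query_vec = embed(["who is yaoming?"])[0]

# Cosine similarity between the query and each document vector.
scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
print(scores)
```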

  • BM25

Implemented by calling spaCy for word segmentation and the rank_bm25 library for retrieval.
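
A minimal sketch of this retriever is below; the blank multilingual spaCy pipeline is an assumption (the repo may load a specific language model instead).

```python
# Minimal sketch of the BM25 retriever: spaCy for tokenization, rank_bm25 for scoring.
import spacy
from rank_bm25 import BM25Okapi

nlp = spacy.blank("xx")  # lightweight multilingual tokenizer, no model download needed

docs = [
    "Yao Ming, center, 8-time NBA All-Star",
    "Yi Jianlian, power forward, 5-time CBA MVP",
]
tokenized_docs = [[t.text.lower() for t in nlp(d)] for d in docs]
bm25 = BM25Okapi(tokenized_docs)

query_tokens = [t.text.lower() for t in nlp("who is yaoming?")]
print(bm25.get_top_n(query_tokens, docs, n=1))
```

Note that BM25 matches surface tokens, so the single token "yaoming" never matches "Yao Ming"; that is likely why the case log further down reports 0 BM25 hits for this query.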

  • Reranker

The gte-rerank-v2 model is used to sort the recalled content.

  • Filter_content

Since the content after reranking is not always relevant, a filter_content module is added to filter out irrelevant information—this helps reduce the workload on the large language model (LLM) and lower token costs.
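
A hedged sketch of such a filter is below; it reuses the OpenAI-compatible client from the Router sketch, and the prompt wording and model id are assumptions.

```python
# Hedged sketch of filter_content: keep only the retrieved records that actually help
# answer the query, before they reach the generator. `client` is the OpenAI-compatible
# client from the Router sketch; the prompt and model id are assumptions.
import json

FILTER_PROMPT = (
    "Query: {query}\n"
    "Candidate records (JSON list): {records}\n"
    "Return a JSON list containing only the records relevant to the query. "
    "Return [] if none are relevant."
)

def filter_content(client, query: str, records: list[dict]) -> list[dict]:
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption
        messages=[{
            "role": "user",
            "content": FILTER_PROMPT.format(query=query, records=json.dumps(records, ensure_ascii=False)),
        }],
    )
    try:
        return json.loads(resp.choices[0].message.content)
    except json.JSONDecodeError:
        return records  # fallback: keep everything rather than drop evidence
```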

  • Valid Module

This module primarily judges whether the currently filtered content, combined with the query, is sufficient to support answering the question.

  • WebSearch Module

If the Valid Module returns “False”, relevant information is retrieved from the web as new external retrieval content. This new content is merged with the previously retrieved content, and the combined dataset is then fed back into the Valid Module for re-evaluation.
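
This validate-then-search loop is where the agentic control lives. A minimal sketch covering both the Valid Module and the WebSearch loop is below; the prompt, model id, `web_search` helper, and retry cap are illustrative assumptions, not code from the repo.

```python
# Hedged sketch of the Valid Module plus the WebSearch loop. `client` is the
# OpenAI-compatible client from the Router sketch.
import json

def validate(client, query: str, records: list[dict]) -> bool:
    """Ask the LLM whether the filtered records are sufficient to answer the query."""
    resp = client.chat.completions.create(
        model="qwen3-8b",  # assumption
        messages=[{
            "role": "user",
            "content": f"Question: {query}\nRecords: {json.dumps(records, ensure_ascii=False)}\n"
                       "Can the question be answered from these records alone? Reply yes or no.",
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def web_search(query: str) -> list[dict]:
    """Stub for an external web-search API; replace with your search provider."""
    return []

def gather_context(client, query: str, records: list[dict], max_rounds: int = 2) -> list[dict]:
    for _ in range(max_rounds):
        if validate(client, query, records):    # Valid Module returns True: ready to generate
            break
        records = records + web_search(query)   # otherwise merge web results and re-check
    return records
```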

  • Generation

When the Valid Module returns “True”, the mixed retrieved content is directly fed into a Qwen-14B model for response generation, which serves as the final output.
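
A hedged sketch of that final call is below; the Qwen-14B model id and the prompt format are assumptions.

```python
# Hedged sketch of the Generation step: pass the validated context plus the query to a
# larger Qwen model. `client` is the OpenAI-compatible client from the Router sketch.
import json

def generate_answer(client, query: str, records: list[dict]) -> str:
    context = json.dumps(records, ensure_ascii=False, indent=2)
    resp = client.chat.completions.create(
        model="qwen2.5-14b-instruct",  # assumption: any Qwen-14B-class chat model
        messages=[{
            "role": "user",
            "content": f"Answer the question using only the retrieval content below.\n"
                       f"Retrieval content:\n{context}\n\nQuestion: {query}",
        }],
    )
    return resp.choices[0].message.content
```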

The specific flow diagram is as follows.

Figure 2: The flowchart of an agentic RAG system. Image from author.

Below is the execution information for a case:

Query

who is yaoming?

Index Construction of the Database

./data/america_basketball_player.xlsx has been read completely., there are 40 documents
./data/china_basketball_player.xlsx has been read completely., there are 41 documents
vector retriever has established.
bm25 retriever has established.

Router Result

["china"]

Retrieval Result

china domain: vector retrieve 5 doc.
"china": [
{
"name": "Yao Ming",
"gender": "Male",
"height": "2.26m",
"weight": "141kg",
"position": "Center",
"honor": "8-time NBA All-Star\nNBA All-Rookie First Team (2003)\nCBA Champion & MVP (2002)\nFIBA Hall of Fame inductee (2023)\nNaismith Memorial Basketball Hall of Fame inductee (2016)",
"source": "china"
},
{
"name": "Liu Yudong",
"gender": "Male",
"height": "2.00m",
"weight": "110kg",
"position": "Power Forward",
"honor": "7-time CBA Champion (Bayi Rockets)\nCBA Finals MVP (2002)\nNicknamed \"The God of War\" for his durability",
"source": "china"
},
{
"name": "Ding Yanyuhang",
"gender": "Male",
"height": "2.00m",
"weight": "99kg",
"position": "Small Forward",
"honor": "CBA Regular Season MVP for the 2016-17 and 2017-18 seasons; Gold medalist at the 2015 FIBA Asia Championship (Changsha); Gold medalist in men's basketball at the 2018 Jakarta Asian Games.",
"source": "china"
},
{
"name": "Yi Jianlian",
"gender": "Male",
"height": "2.13m",
"weight": "116kg",
"position": "Power Forward/Center",
"honor": "CBA All-Time Leading Scorer & Rebounder\n5-time CBA MVP\n4-time CBA Champion (Guangdong Southern Tigers)\nNBA Draft 6th overall pick (2007)",
"source": "china"
},
{
"name": "Yang Hansen",
"gender": "Male",
"height": "2.16m",
"weight": "112.9kg",
"position": "Center",
"honor": "CBA Blocks Leader and Rookie of the Year for the 2024-25 season. He was selected by the Portland Trail Blazers with the 16th overall pick in the first round of the 2025 NBA Draft, becoming the third Chinese men's basketball player to be picked in the first round of the NBA Draft after Yi Jianlian.",
"source": "china"
}
]
china domain: bm25 retrieve 0 doc.

Filter content

[{
"name": "Yao Ming",
"gender": "Male",
"height": "2.26m",
"weight": "141kg",
"position": "Center",
"honor": "8-time NBA All-Star\nNBA All-Rookie First Team (2003)\nCBA Champion & MVP (2002)\nFIBA Hall of Fame inductee (2023)\nNaismith Memorial Basketball Hall of Fame inductee (2016)",
"source": "china"
}]

Valid

is_correlated: ['yes']

Generation

Yao Ming is a male basketball player known for his significant contributions to the sport. Here are the key details about him based on the provided retrieval content:
- **Height**: 2.26 meters
- **Weight**: 141 kilograms
- **Position**: Center
### Honors and Achievements:
- **8-time NBA All-Star**
- **NBA All-Rookie First Team (2003)**
- **CBA Champion & MVP (2002)**
- **FIBA Hall of Fame inductee (2023)**
- **Naismith Memorial Basketball Hall of Fame inductee (2016)**
### Source:
The information is sourced from China.
The retrieval content does not provide further details such as Yao Ming's birthdate, nationality, or current status, so these aspects cannot be addressed based on the given information.

You can experiment with more cases on your own to experience the uniqueness of Agentic RAG.

Agentic RAG in Industry

Deploying Agentic RAG in industrial settings is a relatively more complex task. It involves addressing challenges such as more data sources, higher file complexity, stricter accuracy requirements for each module, latency constraints, and the integration of additional modules. To tackle these issues, this article proposes several considerations for discussion.

  • Module Accuracy

Currently, modules like Router, Validation, and Filter_content heavily rely on model accuracy. The existing Qwen-8B model may be far from meeting the required precision: using more advanced models could result in unacceptable latency, while smaller-parameter models may require a certain amount of SFT (Supervised Fine-Tuning) data. Here are several optional approaches:

  1. If latency permits, select models with 32B+ parameters.
  2. Distill large models into smaller ones (training methods can include SFT, RLHF (Reinforcement Learning from Human Feedback), or SFT + RLHF).
  3. Use synthetic data supplemented with manual screening.
  • File Parsing

Practical files in industrial scenarios are complex, covering formats such as Excel, PDF, image, TXT, and Word. Currently, LangChain already supports parsing for these formats through tools like langchain_unstructured, PyMuPDFLoader, Docx2txtLoader, and PillowLoader.
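
As a sketch of the loading side, the snippet below maps file extensions to loaders from langchain_community; whether the repo uses exactly these classes is an assumption.

```python
# Hedged sketch: load heterogeneous files into LangChain Documents before chunking.
# These loader classes come from langchain_community; pick loaders to match your formats.
from langchain_community.document_loaders import (
    Docx2txtLoader,
    PyMuPDFLoader,
    TextLoader,
    UnstructuredExcelLoader,
)

LOADERS = {
    ".pdf": PyMuPDFLoader,
    ".docx": Docx2txtLoader,
    ".txt": TextLoader,
    ".xlsx": UnstructuredExcelLoader,
}

def load_file(path: str):
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    loader_cls = LOADERS[suffix]
    return loader_cls(path).load()  # returns a list of Document objects

docs = load_file("./data/china_basketball_player.xlsx")
print(len(docs))
```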

In addition, chunk segmentation is also a challenging task. A dedicated article will be written later to elaborate on this topic; readers can make selections based on their specific needs.

  • Accuracy Degradation

Even if we resolve the accuracy issues of individual modules, end-to-end accuracy still degrades because the per-module accuracies multiply along the pipeline. For example, if the Router's accuracy is 0.9 and the Validation module's accuracy is also 0.9, the overall accuracy is at best 0.9 * 0.9 = 0.81. So how can we compensate for this accuracy degradation?

  1. Maximize the accuracy of each module to the highest possible level.
  2. Implement fallback strategies for each module. For instance, if the Router has low confidence in its predicted category, it can simply fall back to querying all data sources (see the sketch after this list).
  3. Add a self-reflection module. For example, if the Validation model determines that the current content is insufficient to answer the question, it can feed the current Agent workflow into a self-reflection module to check if any module has produced incorrect results, provide corrections, and then re-run the process from the faulty module. However, the addition of this self-reflection module will significantly increase the overall system latency.
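
As an illustration of point 2 above, a low-confidence fallback can be as simple as the decision logic below; the threshold and the way the confidence score is obtained are illustrative assumptions.

```python
# Hedged sketch of a Router fallback: if the classifier's confidence is low, query every
# data source instead of trusting the predicted one. Threshold and confidence source are
# illustrative assumptions.
ALL_SOURCES = ["china", "america"]

def route_with_fallback(predicted: list[str], confidence: float, threshold: float = 0.7) -> list[str]:
    if confidence < threshold:
        return ALL_SOURCES   # low confidence: pay the extra retrieval cost to protect recall
    return predicted

print(route_with_fallback(["china"], confidence=0.55))  # -> ["china", "america"]
```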

Summary

In summary, Agentic RAG is not a subversion of traditional RAG but an upgrade built on the same core "retrieval + generation" framework: it uses the Agent's control and decision-making capabilities, together with a modular design, to let the system adjust its process dynamically (e.g., route selection, content validation, web-based fallback retrieval). This makes it better suited to information acquisition in complex scenarios, especially tasks involving multiple data sources and strict accuracy requirements (such as the basketball player retrieval case in this article), because it reduces the cost of invalid retrieval and improves answer reliability.

Future optimization of Agentic RAG will focus more on the balance between efficiency and accuracy: developing lightweight self-reflection mechanisms, training multiple modules jointly to reduce error propagation, and integrating multimodal parsing capabilities to handle more complex industrial data scenarios. For developers, the key is to first clarify the core demands of the business scenario (e.g., whether latency or accuracy takes priority) and then choose technical solutions accordingly, rather than blindly pursuing "Agent-ization".


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.