Progression of Retrieval Augmented Generation (RAG) Systems

Last Updated on January 25, 2024 by Editorial Team

Author(s): Abhinav Kimothi

Originally published on Towards AI.

Image by Author : Generated using Yarnit AI

The advancements in the LLM space have been mind-boggling. However, when it comes to using LLMs in real scenarios, we still grapple with their knowledge limitations and hallucinations.

Retrieval Augmented Generation is powerful because it provides the LLM with additional memory and context and increases the confidence in its responses.

In 2023, RAG became one of the most used techniques in the domain of large language models. In fact, it is hard to find an LLM-powered application that doesn't use RAG in one way or another.

Ever since its introduction in mid-2020, RAG approaches have followed a progression aimed at redressing the hallucination problem in LLMs.

Naive RAG

At its most basic, Retrieval Augmented Generation can be summarized in three steps –

  • Indexing of the documents
  • Retrieval of the context with respect to an input query
  • Generation of the response using the input query and retrieved context
Naive RAG — Source : Image by Author

This basic RAG approach can also be termed “Naive RAG”.
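
As a rough illustration, here is a minimal, self-contained sketch of the three steps. The embedding is a toy word-count vectorizer and the final LLM call is left as a placeholder; a real pipeline would use a proper embedding model and an actual LLM API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words count. Real systems use dense embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: embed and store every document
docs = ["RAG adds retrieved context to prompts.", "LLMs can hallucinate facts."]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: pick the document most similar to the input query
query = "Why do LLMs hallucinate?"
context = max(index, key=lambda pair: cosine(embed(query), pair[1]))[0]

# 3. Generation: combine query and retrieved context into a prompt
prompt = f"Answer using the context.\nContext: {context}\nQuestion: {query}"
print(prompt)  # a real system would now send `prompt` to an LLM
```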

Challenges in Naive RAG

1. Retrieval Quality

  • Low Precision leading to Hallucinations/Mid-air drops
  • Low Recall resulting in missing relevant info
  • Outdated information

2. Augmentation

  • Redundancy and Repetition when multiple retrieved documents have similar information
  • Context Length challenges

3. Generation Quality

  • Generations are not grounded in the context
  • Potential of toxicity and bias in the response
  • Excessive dependence on augmented context

Advanced RAG

To address the inefficiencies of the Naive RAG approach, Advanced RAG approaches implement strategies focused on three processes –

  • Pre-retrieval
  • Retrieval
  • Post Retrieval
Advanced RAG — Source : Image by Author

Advanced RAG Concepts

– Pre-retrieval/Retrieval Stage

  • Chunk Optimization

When managing external documents, it’s important to break them into the right-sized chunks for accurate results. The choice of how to do this depends on factors like content type, user queries, and application needs. No one-size-fits-all strategy exists, so flexibility is crucial. Current research explores techniques like sliding windows and “small2big” methods.
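
As a hedged example, here is one way a sliding-window chunker might look; the chunk size and overlap values are illustrative, not recommendations.

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Fixed-size chunks that overlap, so content spanning a boundary
    # still appears intact in at least one chunk.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

chunks = sliding_window_chunks("some long document text " * 200, size=40, overlap=10)
print(len(chunks))  # each chunk shares 10 words with its neighbour
```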

  • Metadata Integration

Information like dates, purpose, chapter summaries, etc. can be embedded into chunks as metadata. This improves retrieval efficiency: the retriever not only searches the document text but can also assess similarity to, and filter on, the metadata.
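
A minimal sketch of the idea, with made-up field names (`date`, `chapter`): chunks carry metadata, and retrieval filters on it before any text scoring.

```python
chunks = [
    {"text": "Q3 2023 revenue grew 12%.", "date": "2023-10-01", "chapter": "Financials"},
    {"text": "Q3 2022 revenue grew 7%.",  "date": "2022-10-01", "chapter": "Financials"},
]

def retrieve(query: str, after: str) -> list[dict]:
    # Metadata filter first; a real system would then rank the survivors
    # by text or embedding similarity to the query.
    return [c for c in chunks if c["date"] >= after]

print(retrieve("revenue growth", after="2023-01-01"))  # only the 2023 chunk survives
```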

  • Indexing Structure

The introduction of graph structures can greatly enhance retrieval by leveraging nodes and their relationships. Multiple index paths can also be created, aimed at increasing efficiency.

  • Alignment

Understanding complex data, like tables, can be tricky for RAG. One way to improve indexing is counterfactual training, where hypothetical (what-if) questions are created and indexed alongside the data. This increases alignment and reduces the disparity between queries and documents.

  • Query Rewriting

To bring better alignment between the user query and the documents, several rewriting approaches exist. LLMs are sometimes used to create pseudo-documents from the query for better matching with existing documents; sometimes they are used to perform abstract reasoning over the query. Multi-querying is employed to solve complex user queries.
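
A minimal sketch of multi-querying, assuming a hypothetical `llm(prompt) -> str` helper (stubbed here for demonstration):

```python
from typing import Callable

def multi_query(query: str, llm: Callable[[str], str], n: int = 3) -> list[str]:
    # Ask the LLM for n rephrasings; each variant is retrieved separately.
    prompt = f"Rewrite the question below in {n} different ways, one per line:\n{query}"
    rewrites = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + rewrites[:n]  # keep the original query as well

# Stand-in LLM; a real system would call an actual model here.
fake_llm = lambda p: "How do LLMs make things up?\nWhat causes hallucination?\nWhy are answers unreliable?"
print(multi_query("Why do LLMs hallucinate?", fake_llm))
```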

  • Hybrid Search Exploration

The RAG system employs different types of search, such as keyword, semantic, and vector search, depending on the user query and the type of data available.
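
One possible shape for a hybrid scorer, blending a keyword signal with a vector signal; `alpha` and the stub `vector_score` are illustrative assumptions:

```python
from typing import Callable

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query terms that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query: str, docs: list[str],
                  vector_score: Callable[[str, str], float], alpha: float = 0.5) -> list[str]:
    scored = [(alpha * keyword_score(query, doc) + (1 - alpha) * vector_score(query, doc), doc)
              for doc in docs]
    return [doc for _, doc in sorted(scored, reverse=True)]

# `vector_score` would come from embedding similarity; a constant stub is used here.
ranked = hybrid_search("llm hallucination", ["llm hallucination causes", "rag adds context"],
                       vector_score=lambda q, d: 0.0)
print(ranked)
```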

  • Sub Queries

Sub-querying involves breaking down a complex query into sub-questions, one for each relevant data source, then gathering all the intermediate responses and synthesizing a final response.
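
A minimal sketch, again assuming hypothetical `llm` and `retrieve` helpers (both stubbed here):

```python
from typing import Callable

def answer_with_subqueries(query: str, llm: Callable[[str], str],
                           retrieve: Callable[[str], str]) -> str:
    # Decompose, answer each sub-question from its source, then synthesize.
    subs = llm(f"Break this into simpler sub-questions, one per line:\n{query}").splitlines()
    partials = [f"{sub} -> {retrieve(sub)}" for sub in subs if sub.strip()]
    return llm("Combine these partial answers into one response:\n" + "\n".join(partials))

fake_llm = lambda p: ("What is RAG?\nWhat is fine-tuning?" if "sub-questions" in p
                      else "Synthesized answer.")
print(answer_with_subqueries("Compare RAG and fine-tuning", fake_llm, lambda q: "relevant passage"))
```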

  • Query Routing

A query router identifies a downstream task and decides the subsequent action that the RAG system should take. During retrieval, the query router also identifies the most appropriate data source for resolving the query.
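
A toy router, with made-up source names and a stand-in classifier call:

```python
from typing import Callable

SOURCES = {"finance": "financial_reports_db", "hr": "policy_wiki", "general": "web_search"}

def route(query: str, llm: Callable[[str], str]) -> str:
    # Ask the model to classify the query, then dispatch to a data source.
    label = llm(f"Classify this query as one of {sorted(SOURCES)}: {query}").strip().lower()
    return SOURCES.get(label, SOURCES["general"])  # fall back to a default source

fake_llm = lambda p: "finance"
print(route("What was Q3 revenue?", fake_llm))  # -> financial_reports_db
```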

  • Iterative Retrieval

Documents are collected repeatedly based on the query and the generated response to create a more comprehensive knowledge base.

  • Recursive Retrieval

Recursive retrieval also iteratively retrieves documents. However, it also refines the search queries depending on the results obtained from the previous retrieval. It is like a continuous learning process.
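
A sketch of that loop, with stand-in `llm` and `retrieve` helpers; the number of rounds is an illustrative cap:

```python
from typing import Callable

def recursive_retrieve(query: str, llm: Callable[[str], str],
                       retrieve: Callable[[str], list[str]], rounds: int = 3) -> list[str]:
    collected, current = [], query
    for _ in range(rounds):
        docs = retrieve(current)  # retrieve with the current query
        collected.extend(docs)
        # Refine the query from what was just found before the next round.
        current = llm(f"Given these findings {docs}, refine the query: {current}")
    return collected

fake_llm = lambda p: "refined " + p.rsplit(": ", 1)[-1]
print(recursive_retrieve("LLM memory", fake_llm, lambda q: [f"doc about {q}"]))
```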

  • Adaptive Retrieval

Adaptive retrieval enhances the RAG framework by empowering LLMs to proactively identify the most suitable moments and content for retrieval. The model dynamically chooses when and what to retrieve, improving the efficiency and relevance of the information obtained.
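
One simple (assumed) realization of this idea: ask the model whether it needs external context at all before retrieving.

```python
from typing import Callable

def adaptive_answer(query: str, llm: Callable[[str], str],
                    retrieve: Callable[[str], str]) -> str:
    # Let the model decide whether retrieval is needed for this query.
    verdict = llm(f"Answer YES or NO: do you need external documents to answer '{query}'?")
    if verdict.strip().upper().startswith("YES"):
        return llm(f"Context: {retrieve(query)}\nQuestion: {query}")
    return llm(query)  # answer from parametric memory alone

fake_llm = lambda p: "YES" if p.startswith("Answer YES") else f"(answer based on) {p[:40]}..."
print(adaptive_answer("What changed in our 2023 pricing?", fake_llm, lambda q: "pricing doc"))
```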

  • Hypothetical Document Embeddings (HyDE)

Using the LLM, HyDE forms a hypothetical document (an answer) in response to a query, embeds it, and then retrieves real documents similar to this hypothetical one. Instead of computing embedding similarity between the query and the documents, it emphasizes the similarity between embeddings of answers.
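
A sketch of the idea, assuming a stand-in `llm` plus `embed`/`cosine` helpers like the toy ones in the Naive RAG sketch above:

```python
from typing import Callable

def hyde_retrieve(query: str, llm: Callable[[str], str],
                  index: list[tuple[str, object]], embed, cosine) -> str:
    # Generate a hypothetical answer, embed *it*, and retrieve by
    # answer-to-answer similarity rather than query-to-document similarity.
    hypothetical = llm(f"Write a short passage answering: {query}")
    h_vec = embed(hypothetical)
    return max(index, key=lambda pair: cosine(h_vec, pair[1]))[0]
```

Here `index` is a list of `(document, vector)` pairs, as built in the Naive RAG sketch.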

  • Fine-tuned Embeddings

This process involves tailoring embedding models to improve retrieval accuracy, particularly in specialized domains dealing with uncommon or evolving terms. The fine-tuning data can itself be produced with language models, by generating questions grounded in the document chunks.
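
A minimal sketch of producing such training pairs, assuming a hypothetical `llm` helper:

```python
from typing import Callable

def make_training_pairs(chunks: list[str], llm: Callable[[str], str]) -> list[tuple[str, str]]:
    # Each (generated question, source chunk) pair is a positive example
    # for fine-tuning the embedding model on the target domain.
    return [(llm(f"Write one question answered by this text:\n{chunk}"), chunk) for chunk in chunks]

fake_llm = lambda p: "How does RAG affect hallucination?"
print(make_training_pairs(["RAG reduces hallucination by grounding answers."], fake_llm))
```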

– Post Retrieval Stage

  • Information Compression

While the retriever is proficient in extracting relevant information from extensive knowledge bases, managing the vast amount of information within retrieval documents poses a challenge. The retrieved information is compressed to extract the most relevant points before passing it to the LLM.
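
A crude sketch of extractive compression: keep only the sentences of a retrieved document that overlap most with the query. The `top_k` value is illustrative.

```python
def compress(query: str, document: str, top_k: int = 2) -> str:
    # Rank sentences by word overlap with the query and keep the best few.
    q_words = set(query.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    ranked = sorted(sentences, key=lambda s: len(q_words & set(s.lower().split())), reverse=True)
    return ". ".join(ranked[:top_k]) + "."

doc = "RAG retrieves documents. The weather was nice. Retrieved context reduces hallucination."
print(compress("does retrieved context reduce hallucination", doc))
```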

  • Reranking

The re-ranking model plays a crucial role in optimizing the document set retrieved by the retriever. The main idea is to rearrange document records to prioritize the most relevant ones at the top, effectively managing the total number of documents. This not only resolves challenges related to context window expansion during retrieval but also improves efficiency and responsiveness.
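
A sketch of the pattern: a cheap first-stage retriever produces candidates, then a stronger scorer (typically a cross-encoder model; a word-overlap stand-in is used here) reorders and truncates them.

```python
from typing import Callable

def rerank(query: str, candidates: list[str],
           scorer: Callable[[str, str], float], keep: int = 3) -> list[str]:
    reordered = sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)
    return reordered[:keep]  # truncating also keeps the context window in check

# Stand-in scorer; a real system would score (query, doc) pairs with a cross-encoder.
overlap = lambda q, d: len(set(q.lower().split()) & set(d.lower().split()))
print(rerank("LLM hallucination", ["about cats", "LLM hallucination causes", "LLM tools"], overlap))
```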

Modular RAG

The SOTA in Retrieval Augmented Generation is a modular approach that allows components like search, memory, and reranking modules to be configured.

Modular RAG — Source : Image by Author
  • Naive RAG is essentially a Retrieve -> Read approach, which focuses on retrieving information and comprehending it.
  • Advanced RAG extends the Retrieve -> Read approach by adding Rewrite and Rerank components to improve relevance and groundedness.
  • Modular RAG takes everything a notch further by providing flexibility and adding modules like Search, Routing, etc.

Naive, Advanced & Modular RAG are not exclusive approaches but a progression. Naive RAG is a special case of Advanced RAG which, in turn, is a special case of Modular RAG.

Some RAG Modules

  • Search

The search module performs search over different data sources. It is customised to each data source and aims at enriching the retrieved source data for better response generation.

  • Memory

This module leverages the parametric memory of the LLM to guide retrieval. It may use a retrieval-enhanced generator to iteratively create an unbounded memory pool, combining the “original question” and the “dual question.” By employing a retrieval-enhanced generative model that improves itself using its own outputs, the text becomes better aligned with the data distribution during the reasoning process.

  • Fusion

RAG-Fusion improves traditional search systems by overcoming their limitations through a multi-query approach. It expands user queries into multiple diverse perspectives using an LLM. The strategy goes beyond capturing explicit information and delves into uncovering deeper, transformative knowledge. The fusion process involves conducting parallel vector searches for both the original and expanded queries, intelligently re-ranking to optimize results, and pairing the best outcomes with new queries.
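
The re-ranking step is commonly done with Reciprocal Rank Fusion (RRF), which rewards documents that rank well across many result lists; a minimal sketch (k=60 is the conventional constant):

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1/(k + rank) from every list it appears in.
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Result lists from the original query and two LLM-generated variants.
lists = [["doc_a", "doc_b"], ["doc_b", "doc_c"], ["doc_b", "doc_a"]]
print(reciprocal_rank_fusion(lists))  # doc_b wins: it ranks well everywhere
```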

  • Extra Generation

Rather than directly fetching information from a data source, this module employs the LLM to generate the required context. The content produced by the LLM is more likely to contain pertinent information, addressing issues related to repetition and irrelevant details in retrieved content.
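
A minimal sketch of this generate-then-read pattern, with a stand-in `llm`:

```python
from typing import Callable

def generate_then_read(query: str, llm: Callable[[str], str]) -> str:
    # The model drafts its own background context, then answers against it.
    context = llm(f"Write a background paragraph relevant to: {query}")
    return llm(f"Context: {context}\nQuestion: {query}\nAnswer:")

fake_llm = lambda p: f"(LLM output for) {p[:30]}..."
print(generate_then_read("What is RAG?", fake_llm))
```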

  • Task Adaptable Module

This module makes RAG adaptable to various downstream tasks, allowing the development of task-specific end-to-end retrievers with only minimal examples.

In Conclusion

The evolution of Retrieval Augmented Generation (RAG) systems reflects a remarkable journey in addressing the challenges faced by Large Language Models (LLMs) in real-world applications. Beginning with Naive RAG, a simple Retrieve -> Read approach, the field has progressed into the more sophisticated Advanced RAG and ultimately the state-of-the-art Modular RAG.

Naive RAG highlighted issues such as low retrieval precision and poor generation quality, leading to the development of advanced strategies in the Pre-retrieval, Retrieval, and Post-retrieval stages. The incorporation of techniques like Chunk Optimization, Query Rewriting, and Adaptive Retrieval showcased the dedication to overcoming these challenges. Advanced RAG introduced iterative and recursive retrieval, Hypothetical Document Embeddings (HyDE), and fine-tuned embeddings to refine the process further.

The latest Modular RAG approach embraces a configurable architecture, allowing components like search, memory, and reranking modules to be tailored for specific needs. Modules such as Search, Memory, Fusion, Extra Generation, and the Task Adaptable Module contribute to the adaptability and efficiency of the framework. This progression from Naive to Advanced to Modular RAG demonstrates a continuous commitment to enhancing the relevance, groundedness, and precision of LLM responses, marking Retrieval Augmented Generation as an indispensable technique in the realm of Large Language Models.

If you’re interested in RAG and Generative AI in general, please download my notes on RAG and LLMs below.

  • Retrieval Augmented Generation – A Simple Introduction (abhinavkimothi.gumroad.com)
  • Generative AI with Large Language Models: Coursera Course Notes (abhinavkimothi.gumroad.com)

Let’s connect on LinkedIn — https://www.linkedin.com/in/abhinav-kimothi/


Please read my other blogs on AI

  • Getting the Most from LLMs: Building a Knowledge Brain for Retrieval Augmented Generation (medium.com)
  • RAG Value Chain: Retrieval Strategies in Information Augmentation for Large Language Models (medium.com)
  • Gradient Descent and the Melody of Optimization Algorithms (pub.towardsai.net)
  • Context is Key: The Significance of RAG in Language Models (medium.com)

