
Retrieval Interleaved Generation (RIG): When real-time data retrieval meets response generation

Last Updated on September 17, 2024 by Editorial Team

Author(s): Rupali Patil

Originally published on Towards AI.

Imagine you are a financial analyst trying to generate a detailed comparison between the current GDP of France and the GDP of Italy. You submit a query to a language model asking,

β€œWhat are the current GDP figures of France and Italy, and how have they changed over the last five years?”

Using Retrieval-Augmented Generation (RAG), the model performs an initial retrieval step, fetching relevant data from external databases or knowledge sources. After gathering this information, the model generates a response, such as:

β€œThe current GDP of France is approximately $2.9 trillion, while Italy’s GDP is around $2.1 trillion. Over the last five years, France’s GDP has grown by an average of 1.5%, and Italy’s GDP has remained relatively stagnant with a growth rate of 0.6%.”

In this scenario, RAG has effectively improved the model’s response by grounding it in real-world data through its one-time retrieval process before generating the final output.
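To make that contrast concrete, here is a minimal Python sketch of RAG's single-shot flow. The `search_index` and `llm_generate` functions are hypothetical stand-ins for a real retriever and a real language model, not any particular library's API:

```python
# Rough shape of a RAG pipeline (illustrative only): one up-front retrieval
# step, then a single generation pass over the retrieved context.
# `search_index` and `llm_generate` are hypothetical stand-ins.

def search_index(query: str) -> list[str]:
    """Stand-in retriever returning canned documents."""
    return ["France GDP: ~$2.9 trillion", "Italy GDP: ~$2.1 trillion"]

def llm_generate(prompt: str) -> str:
    """Stand-in LLM that would answer from the prompt's context."""
    return f"<answer generated from: {prompt[:60]}...>"

def rag_answer(user_query: str) -> str:
    documents = search_index(user_query)   # retrieve once, before generating
    prompt = f"Context: {documents}\nQuestion: {user_query}"
    return llm_generate(prompt)            # then generate in a single pass

print(rag_answer("Compare the GDPs of France and Italy."))
```

Notice that retrieval happens exactly once, before any text is generated. That single step is what RIG changes.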

However, this one-time retrieval can fall short for more complex, evolving queries, especially when multiple pieces of real-time information must be fetched dynamically as the answer unfolds.

Enter Retrieval Interleaved Generation (RIG)!

Now imagine you ask an even more complex query:

β€œWhat are the GDP growth rates of France and Italy in the past five years, and how do these compare to their employment rates over the same period?”

With RIG, the model begins by generating a partial response based on what it knows internally about GDP figures. However, instead of relying on a single retrieval step, it interleaves the retrieval of employment rate data while continuing to generate text. For instance, as the model provides the initial GDP data, it simultaneously fetches employment statistics and updates its response in real time:

β€œThe current GDP of France is $2.9 trillion, while Italy’s GDP is $2.1 trillion. Over the past five years, France’s GDP grew at an average rate of 1.5%, and Italy’s at 0.6%. During this period, France’s employment rate increased by 2%, while Italy saw only a slight improvement of 0.5%.”

So what just happened?

RIG enhanced the response by continuously fetching relevant data as it generated the text, offering a more comprehensive and accurate response for complex, multi-faceted queries. This dynamic interleaving ensures that every piece of information is up-to-date and factually correct, especially in real-time data scenarios, providing more precise insights for decision-making.

Let’s learn a bit more about RIG!

But First… What is Interleaving?

Before we get into RIG, it’s essential to understand the concept of interleaving.

Interleaving is a technique used in various fields like computing, scheduling, and data retrieval, where multiple tasks, processes, or data streams are alternated or combined in a way that allows them to progress simultaneously or in parallel without completing each one sequentially.

In simple words, it is about mixing different operations rather than completing one before starting another.

Interleaving practice (image source)

In the context of Retrieval Interleaved Generation (RIG), interleaving refers to alternating between generating a partial response and retrieving external data. The model doesn’t wait for all data to be retrieved before generating its response; instead, it interleaves the two tasks, so retrieval and response generation happen side by side.
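In code, interleaving looks like a loop that alternates between the two tasks. The sketch below is purely illustrative, with hypothetical `generate_segment` and `retrieve` stand-ins:

```python
# Minimal, purely illustrative sketch of interleaving: alternate between
# generating text and retrieving data instead of finishing one task first.
# `generate_segment` and `retrieve` are hypothetical stand-ins.

def generate_segment(so_far: str) -> str:
    """Stand-in for an LLM call that produces the next chunk of text."""
    return f"[segment #{so_far.count('[') + 1}]"

def retrieve(query: str) -> str:
    """Stand-in for an external lookup (database, API, knowledge graph)."""
    return f"<fact: {query}>"

def interleaved(queries: list[str]) -> str:
    answer = ""
    for query in queries:
        answer += generate_segment(answer) + " "  # generate a partial response...
        answer += retrieve(query) + " "           # ...then fetch the data it needs
    return answer + generate_segment(answer)      # finish once all facts are in

print(interleaved(["GDP of France", "GDP of Italy"]))
```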

What is Retrieval Interleaved Generation (RIG)?

Retrieval Interleaved Generation (RIG) is an advanced technique in natural language processing (NLP) where real-time data retrieval is dynamically combined with the generation of responses by large language models (LLMs). Instead of a linear process where external data is retrieved before generating a response (as in Retrieval-Augmented Generation, RAG), RIG integrates retrieval into the generation process itself. It allows the LLM to continuously query external data sources while it generates partial responses, iterating between retrieval and generation.

RAG vs. RIG in action

How does RIG work?

RIG’s process can be broken down into several key steps:

1️⃣ User Query Submission: The user submits a query or prompt to the LLM, just as they would in a traditional language model interaction.

2️⃣ Partial Response Generation: The LLM starts generating a response based on the internal knowledge it already has. This response, however, may include placeholders or speculative answers for parts that require external data.

3️⃣ Real-Time Data Retrieval: As the LLM identifies missing or incomplete information, it queries external sources in real time (e.g., databases, knowledge graphs, or web-based APIs). The model can make multiple retrieval calls, enriching the response with newly acquired data.

4️⃣ Interleaving of Retrieval and Generation: During the generation of the response, the model dynamically alternates between generating parts of the response and fetching data as needed. For instance, if the LLM begins by saying, β€œThe population of California is approximately…,” it pauses to retrieve the specific population figure before completing that segment of the response.

5️⃣ Final Response: Once all necessary data has been retrieved and incorporated, the LLM finalizes the response and returns it to the user.
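Put together, the five steps form a generate-retrieve loop. The sketch below is a simplified illustration, not DataGemma's actual implementation (discussed later): it assumes a hypothetical model that marks missing facts with [LOOKUP: …] placeholders, which the pipeline resolves against an external source before finalizing the response.

```python
import re

# Simplified RIG loop (illustrative only): the model emits placeholders
# such as "[LOOKUP: GDP of France]" for facts it does not know, and the
# pipeline resolves each one against an external data source before the
# response is finalized. `model_generate` and `query_data_source` are
# hypothetical stand-ins for a real LLM and a real retrieval backend.

PLACEHOLDER = re.compile(r"\[LOOKUP: (.+?)\]")

def model_generate(prompt: str) -> str:
    """Stand-in LLM call that may emit retrieval placeholders."""
    return ("The GDP of France is [LOOKUP: GDP of France], "
            "while Italy's is [LOOKUP: GDP of Italy].")

def query_data_source(query: str) -> str:
    """Stand-in for a real-time lookup (database, knowledge graph, API)."""
    facts = {"GDP of France": "$2.9 trillion", "GDP of Italy": "$2.1 trillion"}
    return facts.get(query, "unknown")

def rig_answer(user_query: str) -> str:
    draft = model_generate(user_query)           # steps 1-2: partial response
    while (match := PLACEHOLDER.search(draft)):  # steps 3-4: interleave lookups
        fact = query_data_source(match.group(1))
        draft = draft[:match.start()] + fact + draft[match.end():]
    return draft                                 # step 5: final response

print(rig_answer("Compare the GDPs of France and Italy."))
```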

Example

Imagine a query like: β€œWhat is the current GDP of the United States, and how does it compare to China’s?”

  • Initial Generation: The LLM may begin with, β€œThe GDP of the United States is around…”
  • Retrieval: The LLM then queries trusted data sources (like the World Bank or IMF databases) to retrieve the latest GDP figures.
  • Interleaving: As the LLM generates the comparison part of the query, it retrieves real-time data for China’s GDP to ensure an accurate comparison.
  • Final Response: After completing the retrieval process for both countries, the LLM finalizes the complete, data-grounded response.

Recent study on RIG: Google AI’s introduction of DataGemma

In September 2024, Google introduced DataGemma, a solution designed to address hallucinations in LLMs by anchoring their outputs in real-world statistical data from Google's Data Commons. By grounding responses in verified, trusted sources, DataGemma aims to improve the accuracy and dependability of AI-generated content, making it more suitable for high-stakes applications.

β€œData Commons is a publicly available knowledge graph containing over 240 billion rich data points across hundreds of thousands of statistical variables. It sources this public information from trusted organizations like the United Nations (UN), the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and the Census Bureau. Combining these datasets into one unified set of tools and AI models empowers policymakers, researchers, and organizations seeking accurate insights.” β€” Google, 2024

Google has introduced two advanced variants specifically designed to enhance the capabilities of large language models: DataGemma-RAG-27B-IT and DataGemma-RIG-27B-IT. These models represent the latest advancements in Retrieval-Augmented Generation (RAG) and Retrieval Interleaved Generation (RIG).

  • The RAG-27B-IT model taps into Google’s vast Data Commons, enabling it to incorporate rich, context-driven information into its outputs, making it ideal for tasks requiring deep comprehension and in-depth analysis of complex data sets.
  • Meanwhile, the RIG-27B-IT model focuses on real-time data retrieval from trusted sources to dynamically fact-check and validate statistical information during response generation, ensuring high accuracy.

Both models are tailored for tasks demanding precision and reasoning, making them particularly suited for research, policymaking, and business analytics applications.
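If you want to experiment with the RIG variant yourself, it loads through the standard Hugging Face transformers API. The snippet below is a minimal sketch; it assumes you have accepted the model license on the Hub and have hardware that can hold a 27B-parameter model:

```python
# Minimal sketch for loading the DataGemma RIG model from Hugging Face.
# Assumes the model license has been accepted on the Hub and that there
# is enough GPU memory for a 27B-parameter model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/datagemma-rig-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # spread layers across available GPUs
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
)

prompt = "What is the GDP of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the RIG model's raw output contains inline Data Commons queries that a separate pipeline resolves against the knowledge graph; the DataGemma paper linked below describes that resolution step.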


Although the RIG and RAG methodologies are still in their early stages, preliminary research suggests promising improvements in accuracy and a reduction in hallucinations when LLMs handle numerical facts.

Why use RIG?

1️⃣ Reducing Hallucination in LLMs (even more than RAG)

By interleaving real-time data retrieval with the generation process, RIG continuously queries trusted data sources as it forms a response. This helps the model ground its output in real-world, factual information, significantly reducing the risk of hallucination.

For example, when asked about specific statistics or real-time data (e.g., β€œWhat is the GDP of Brazil in 2023?”), RIG ensures the response is based on current data retrieved from reliable databases, reducing reliance on outdated or incomplete information stored internally.

2️⃣ Improved Accuracy

One of the significant benefits of RIG is its ability to provide more accurate responses for data-dependent queries. In traditional LLMs, the model can only generate answers based on what it has learned during its pre-training phase. If the internal knowledge is outdated, the model might generate incorrect answers. RIG solves this by fetching real-time data during the generation process, ensuring the accuracy of the information it provides.

For example, in a finance-related query like “What are the current interest rates for a 10-year bond?”, RIG would retrieve the latest data from financial databases in real time before finalizing the response, providing an accurate and up-to-date answer.

3️⃣ Real-Time Adaptation

A significant advantage of RIG is its ability to adapt in real time while generating responses. Unlike Retrieval-Augmented Generation (RAG), where data is retrieved only once before the response is generated, RIG dynamically interleaves the retrieval process during response generation. If the LLM encounters multiple pieces of missing or incomplete information, it can iteratively fetch the data while refining its response.

For instance, suppose a user asks a complex multi-part question like “What is the GDP of France, and how have recent economic policies impacted it?” RIG can first retrieve the GDP data, start generating the response, then dynamically retrieve information on the economic policies and integrate it into the final output.

Real-world applications of RIG

RIG is highly versatile: its ability to interleave retrieval and generation makes it ideal for complex, real-time, or evolving queries that draw on multiple data sources, and for sectors where information is constantly changing and precision is essential.

RIG is handy in domains like:

  • Healthcare: Fetching real-time patient data, clinical trial results, and the latest medical studies to provide accurate medical insights.
  • Finance: Retrieving real-time stock prices, interest rates, or economic indicators to provide timely financial insights.
  • Scientific Research: Providing the latest research findings and dynamically adjusting the response as more data is gathered.
  • Customer Support: Offering accurate responses based on real-time product or policy information.

Challenges and limitations of RIG

RIG holds a promising future. However, it is limited by a few challenges, including:

  • Latency: continuous retrieval during generation increases response time.
  • Resource Intensity: the interleaving process demands more computational resources than single-step retrieval.
  • Data Dependency: response quality suffers when external data sources are slow, unreliable, or incomplete.
  • Implementation Complexity: RIG requires a more sophisticated architecture and careful orchestration of retrieval and generation.

Future of RIG

The future of Retrieval Interleaved Generation (RIG) holds exciting potential for further AI research and development. As real-time data becomes increasingly vital for various industries, RIG is expected to evolve and integrate better with real-time databases.

As AI models increasingly interact with real-world data, we might see RIG integrated into autonomous agents, enabling these systems to respond adaptively in environments that require real-time decision-making and data-driven insights. Autonomous systems in logistics, robotics, or customer service could greatly benefit from this adaptive retrieval process, offering highly contextual and accurate responses on the fly.

Reference Links:

Google’s DataGemma announcement: https://blog.google/technology/ai/google-datagemma-ai-llm/

DataGemma and RIG paper: https://docs.datacommons.org/papers/DataGemma-FullPaper.pdf

DataGemma RAG model on Hugging Face: https://huggingface.co/google/datagemma-rag-27b-it

DataGemma RIG model on Hugging Face: https://huggingface.co/google/datagemma-rig-27b-it

💚 Thank you for taking the time to read this far!

⏳ I create curated content, combining extensive reading and personal experiences, to share insights that I hope you find valuable.

🎯 My mission: to impact, influence, and ignite ideas through AI, innovation, product, and strategy.

👏 If you found this helpful, please show some love by leaving claps!

💬 I’d love to hear your thoughts and learn from you, so feel free to share your comments below!

Published via Towards AI