Improve AI Response Rate with Intelligent Data Retrieval

Last Updated on September 18, 2024 by Editorial Team

Author(s): Muffaddal Qutbuddin

Originally published on Towards AI.

Suppose you are working on an app built on top of large language models (LLMs). The app works well and produces quality output, but you are not satisfied with how long it takes to respond.

In this article, I will discuss a technique that can significantly reduce your app's response time, resulting in better performance for the AI application.

What is RAG in AI?

Many AI applications rely on retrieval-augmented generation (RAG) to retrieve data and answer user queries. In a RAG system, the LLM answers the user’s query using the data our system provides to it.

So instead of answering the question from its own knowledge alone, the LLM uses the data it is given to answer the user’s query.

The success of a RAG application depends on two key factors: the data we store, and how we search that data to find what to pass to the LLM for answering user queries.

At a high level, you transform the data into embeddings and store them as vectors in a vector database such as Pinecone. The user query is also converted into an embedding and compared against the stored vectors to extract related information. This relevant information is then passed to the LLM with instructions to answer the user query only from this data.
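The retrieval step can be sketched in a few lines. This is a toy illustration, not a production setup: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical stand-in for a vector database such as Pinecone. The relevancy search itself is plain cosine similarity between the query vector and each stored vector.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Pinecone."""

    def __init__(self) -> None:
        self.items: list = []  # (document, vector) pairs

    def add(self, doc: str) -> None:
        self.items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 1) -> list:
        # Rank stored documents by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


store = InMemoryVectorStore()
store.add("Deadpool 3 plot summary (placeholder text)")
store.add("Inception plot summary (placeholder text)")
top = store.search("Why did Deadpool kill Chris Evans in the movie?", k=1)
print(top)  # the Deadpool document ranks first
```

A real embedding model captures meaning rather than shared words, but the search pattern (embed the query, score against stored vectors, keep the top k) stays the same.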

For example, let’s say you have several documents on movie plots. You convert the documents into embeddings and store them in the vector database. Now a user asks a question about Deadpool 3: “Why did Deadpool kill Chris Evans in the movie?” We run a relevancy search to retrieve the Deadpool plot and pass it to the LLM, which then answers the user’s question from that plot.
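Passing the retrieved plot to the LLM usually means wrapping it in a prompt that restricts the model to that context. The sketch below shows one way to assemble such a prompt; `build_grounded_prompt` and its wording are assumptions for illustration, not the author’s prompt, and the context document is a placeholder.

```python
def build_grounded_prompt(user_query: str, context_docs: list) -> str:
    """Hypothetical helper: ask the LLM to answer only from the retrieved context."""
    # Number each retrieved document so the model can refer to it.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer the user's question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "Why did Deadpool kill Chris Evans in the movie?",
    ["Deadpool 3 plot summary (placeholder text)"],
)
print(prompt)
```

The resulting string is what would be sent to the model; the explicit "say you don't know" instruction is a common way to discourage answers that go beyond the retrieved data.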

Simple, right?

What is the problem with RAG?

Continuing the scenario above, what would the user’s next question be? Most likely something about the Deadpool movie, perhaps about the Chris Evans death scene. Agree?

The problem with a typical RAG system is that it runs a relevancy search and fetches data for every question. In real-world scenarios, users tend to ask several questions on the same topic, so there is no point in fetching the same data again and again. Instead, cache the fetched data and answer follow-up queries from the cache. This reduces both cost and response time, resulting in a better user experience.

But how do we know whether the cached data is enough to answer the user’s query? The solution is simple: add an LLM layer that does just that. This AI model assesses whether new data is required or the current data will suffice, and even a basic model can handle it.
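A minimal sketch of the caching idea, assuming the retriever and the sufficiency check are passed in as plain functions. `CachedRAGSession`, `toy_retrieve`, and `toy_judge` are hypothetical names; the keyword-based judge only stands in for the LLM check described above, and the documents are placeholders.

```python
from typing import Callable, List

Retriever = Callable[[str], List[str]]
Judge = Callable[[str, List[str]], bool]  # returns True if new data is needed


class CachedRAGSession:
    """Keeps the last retrieved context and re-retrieves only when the judge asks."""

    def __init__(self, retrieve: Retriever, needs_new_data: Judge) -> None:
        self.retrieve = retrieve
        self.needs_new_data = needs_new_data
        self.cache: List[str] = []
        self.fetch_count = 0

    def context_for(self, query: str) -> List[str]:
        # Nothing cached yet (first question): always fetch.
        if not self.cache or self.needs_new_data(query, self.cache):
            self.cache = self.retrieve(query)
            self.fetch_count += 1
        return self.cache


DOCS = {
    "deadpool": "Deadpool 3 plot summary (placeholder text)",
    "inception": "Inception plot summary (placeholder text)",
}


def toy_retrieve(query: str) -> List[str]:
    # Hypothetical retriever: return documents whose topic is named in the query.
    return [doc for topic, doc in DOCS.items() if topic in query.lower()]


def toy_judge(query: str, cached: List[str]) -> bool:
    # Hypothetical stand-in for the LLM judge: fetch only when the query
    # names a topic whose document is not already cached.
    topics = [topic for topic in DOCS if topic in query.lower()]
    return any(DOCS[topic] not in cached for topic in topics)


session = CachedRAGSession(toy_retrieve, toy_judge)
for q in [
    "Why did Deadpool kill Chris Evans in the movie?",
    "What happened in the Chris Evans scene?",
    "Who directed Inception?",
]:
    print(q, "->", session.context_for(q))
print("fetches:", session.fetch_count)
```

Of the three questions, only the first and the Inception question trigger a retrieval; the Chris Evans follow-up is answered from the cached Deadpool context.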

Let’s implement the above using LangChain and Python.

Cache RAG Data to Improve Response Rate

We will build an AI app that generates insights from data in response to user queries.

Here is how the app functions and how it leverages AI to utilize data effectively to produce insights.

AI Application Architecture, by Muffaddal Qutbuddin

At a high level, the process starts when a user poses a question. The AI evaluates this question to determine the necessary data requirements. It then generates the appropriate SQL query, which is passed to an API that fetches the required data from the database. The AI agent then analyzes the fetched data to produce insights.
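To make the flow concrete, here is a rough sketch of the question → SQL → fetch → analyze pipeline, assuming an in-memory SQLite database in place of BigQuery and its API. `generate_sql` and `analyze` are hypothetical stubs for the two LLM steps (the SQL stub returns a fixed top-10 revenue query and omits any date filter), and the sales rows are made-up sample data.

```python
import sqlite3


def generate_sql(question: str) -> str:
    # Hypothetical stub for the LLM step that turns a question into SQL.
    # A real app would prompt the model with the table schema and the question.
    return (
        "SELECT product, SUM(revenue) AS total_revenue FROM sales "
        "GROUP BY product ORDER BY total_revenue DESC LIMIT 10"
    )


def fetch_data(conn: sqlite3.Connection, sql: str) -> list:
    # Stand-in for the API call that runs the query (BigQuery in the article).
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def analyze(question: str, rows: list) -> str:
    # Hypothetical stub for the LLM analysis step; here it only summarizes.
    top = rows[0]
    return f"{len(rows)} products returned; top seller is {top['product']} ({top['total_revenue']})"


def answer(conn: sqlite3.Connection, question: str):
    sql = generate_sql(question)
    rows = fetch_data(conn, sql)
    return rows, analyze(question, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 120.0), ("B", 300.0), ("A", 80.0), ("C", 50.0)],  # sample data
)
rows, summary = answer(conn, "What is the revenue of the top 10 selling products in the last 3 months?")
print(summary)
```

Keeping each stage as a separate function makes it easy to slot the new-data check in front of `fetch_data`, so the fetch only runs when the cached rows are not enough.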

The key layer is the new-data-requirement layer. It compares the user’s query with the data fetched in the previous iteration to decide whether new data is required.

On the user’s first interaction, this cache layer always signals a fetch, since there is no data to start with. For subsequent queries, it evaluates the cached data and skips the fetch when new data isn’t required. This drastically reduces the response time for the user.

Here is how to implement this check using LangChain and Python:

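import pandas as pd  # DataFrame.to_markdown() also requires the optional "tabulate" package


def is_new_data_required(user_query: str, data: pd.DataFrame, llm) -> bool:
    """Ask a LangChain chat model whether the cached DataFrame can answer user_query.
    Returns True if new data must be fetched from BigQuery."""
    print("Checking if new data is required to pull from BigQuery\n")
    df_info = data.to_markdown(index=False)
    prompt = f"""This is the user's query that I need to answer using the data in a dataframe:
    {user_query}

    Your task is to decide whether the data I have contains the required information or whether I need to fetch new data from BigQuery. Reply with exactly "fetch" if I need new data, or "no fetch" if the data I have can be used to answer the user query.
    The data I have is as follows:
    {df_info}
    """
    # Chat models return a message object; the generated text is in .content
    is_fetch = llm.invoke(prompt)
    verdict = is_fetch.content.strip().strip('".').lower()
    # A substring check for "fetch" would also match "no fetch"; anything else defaults to fetching
    return not verdict.startswith("no fetch")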

Simple, right? AI to the rescue.

Let’s see how the AI behaves with this new layer.

For the question “What is the revenue of the top 10 selling products in the last 3 months?”, the image below shows the end-to-end steps our custom AI app takes.

Analysis of data using AI, Muffaddal Qutbuddin

Let’s see whether it fetches new data or reuses the same dataset when I ask, “What’s the name of the top-selling product?”

It provided accurate results without the need to pull new data.

Final Thoughts

Advances in AI have enabled us to build many new applications that weren’t possible in the past. What’s important is to build AI systems that are intelligent and yield output efficiently.

Adding a small data evaluation layer drastically improves the response time of LLM applications and reduces the cost of fetching data every time the user asks a question.

Of course, the data check layer has its own cost. However, that cost is far smaller than the impact a high response time has on the user.

Whether to add this layer depends on the AI application’s use case, so evaluate the relevant metrics, such as latency and cost, before designing the system for your new AI app.


