Improve AI Response Rate with Intelligent Data Retrieval
Last Updated on September 18, 2024 by Editorial Team
Author(s): Muffaddal Qutbuddin
Originally published on Towards AI.
You are working on an app built on top of LLMs. The app works great and produces quality output. However, you are not satisfied with the time it takes to respond.
In this article, I will discuss a technique to greatly improve your time to respond, resulting in better performance for the AI application.
What is RAG in AI?
AI applications depend heavily on retrieval-augmented generation (RAG) to retrieve data and answer the user query accordingly. In a RAG system, users' queries are answered by LLMs using the data our system provides to them.
So, instead of answering the question on its own, the LLM uses the data it is given when it answers the user's query.
The success of a RAG application depends on two key factors: the data we store, and how we search that data to pass to the LLM to answer user queries.
At a high level, you transform the data into embeddings and store them as vectors in a vector db such as Pinecone. The user query is also converted into an embedding and compared to the stored vectors to extract the related information. This relevant information is passed to the LLM with instructions to answer the user query only from this data.
For example, let's say you have several documents on movie plots. You convert the documents into embeddings and store them in a vector db. A user then asks a question about Deadpool 3: "Why did Deadpool kill Chris Evans in the movie?" We would run a relevancy search to get the Deadpool plot and pass that to the LLM, which then answers the user's question from it.
Simple, right?
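The retrieval step described above can be sketched in a few lines. This is a toy illustration, not a production setup: a bag-of-words counter stands in for a real embedding model, a Python list stands in for the vector db, and the document texts are placeholders.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Placeholder documents; real ones would be full plot summaries.
docs = [
    "Deadpool 3 plot: Deadpool teams up with Wolverine to save his timeline.",
    "Inception plot: a thief enters dreams to plant an idea.",
]
store = [(doc, embed(doc)) for doc in docs]  # our stand-in "vector db"

question = "Why did Deadpool kill Chris Evans in the movie?"
context = retrieve(question, store)[0]

# The retrieved text is passed to the LLM with instructions to stay within it.
prompt = f"Answer only from this context:\n{context}\n\nQuestion: {question}"
```

In a real system, `embed` would call an embedding model and `retrieve` would be a query against the vector db, but the shape of the flow stays the same.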
What is the problem with RAG?
Continuing on the above scenario, what would the next question of the user be? It would be most likely around the Deadpool movie and maybe around the Chris Evans death scene. Agree?
The problem with a typical RAG system is that it runs the relevancy search and fetches the data for every single question. In real-world scenarios, users tend to ask several questions on the same topic, so there is no point in fetching the same data again and again. Instead, cache the fetched data and answer the user query from that cached data. This both reduces cost and shortens the time to answer, which makes for a better user experience.
But how do we know if the cached data is enough to answer the user's query? The solution is simple: add an LLM layer that does just that, an AI model that assesses whether new data is required or the current data will suffice. Any basic AI model can do that.
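As a rough sketch of that idea, the cache and the check can be wrapped around any retrieval function. The names below (`CachedContext`, `fake_fetch`, `fake_needs_fetch`) are hypothetical, and the check is a simple keyword stub standing in for the LLM call so the example runs on its own.

```python
class CachedContext:
    """Keep the last fetched data and refetch only when the check asks for it."""

    def __init__(self, fetch, needs_fetch):
        self.fetch = fetch              # query -> data (the expensive relevancy search)
        self.needs_fetch = needs_fetch  # (query, cached_data) -> bool (the LLM check)
        self.data = None
        self.fetch_count = 0

    def get(self, query):
        # Always fetch when the cache is empty; otherwise ask the check layer.
        if self.data is None or self.needs_fetch(query, self.data):
            self.data = self.fetch(query)
            self.fetch_count += 1
        return self.data

# Stubs for illustration only: a real app would call the vector db and an LLM.
def fake_fetch(query):
    return "Deadpool plot summary" if "deadpool" in query.lower() else "Inception plot summary"

def fake_needs_fetch(query, data):
    cached_topic = data.split()[0].lower()
    return cached_topic not in query.lower()

cache = CachedContext(fake_fetch, fake_needs_fetch)
cache.get("Why did Deadpool kill Chris Evans?")  # empty cache: fetches
cache.get("Who directed Deadpool?")              # same topic: served from cache
cache.get("What is Inception about?")            # new topic: fetches again
```

Only two of the three questions trigger a fetch here; the second one is answered from the cached data.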
Let's implement the above using LangChain and Python.
Cache RAG Data to Improve Response Rate
We will build an AI app that generates insights from data in response to user queries.
Here is how the app functions and how it leverages AI to utilize data effectively to produce insights.
At a high level, the process starts when a user poses a question. The AI evaluates this question to determine the necessary data requirements. It then generates the appropriate SQL query which is passed to an API, fetching the required data from the database. The fetched data is subsequently analyzed by the AI agent to produce insights.
The key layer is the new-data-requirement layer. It compares the user's query with the data fetched in the previous iteration to decide whether new data is required.
On the user's first interaction, this cache layer signals a fetch because there is no data to start with. For subsequent queries, it evaluates the cached data and skips the fetch when new data is not required. This drastically reduces the response time for the user.
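The flow above can be sketched end to end under some loud assumptions: an in-memory SQLite table stands in for the database behind the API, and `generate_sql`, `needs_new_data`, and `analyze` are hypothetical stubs standing in for the three LLM steps. The sample rows are made up for illustration.

```python
import sqlite3

def generate_sql(question):
    # Stand-in for the LLM step that writes SQL for the question (hypothetical).
    return ("SELECT product, SUM(revenue) AS revenue FROM sales "
            "GROUP BY product ORDER BY revenue DESC LIMIT 10")

def needs_new_data(question, rows):
    # Stand-in for the LLM check; this stub only fetches when nothing is cached.
    return rows is None

def analyze(question, rows):
    # Stand-in for the LLM analysis step.
    product, revenue = rows[0]
    return f"Top product: {product} ({revenue})"

def answer(question, session, conn):
    # Fetch only when the check layer says the cached rows are not enough.
    if needs_new_data(question, session.get("rows")):
        session["rows"] = conn.execute(generate_sql(question)).fetchall()
        session["fetches"] = session.get("fetches", 0) + 1
    return analyze(question, session["rows"])

# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A", 10.0), ("B", 25.0), ("A", 5.0)])

session = {}  # per-user state holding the cached rows
first = answer("What is the revenue of the top 10 selling products?", session, conn)
second = answer("What's the name of the top-selling product?", session, conn)
```

The first call fetches because the session is empty; the second reuses the cached rows, so `session["fetches"]` stays at 1.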
Here is how to implement it using LangChain and Python:
def is_new_data_required(user_query, data, llm):
    # data: pandas DataFrame fetched in the previous iteration
    # llm: assumed to be a LangChain chat model whose invoke() returns a message with .content
    print("Checking if new data is required to pull from bigquery \n")
    # Render the cached data as a markdown table (to_markdown needs the tabulate package)
    df_info = data.to_markdown(index=False)
    prompt = f"""This is the user's query that I need to answer using the data in the dataframe
{user_query}
Your task is to decide if the data I have contains the required information or if I need to fetch from bigquery. Say "fetch" if I need new data and say "no fetch" if the data I have can be used to answer the user query
data I have is as follows
{df_info}
"""
    is_fetch = llm.invoke(prompt)
    # The reply text is expected to be "fetch" or "no fetch"
    return is_fetch.content
Simple, right? AI to the rescue.
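One practical detail when acting on the reply: the string "no fetch" contains "fetch", so a plain substring test would treat both answers as a fetch. A small helper, sketched here under the assumption that the model replies roughly as instructed, can normalize the text first. The name `parse_fetch_decision` is hypothetical.

```python
def parse_fetch_decision(reply):
    """Map the model's free-text reply to True (fetch) or False (no fetch)."""
    # Drop surrounding whitespace, quotes, and trailing periods, then lowercase.
    text = reply.strip().strip('."\'').lower()
    # Check "no fetch" first, because "fetch" is a substring of it.
    if text.startswith("no fetch"):
        return False
    if text.startswith("fetch"):
        return True
    # Unrecognized reply: fetch to be safe rather than answer from stale data.
    return True
```

Defaulting to a fetch on an unexpected reply costs some latency, but it avoids answering from data that may not cover the question.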
Let's see how the AI behaves with this new layer.
For the question "What is the revenue of the top 10 selling products in the last 3 months?", the image below shows the end-to-end steps of our custom AI app.
Let's see if it fetches new data or uses the same dataset when I ask "What's the name of the top-selling product?"
It provided accurate results without the need to pull new data.
Final Thoughts
Advances in AI have enabled us to build many new applications that weren't possible in the past. What's important is to build AI systems that are intelligent and produce output efficiently.
Adding a small data evaluation layer can drastically improve the response time of LLM applications and reduce the cost of fetching the data every time the user asks a question.
Of course, the data check layer has its own cost. However, that cost is usually far smaller than the impact a slow response has on the user.
Whether to add this layer depends on the AI application's use case, so evaluate all the metrics before designing the system for your new AI app.
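One way to evaluate the trade-off is a back-of-the-envelope latency estimate. The sketch below assumes the check runs on every question and that a fraction `hit_rate` of questions can be answered from cached data. All timings are hypothetical placeholders, not measurements.

```python
def baseline_latency(t_fetch, t_answer):
    """Every question fetches data, then answers."""
    return t_fetch + t_answer

def cached_latency(t_fetch, t_answer, t_check, hit_rate):
    """Every question pays for the check; only cache misses pay for the fetch."""
    return t_check + (1 - hit_rate) * t_fetch + t_answer

def break_even_hit_rate(t_fetch, t_check):
    """The check layer pays off once hit_rate exceeds t_check / t_fetch."""
    return t_check / t_fetch

# Hypothetical timings in seconds, for illustration only.
t_fetch, t_answer, t_check = 4.0, 2.0, 1.0

without_layer = baseline_latency(t_fetch, t_answer)
with_layer = cached_latency(t_fetch, t_answer, t_check, hit_rate=0.5)
threshold = break_even_hit_rate(t_fetch, t_check)
```

With these placeholder numbers, the layer helps whenever more than a quarter of the questions can reuse cached data. If the check is nearly as slow as the fetch, or users rarely stay on one topic, the layer may not be worth adding.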