Improve AI Response Rate with Intelligent Data Retrieval

Last Updated on September 18, 2024 by Editorial Team

Author(s): Muffaddal Qutbuddin

Originally published on Towards AI.

Image by Angela from Pixabay

You are working on an app built on top of LLMs. The app works well and produces quality output, but you are not satisfied with the time it takes to respond.

In this article, I will discuss a technique that can greatly reduce response time, resulting in better performance for your AI application.

What is RAG in AI?

Many AI applications depend on retrieval-augmented generation (RAG) to retrieve data and answer user queries accordingly. In a RAG system, users’ queries are answered by an LLM using the data our system provides to it.

So, instead of answering the question on its own, the LLM uses the data it is given when answering the user’s query.

The success of a RAG application depends on two key factors: the data we store, and how we search that data to pass it to the LLM to answer user queries.

At a high level, you transform the data into embeddings and store them as vectors in a vector database such as Pinecone. The user query is also converted into an embedding and compared against the stored vectors to extract the related information. This relevant information is passed to the LLM with instructions to answer the user query using only this data.

For example, let’s say you have several documents on movie plots. You convert the documents into embeddings and store them in a vector database. Now a user asks a question about Deadpool 3: “Why did Deadpool kill Chris Evans in the movie?” We would do a relevancy search to get the Deadpool plot and pass it to the LLM. As you can guess, the AI model would then answer the user’s question from that plot.
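To make the retrieval step concrete, here is a minimal sketch in plain Python. It is not a production setup: the bag-of-words `embed` function stands in for a real embedding model, the `index` dictionary stands in for a vector database such as Pinecone, and the document texts are hypothetical placeholders rather than real plot summaries.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical placeholder documents; a real app would store actual plot text.
documents = {
    "deadpool": "deadpool movie plot summary placeholder",
    "inception": "inception movie plot summary placeholder",
}

# Stand-in for the vector database: document id -> embedding.
index = {doc_id: embed(text) for doc_id, text in documents.items()}


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the ids of the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query_vec, index[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, doc_ids: list[str]) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(documents[doc_id] for doc_id in doc_ids)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


query = "Why did Deadpool kill Chris Evans in the movie?"
top_ids = retrieve(query)
prompt = build_prompt(query, top_ids)
print(top_ids)
```

In a real system, only `embed` and `index` would change; the flow of embedding the query, ranking by similarity, and building a context-restricted prompt stays the same.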

Simple, right?

What is the problem with RAG?

Continuing with the above scenario, what would the user’s next question be? Most likely it would be about the Deadpool movie, and maybe about the Chris Evans death scene. Agree?

The problem with a typical RAG system is that it runs the relevancy search and fetches data for every question. In real-world scenarios, users tend to ask several questions on the same topic, so there is no point in fetching the same data again and again. Instead, cache the fetched data and answer the user query from that cached data. This reduces cost and decreases the time to answer, which makes for a better user experience.

But how do we know if the cached data is enough to answer the user query? The solution is simple: add an LLM layer that does just that, an AI model that assesses whether new data is required or the current data will suffice. Even a basic AI model can handle this check.
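The caching idea can be sketched independently of any particular LLM. In the example below, `CachedContext`, `fake_fetch`, and `fake_judge` are hypothetical names introduced for illustration. The judge is a simple keyword check standing in for the LLM layer, so treat this as a sketch of the control flow only.

```python
from typing import Callable, Optional


class CachedContext:
    """Keeps the last fetched context and refetches only when the judge asks."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        needs_new_data: Callable[[str, str], bool],
    ) -> None:
        self._fetch = fetch
        self._needs_new_data = needs_new_data
        self._cached: Optional[str] = None
        self.fetch_count = 0  # how many times the expensive fetch actually ran

    def get(self, query: str) -> str:
        # Fetch when the cache is empty or the judge says the cache is not enough.
        if self._cached is None or self._needs_new_data(query, self._cached):
            self._cached = self._fetch(query)
            self.fetch_count += 1
        return self._cached


def topic_of(query: str) -> str:
    """Hypothetical topic detector used by both stubs below."""
    return "deadpool" if "deadpool" in query.lower() else "other"


def fake_fetch(query: str) -> str:
    """Stand-in for the relevancy search against the vector database."""
    return f"[context about {topic_of(query)}]"


def fake_judge(query: str, cached: str) -> bool:
    """Stand-in for the LLM check: refetch when the topic is not in the cache."""
    return topic_of(query) not in cached


ctx = CachedContext(fake_fetch, fake_judge)
ctx.get("Why did Deadpool kill Chris Evans?")  # empty cache: fetch
ctx.get("Who else appears in Deadpool?")       # same topic: reuse cache
ctx.get("What is the plot of Inception?")      # new topic: fetch again
print(ctx.fetch_count)  # prints 2
```

Three questions trigger only two fetches here, because the second question is answered from the cached context.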

Let’s implement the above using LangChain and Python.

Cache RAG Data to Improve Response Rate

We will build an AI app that generates insights from data in response to user queries.

Here is how the app functions and how it leverages AI to utilize data effectively to produce insights.

AI Application Architecture, by Muffaddal Qutbuddin

At a high level, the process starts when a user poses a question. The AI evaluates this question to determine the necessary data requirements. It then generates the appropriate SQL query which is passed to an API, fetching the required data from the database. The fetched data is subsequently analyzed by the AI agent to produce insights.
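The flow above can be sketched end to end with stubs. This is an illustration under stated assumptions, not the app’s actual code: an in-memory SQLite table replaces the database behind the API, `generate_sql` and `produce_insight` are hypothetical stand-ins for the two LLM steps, and the `sales` rows are made-up sample values.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Stand-in for the LLM step that turns a question into SQL (hypothetical)."""
    return (
        "SELECT product, SUM(revenue) AS total_revenue "
        "FROM sales GROUP BY product "
        "ORDER BY total_revenue DESC LIMIT 10"
    )


def fetch_data(conn: sqlite3.Connection, sql: str) -> list:
    """Stand-in for the API call that runs the query against the database."""
    return conn.execute(sql).fetchall()


def produce_insight(question: str, rows: list) -> str:
    """Stand-in for the LLM step that analyzes the fetched data (hypothetical)."""
    top_product, top_revenue = rows[0]
    return f"Top product by revenue: {top_product} ({top_revenue})"


# Made-up sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10.0), ("B", 25.0), ("A", 5.0)],
)

question = "What is the revenue of the top 10 selling products?"
sql = generate_sql(question)               # 1. question -> SQL
rows = fetch_data(conn, sql)               # 2. SQL -> data
insight = produce_insight(question, rows)  # 3. data -> insight
print(insight)
```

The data fetch in step 2 is the part that a cache can skip on follow-up questions.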

The key layer is the new data requirement layer. It compares the user query with the data fetched in the previous iteration to decide whether new data is required.

Cache data to improve AI response

On the user’s first interaction, this cache layer signals that data must be fetched, since there is no data to start with. For subsequent queries, it evaluates the cached data and does not fetch new data unless it is required. This can drastically reduce the response time for the user.

Here is how to implement it using LangChain and Python:

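def is_new_data_required(user_query, data, llm):
    """Ask the LLM whether the cached DataFrame can answer the user query.

    Returns the model's raw reply, expected to be "fetch" or "no fetch".
    """
    print("Checking if new data is required to pull from BigQuery\n")
    # Render the cached pandas DataFrame as a Markdown table for the prompt
    # (pandas needs the `tabulate` package for `to_markdown`).
    df_info = data.to_markdown(index=False)
    prompt = f"""This is the user's query that I need to answer using the data in the dataframe:
{user_query}

Your task is to decide whether the data I have contains the required information or whether I need to fetch new data from BigQuery. Say "fetch" if I need new data and say "no fetch" if the data I have can be used to answer the user query.
The data I have is as follows:
{df_info}
"""

    # `llm` is assumed to be a LangChain chat model, whose reply text is in `.content`.
    is_fetch = llm.invoke(prompt)
    return is_fetch.content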
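The function returns the model’s reply as free text, so the caller still has to turn it into a decision. One pitfall is that the string "no fetch" contains "fetch", so a plain substring check would always choose to fetch. The helper below is a hypothetical addition, not part of the original app, showing one way to parse the reply defensively.

```python
def parse_fetch_decision(reply: str) -> bool:
    """Return True when the model's reply means new data should be fetched."""
    # Normalize case, drop quotes and periods, and collapse whitespace.
    cleaned = reply.lower().replace('"', " ").replace(".", " ")
    normalized = " ".join(cleaned.split())
    # Check "no fetch" first, because it also contains the word "fetch".
    if normalized.startswith("no fetch"):
        return False
    if normalized.startswith("fetch"):
        return True
    # Unrecognized reply: fetching costs time but avoids answering from stale data.
    return True


print(parse_fetch_decision("fetch"))      # True
print(parse_fetch_decision("No fetch."))  # False
```

Defaulting to a fetch on an unexpected reply is a design choice: it trades some latency for not answering from data that may be insufficient.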

Simple, right? AI to the rescue.

Let’s see how the AI behaves with this new layer.

For the question “What is the revenue of the top 10 selling products in the last 3 months?”, the image below shows the end-to-end steps of our custom AI app.

Analysis of data using AI, Muffaddal Qutbuddin

Let’s see if it fetches new data or uses the same dataset when I ask, “What’s the name of the top-selling product?”

Analysis of data using AI, Muffaddal Qutbuddin

It provided accurate results without the need to pull new data.

Final Thoughts

Advances in AI have enabled us to build many new applications that weren’t possible in the past. What’s important is to build AI systems that are intelligent and yield output efficiently.

Adding a small data evaluation layer can drastically improve the response time of LLM-based applications and reduce the cost of fetching data every time the user asks a question.

Of course, the data check layer has its own cost. However, that cost is far smaller than the impact a high response time would have on the user.

Whether to add this layer depends on the AI application’s use case, so evaluate all the metrics before designing the system for your new AI app.
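One rough way to reason about this trade-off is a simple latency model. It assumes every query pays for the check and only cache misses pay for the fetch. The function names and all numbers below are hypothetical and are not measurements from the app.

```python
def expected_latency(check_s: float, fetch_s: float, hit_rate: float) -> float:
    """Expected data-retrieval latency per query when a check precedes each fetch."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Every query pays for the check; only misses pay for the fetch.
    return check_s + (1.0 - hit_rate) * fetch_s


def break_even_hit_rate(check_s: float, fetch_s: float) -> float:
    """Hit rate above which the check layer beats fetching on every query."""
    return check_s / fetch_s


# Hypothetical numbers: a 0.5 s check, a 4 s fetch, 75% of questions served from cache.
print(expected_latency(0.5, 4.0, 0.75))  # 1.5, versus 4.0 without the check layer
print(break_even_hit_rate(0.5, 4.0))     # 0.125
```

Under this model, the layer pays off whenever the share of questions answerable from cached data exceeds the ratio of check time to fetch time.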
