Beyond Training Data: How RAG Lets LLMs Retrieve, Not Guess

Author(s): DarkBones

Originally published on Towards AI.

Source: Image by the author generated with Flux.

Large Language Models (LLMs) like GPT-4 don’t actually “know” anything; they predict words based on old training data. Retrieval-Augmented Generation (RAG) changes that by letting AI pull in fresh, real-world knowledge before answering.

RAG enhances LLMs by enabling them to retrieve relevant information from external sources before generating a response. Because LLMs rely on static training data and don’t update automatically, RAG gives them access to fresh, domain-specific, or private knowledge, without the need for costly retraining.

Let’s explore how RAG works, why it is useful, and how it differs from traditional LLM prompting.

What is Retrieval-Augmented Generation (RAG) in AI?

Retrieval-Augmented Generation (RAG) helps AI models retrieve external information before generating a response. But how exactly does this process work, and why is it important?

Large Language Models excel at many tasks. They can code, draft emails, hallucinate ingredients for the perfect sandwich, and even write articles, although I still prefer doing that myself. However, they have a major limitation: they lack real-time knowledge. Because training LLMs is a time-consuming process, they do not “know” about recent events. If you ask one about last week’s news, it will either display a disclaimer, provide an outdated answer, or generate something completely inaccurate.

“Some LLMs overcome their biggest limitation of stale training data by retrieving up-to-date information before responding.”

RAG fetches relevant information before generating an answer, making AI responses more accurate and reducing hallucinations.

RAG Explained in Simple Terms

But how does RAG actually work? Instead of looking it up ourselves, let’s ask our favorite LLM:

Source: Image by the author.

This is not quite what we were hoping for. No problem, we can ask Bob instead.

Source: Image by the author.

Surprisingly, Bob did not know the answer either, but he was able to retrieve it. Here is what happened:

  1. We asked Bob about RAG.
  2. Bob went to the library and asked the librarian for information.
  3. The librarian pointed him to the right aisle.
  4. Bob retrieved the information.
  5. Bob augmented his understanding by consuming the information before generating an answer.
  6. Now Bob sounds like an expert. Thanks, Bob.

This breakdown reveals that Bob is effectively functioning as a RAG agent.

With that insight, let’s explore exactly how a RAG agent operates.

RAG, Simplified

Let’s transform our interaction with Bob into an actual RAG system:

  • Bob represents the RAG system.
  • The librarian acts as an embedder.
  • The library functions as a vector database.
Source: Image by the author.

“Rather than prompting an LLM directly, a RAG system acts as a knowledge bridge: retrieving, augmenting, and then generating responses.”

Vectorizing the Input

When a user submits a prompt, the RAG system forwards it to the embedder, which converts it into a vector: a numeric representation of the prompt. The idea is that pieces of information with similar meanings will have similar vector representations.

“Vectors unlock relevance. This vector allows the system to retrieve the most meaningful information from the vector database.”

When the vector representation of the user’s prompt is sent to the database, it retrieves the most relevant matches.
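To make this concrete, here is a minimal sketch of the read path using the sentence-transformers library and plain numpy. The model name and the tiny in-memory list standing in for a vector database are illustrative choices, not what any particular production system uses.

```python
# Minimal sketch: embed a prompt, then retrieve the closest stored entries.
# Assumes `pip install sentence-transformers numpy`; the model is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to a 384-dim vector

# A toy in-memory "vector database": parallel lists of texts and vectors.
documents = [
    "RAG stands for Retrieval Augmented Generation.",
    "The capital of France is Paris.",
]
doc_vectors = embedder.encode(documents)

def retrieve(prompt: str, top_k: int = 1) -> list[tuple[str, float]]:
    """Return the top_k stored texts most similar to the prompt."""
    query_vec = embedder.encode(prompt)
    # Cosine similarity between the query and every stored vector.
    sims = doc_vectors @ query_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:top_k]
    return [(documents[i], float(sims[i])) for i in best]

print(retrieve("Hey LLM, tell me about RAG"))
```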

The RAG system then enhances the user’s prompt by including the retrieved information:

<context>
the information returned from the database
</context>
<user-prompt>
the user's original prompt
</user-prompt>
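In code, the augmentation step is little more than string formatting. A sketch that mirrors the template above, reusing the hypothetical retrieve helper from the earlier snippet:

```python
def build_augmented_prompt(user_prompt: str) -> str:
    """Wrap retrieved context and the original prompt, mirroring the template above."""
    context = "\n".join(text for text, _score in retrieve(user_prompt, top_k=3))
    return (
        f"<context>\n{context}\n</context>\n"
        f"<user-prompt>\n{user_prompt}\n</user-prompt>"
    )
```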

That is the entire process. Retrieve, Augment, and Generate. RAG.

Adding to the Knowledge Base

However, the system cannot retrieve information that has not been added to the database. How do we store new data? The process is straightforward. Instead of using the vector to find relevant information, the system stores the data along with its vector representation.

Source: Image by the author.
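The write path is just as short. A sketch that continues the toy store from the read-path snippet, embedding the new text and saving it next to its vector:

```python
def add_document(text: str) -> None:
    """Embed new text and store it with its vector so it can be retrieved later."""
    global doc_vectors
    documents.append(text)
    doc_vectors = np.vstack([doc_vectors, embedder.encode([text])])

add_document("Bob is a librarian-assisted expert on RAG.")
```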

If you were only interested in the big picture, congratulations. You now understand the core concept. However, if you’re a fellow neckbeard, let’s talk a bit more about vectors and embedders.

What is a Vector?

In simple terms, a vector is a set of coordinates that describe how to move from A to B. Look at this graph:

Source: Still frame from “Photograph” music video by Nickelback, edited with a custom graph overlay by the author.

This graph has two dimensions. Each point, A, B, C, and D, can be described using a two-number coordinate system. The first number tells us how far to move to the right from the origin (0), while the second number tells us how far to move up. To reach A, the vector is [3, 7]. To reach D, the vector is [3, 0].

Dimensionality of Vectors

The same principle applies in three dimensions. To move from your desk to the coffee machine, you must travel a certain distance along the x, y, and z axes, forming a three-number coordinate system.

Source: Image by the author.

“Humans struggle to visualize beyond three dimensions. Computers thrive in multi-dimensional spaces.”

The math remains the same. Four dimensions? That requires a four-number coordinate system. One hundred dimensions? That requires a 100-number coordinate system.

Source: Meme remix combining “This is Fine” by KC Green (original) with custom artwork by the author.

“The embedder I use operates in a mind-bending, 768-dimensional coordinate system, far beyond human perception.”
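You can inspect an embedder’s dimensionality directly. A sketch with sentence-transformers, where "all-mpnet-base-v2" is one well-known 768-dimensional model; the article does not name the embedder the author actually uses.

```python
from sentence_transformers import SentenceTransformer

# One example of a 768-dimensional embedder; the author's model is not named.
model = SentenceTransformer("all-mpnet-base-v2")
vector = model.encode("hello, n-dimensional world")
print(vector.shape)  # (768,)
```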

When you have finished trying to visualize that, we can return to simpler, easy-to-draw, two-dimensional graphs.

How Vector Embeddings Help LLMs Retrieve Data

Vectors by themselves are simply n-dimensional coordinates that represent points in n-dimensional space.

“Vectors aren’t just numbers; they encode meaning. Their true power lies in the information they represent.”

In the same way, vectors can serve as coordinates not to places, but to information. A specialized language model, the embedder, is trained on a large corpus of text to learn similarities and to place pieces of information in n-dimensional space such that similar topics tend to be grouped together.

It is like going to a social event: you are likely to stick with your friends, your colleagues, or at least a group of like-minded people.

Grouping Similar Concepts Together

Source: Image by the author.

This graph shows how words that are similar in meaning tend to get grouped together in this n-dimensional space. Modern embedders (like BERT) don’t use single-word embeddings anymore, but generate contextual embeddings.
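You can observe this grouping numerically. A sketch reusing the illustrative embedder from earlier; the word choices are arbitrary, and the exact scores depend on the model:

```python
# Similar concepts should land closer together in vector space.
words = ["cat", "kitten", "truck"]
vecs = embedder.encode(words)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vecs[0], vecs[1]))  # "cat" vs "kitten": relatively high
print(cosine(vecs[0], vecs[2]))  # "cat" vs "truck": noticeably lower
```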

The ability to group similar concepts in vector space makes embeddings powerful. However, early embedding models like Word2Vec had a significant limitation that modern models have addressed.

Quick Tech Tangent

If you’ve been working on AI systems for as long as I have, you might be familiar with Word2Vec. While groundbreaking when it came out in 2013, it has a major flaw: it assigns a single vector to each word, no matter the context.

Take the word “bat”.

  • Are we talking about the flying mammal? Then it should be near “mammal”, “cave”, and “nocturnal”.
  • Or do we mean a baseball bat? Then it belongs near “ball”, “pitch”, and “base” (but what base? Military?)
  • And what if we’re in the world of fiction? Then “bat” relates to “vampire” and “transformation”.

Word2Vec can’t tell the difference. It picks one and sticks with it.

One thing I find particularly fascinating with Word2Vec is that, since words are now represented by numbers, you can actually do arithmetic on them.

You can make equations like

“king - man + woman = queen”: a legendary example of how AI models map relationships in vector space.

It’s wild, but it works (most of the time).
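If you want to try the arithmetic yourself, here is a minimal sketch using gensim’s pretrained GloVe vectors; the exact nearest word and score depend on the model you load.

```python
# Requires `pip install gensim`; downloads pretrained vectors on first run.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # word -> 100-dimensional vector

# king - man + woman = ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically something like [('queen', 0.77)]
```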

Tangent over.

How are Vectors Used?

Now that we understand vectors, the next step is straightforward. We embed the information we want the LLM to access, and when we ask a question about that information, the question itself should be close to the relevant content in vector space. The vector database retrieves the n most relevant pieces of content, where n is a configurable number.

It also returns the cosine similarity score for each result, indicating how closely the retrieved content matches the query.

Cosine Similarity

“Cosine similarity doesn’t just compare numbers; it measures meaning by calculating the angle between two vectors.”

A smaller angle indicates greater similarity, meaning the retrieved data is more relevant to the prompt.

Source: Image by the author.

In our example, A and B represent the phrases “RAG stands for Retrieval Augmented Generation” and “Hey LLM, tell me about RAG”. Since they are closely related, their vectors are similar. If we instead ask “Describe an Eclipse”, its vector will be far from the others, making it unrelated. However, if “RAG stands for Retrieval Augmented Generation” is the only entry in the database, it will still be retrieved, even if it is not relevant to the query.
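Running the article’s own example through the earlier sketches makes the point visible; this reuses the illustrative embedder and the cosine helper defined above:

```python
phrases = [
    "RAG stands for Retrieval Augmented Generation",  # A
    "Hey LLM, tell me about RAG",                     # B
    "Describe an Eclipse",                            # unrelated
]
p = embedder.encode(phrases)
print(cosine(p[0], p[1]))  # A vs B: high, closely related
print(cosine(p[0], p[2]))  # A vs the eclipse prompt: much lower
```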

Limitations of RAG

Typically, we do not store and retrieve entire documents in the vector database. If we did, a single large document could easily exceed the context window of the LLM. If the system is configured to return the ten most relevant pieces of information, and each of them is the size of a full article, your computer quickly turns into a space heater. To prevent this, we split the information into chunks of a predefined size, such as 1000 characters, and we try to keep the sentences and paragraphs intact.
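A simple chunker along those lines might look like the sketch below. Splitting on ". " and a 1,000-character budget are crude simplifications; real text splitters handle abbreviations, paragraphs, and overlap between chunks.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = text.split(". ")  # naive sentence split, for illustration only
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```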

However, splitting information into chunks introduces a new problem. Just as Word2Vec struggles to determine meaning from a single word, RAG often fails to understand the full context of a single chunk, especially when that chunk is extracted from the middle of a document.

Source: Image by the author.

Here is a problem I encountered recently. I keep a detailed work diary where I document all my professional achievements. It is extremely useful during performance reviews. However, when I ask my RAG system what I achieved at my current company, it confidently includes accomplishments from my previous jobs. Because I write this diary in the first person and also include information from other sources written in the first person, the system cannot distinguish between them. As a result, it starts attributing achievements to me that I had nothing to do with. That is how I realized something was wrong. My system was suddenly telling me about all the interesting things I supposedly did away from the computer, which is impossible since I never leave my desk.

Conclusion

RAG makes LLMs more useful by letting them retrieve information they wouldn’t otherwise have access to. But it’s not magic. It comes with its own challenges, from handling context properly to avoiding irrelevant results.

But as I learned firsthand, fetching information isn’t the same as understanding it. That’s why making RAG systems context-aware is the next big challenge, one I’ll tackle in my next article.

Published via Towards AI