Beyond Training Data: How RAG Lets LLMs Retrieve, Not Guess

Author(s): DarkBones

Originally published on Towards AI.

Source: Image by the author generated with Flux.

Large Language Models (LLMs) like GPT-4 don’t actually “know” anything; they predict words based on old training data. Retrieval-Augmented Generation (RAG) changes that by letting AI pull in fresh, real-world knowledge before answering.

RAG enhances LLMs by enabling them to retrieve relevant information from external sources before generating a response. Because LLMs rely on static training data and don’t update automatically, RAG gives them access to fresh, domain-specific, or private knowledge, without the need for costly retraining.

Let’s explore how RAG works, why it is useful, and how it differs from traditional LLM prompting.

What is Retrieval-Augmented Generation (RAG) in AI?

Retrieval-Augmented Generation (RAG) helps AI models retrieve external information before generating a response. But how exactly does this process work, and why is it important?

Large Language Models excel at many tasks. They can code, draft emails, hallucinate ingredients for the perfect sandwich, and even write articles, although I still prefer doing that myself. However, they have a major limitation: they lack real-time knowledge. Because training LLMs is a time-consuming process, they do not “know” about recent events. If you ask one about last week’s news, it will either display a disclaimer, provide an outdated answer, or generate something completely inaccurate.

“Some LLMs overcome their biggest limitation of stale training data by retrieving up-to-date information before responding.”

RAG fetches relevant information before generating an answer, making AI responses more accurate and reducing hallucinations.

RAG Explained in Simple Terms

But how does RAG actually work? Instead of looking it up ourselves, let’s ask our favorite LLM:

Source: Image by the author.

This is not quite what we were hoping for. No problem, we can ask Bob instead.

Source: Image by the author.

Surprisingly, Bob did not know the answer either, but he was able to retrieve it. Here is what happened:

  1. We asked Bob about RAG.
  2. Bob went to the library and asked the librarian for information.
  3. The librarian pointed him to the right aisle.
  4. Bob retrieved the information.
  5. Bob augmented his understanding by consuming the information before generating an answer.
  6. Now Bob sounds like an expert. Thanks, Bob.

This breakdown reveals that Bob is effectively functioning as a RAG agent.

With that insight, let’s explore exactly how a RAG agent operates.

RAG, Simplified

Let’s transform our interaction with Bob into an actual RAG system:

  • Bob represents the RAG system.
  • The librarian acts as an embedder.
  • The library functions as a vector database.
Source: Image by the author.

“Rather than prompting an LLM directly, a RAG system acts as a knowledge bridge: retrieving, augmenting, and then generating responses.”

Vectorizing the Input

When a user submits a prompt, the RAG system forwards it to the embedder, which converts it into a vector: a numeric representation of the prompt. The idea is that pieces of information with similar meanings will have similar vector representations.

“Vectors unlock relevance. This vector allows the system to retrieve the most meaningful information from the vector database.”

When the vector representation of the user’s prompt is sent to the database, it retrieves the most relevant matches.
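To make this concrete, here is a minimal sketch of the read path using the sentence-transformers library and plain numpy. The model name and the tiny in-memory list standing in for a vector database are illustrative choices, not what any particular production system uses.

```python
# Minimal sketch: embed a prompt, then retrieve the closest stored entries.
# Assumes `pip install sentence-transformers numpy`; the model is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # maps text to a 384-dim vector

# A toy in-memory "vector database": parallel lists of texts and vectors.
documents = [
    "RAG stands for Retrieval Augmented Generation.",
    "The capital of France is Paris.",
]
doc_vectors = embedder.encode(documents)

def retrieve(prompt: str, top_k: int = 1) -> list[tuple[str, float]]:
    """Return the top_k stored texts most similar to the prompt."""
    query_vec = embedder.encode(prompt)
    # Cosine similarity between the query and every stored vector.
    sims = doc_vectors @ query_vec / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:top_k]
    return [(documents[i], float(sims[i])) for i in best]

print(retrieve("Hey LLM, tell me about RAG"))
```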

The RAG system then enhances the user’s prompt by including the retrieved information:

<context>
the information returned from the database
</context>
<user-prompt>
the user's original prompt
</user-prompt>
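In code, the augmentation step is little more than string formatting. A sketch that mirrors the template above, reusing the hypothetical retrieve helper from the earlier snippet:

```python
def build_augmented_prompt(user_prompt: str) -> str:
    """Wrap retrieved context and the original prompt, mirroring the template above."""
    context = "\n".join(text for text, _score in retrieve(user_prompt, top_k=3))
    return (
        f"<context>\n{context}\n</context>\n"
        f"<user-prompt>\n{user_prompt}\n</user-prompt>"
    )
```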

That is the entire process. Retrieve, Augment, and Generate. RAG.

Adding to the Knowledge Base

However, the system cannot retrieve information that has not been added to the database. How do we store new data? The process is straightforward. Instead of using the vector to find relevant information, the system stores the data along with its vector representation.

Source: Image by the author.
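The write path is just as short. A sketch that continues the toy store from the read-path snippet, embedding the new text and saving it next to its vector:

```python
def add_document(text: str) -> None:
    """Embed new text and store it with its vector so it can be retrieved later."""
    global doc_vectors
    documents.append(text)
    doc_vectors = np.vstack([doc_vectors, embedder.encode([text])])

add_document("Bob is a librarian-assisted expert on RAG.")
```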

If you were only interested in the big picture, congratulations. You now understand the core concept. However, if you’re a fellow neckbeard, let’s talk a bit more about vectors and embedders.

What is a Vector?

In simple terms, a vector is a set of coordinates that describe how to move from A to B. Look at this graph:

Source: Still frame from “Photograph” music video by Nickelback, edited with a custom graph overlay by the author.

This graph has two dimensions. Each point, A, B, C, and D, can be described using a two-number coordinate system. The first number tells us how far to move to the right from the origin (0), while the second number tells us how far to move up. To reach A, the vector is [3, 7]. To reach D, the vector is [3, 0].

Dimensionality of Vectors

The same principle applies in three dimensions. To move from your desk to the coffee machine, you must travel a certain distance along the x, y, and z axes, forming a three-number coordinate system.

Source: Image by the author.

“Humans struggle to visualize beyond three dimensions. Computers thrive in multi-dimensional spaces.”

The math remains the same. Four dimensions? That requires a four-number coordinate system. One hundred dimensions? That requires a 100-number coordinate system.

Source: Meme remix combining “This is Fine” by KC Green (original) with custom artwork by the author.

“The embedder I use operates in a mind-bending, 768-dimensional coordinate system, far beyond human perception.”
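You can inspect an embedder’s dimensionality directly. A sketch with sentence-transformers, where "all-mpnet-base-v2" is one well-known 768-dimensional model; the article does not name the embedder the author actually uses.

```python
from sentence_transformers import SentenceTransformer

# One example of a 768-dimensional embedder; the author's model is not named.
model = SentenceTransformer("all-mpnet-base-v2")
vector = model.encode("hello, n-dimensional world")
print(vector.shape)  # (768,)
```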

When you have finished trying to visualize that, we can return to simpler, easy-to-draw, two-dimensional graphs.

How Vector Embeddings Help LLMs Retrieve Data

Vectors by themselves are simply n-dimensional coordinates that represent points in n-dimensional space.

“Vectors aren’t just numbers; they encode meaning. Their true power lies in the information they represent.”

In the same way, vectors can serve as coordinates not to places, but to information. A specialized language model, the embedder, is trained on a large corpus of text to learn similarities and to place pieces of information in n-dimensional space such that similar topics tend to be grouped together.

It is like going to a social event: you are likely to stick with your friends, your colleagues, or at least a group of like-minded people.

Grouping Similar Concepts Together

Source: Image by the author.

This graph shows how words that are similar in meaning tend to get grouped together in this n-dimensional space. Modern embedders (like BERT) don’t use single-word embeddings anymore, but generate contextual embeddings.
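You can observe this grouping numerically. A sketch reusing the illustrative embedder from earlier; the word choices are arbitrary, and the exact scores depend on the model:

```python
# Similar concepts should land closer together in vector space.
words = ["cat", "kitten", "truck"]
vecs = embedder.encode(words)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vecs[0], vecs[1]))  # "cat" vs "kitten": relatively high
print(cosine(vecs[0], vecs[2]))  # "cat" vs "truck": noticeably lower
```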

The ability to group similar concepts in vector space makes embeddings powerful. However, early embedding models like Word2Vec had a significant limitation that modern models have addressed.

Quick Tech Tangent

If you’ve been working on AI systems for as long as I have, you might be familiar with Word2Vec. While groundbreaking when it came out in 2013, it has a major flaw: it assigns a single vector to each word, no matter the context.

Take the word “bat”.

  • Are we talking about the flying mammal? Then it should be near “mammal”, “cave”, and “nocturnal”.
  • Or do we mean a baseball bat? Then it belongs near “ball”, “pitch”, and “base” (but what base? Military?)
  • And what if we’re in the world of fiction? Then “bat” relates to “vampire” and “transformation”.

Word2Vec can’t tell the difference. It picks one and sticks with it.

One thing I find particularly fascinating with Word2Vec is that, since words are now represented by numbers, you can actually do arithmetic on them.

You can make equations like

“king - man + woman = queen”: a legendary example of how AI models map relationships in vector space.

It’s wild, but it works (most of the time).
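If you want to try the arithmetic yourself, here is a minimal sketch using gensim’s pretrained GloVe vectors; the exact nearest word and score depend on the model you load.

```python
# Requires `pip install gensim`; downloads pretrained vectors on first run.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # word -> 100-dimensional vector

# king - man + woman = ?
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically something like [('queen', 0.77)]
```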

Tangent over.

How are Vectors Used?

Now that we understand vectors, the next step is straightforward. We embed the information we want the LLM to access, and when we ask a question about that information, the question itself should be close to the relevant content in vector space. The vector database retrieves the n most relevant pieces of content, where n is a configurable number.

It also returns the cosine similarity score for each result, indicating how closely the retrieved content matches the query.

Cosine Similarity

“Cosine similarity doesn’t just compare numbers; it measures meaning by calculating the angle between two vectors.”

A smaller angle indicates greater similarity, meaning the retrieved data is more relevant to the prompt.

Source: Image by the author.

In our example, A and B represent the phrases “RAG stands for Retrieval Augmented Generation” and “Hey LLM, tell me about RAG”. Since they are closely related, their vectors are similar. If we instead ask “Describe an Eclipse”, its vector will be far from the others, making it unrelated. However, if “RAG stands for Retrieval Augmented Generation” is the only entry in the database, it will still be retrieved, even if it is not relevant to the query.
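Running the article’s own example through the earlier sketches makes the point visible; this reuses the illustrative embedder and the cosine helper defined above:

```python
phrases = [
    "RAG stands for Retrieval Augmented Generation",  # A
    "Hey LLM, tell me about RAG",                     # B
    "Describe an Eclipse",                            # unrelated
]
p = embedder.encode(phrases)
print(cosine(p[0], p[1]))  # A vs B: high, closely related
print(cosine(p[0], p[2]))  # A vs the eclipse prompt: much lower
```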

Limitations of RAG

Typically, we do not store and retrieve entire documents in the vector database. If we did, a single large document could easily exceed the context window of the LLM. If the system is configured to return the ten most relevant pieces of information, and each of them is the size of a full article, your computer quickly turns into a space heater. To prevent this, we split the information into chunks of a predefined size, such as 1000 characters, and we try to keep the sentences and paragraphs intact.
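A simple chunker along those lines might look like the sketch below. Splitting on ". " and a 1,000-character budget are crude simplifications; real text splitters handle abbreviations, paragraphs, and overlap between chunks.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = text.split(". ")  # naive sentence split, for illustration only
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```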

However, splitting information into chunks introduces a new problem. Just as Word2Vec struggles to determine meaning from a single word, RAG often fails to understand the full context of a single chunk, especially when that chunk is extracted from the middle of a document.

Source: Image by the author.

Here is a problem I encountered recently. I keep a detailed work diary where I document all my professional achievements. It is extremely useful during performance reviews. However, when I ask my RAG system what I achieved at my current company, it confidently includes accomplishments from my previous jobs. Because I write this diary in the first person and also include information from other sources written in the first person, the system cannot distinguish between them. As a result, it starts attributing achievements to me that I had nothing to do with. That is how I realized something was wrong. My system was suddenly telling me about all the interesting things I supposedly did away from the computer, which is impossible since I never leave my desk.

Conclusion

RAG makes LLMs more useful by letting them retrieve information they wouldn’t otherwise have access to. But it’s not magic. It comes with its own challenges, from handling context properly to avoiding irrelevant results.

But as I learned firsthand, fetching information isn’t the same as understanding it. That’s why making RAG systems context-aware is the next big challenge, one I’ll tackle in my next article.

Published via Towards AI