
LLM & AI Agent Applications with LangChain and LangGraph — Part 20: Retrieval-Augmented Generation (RAG)

Last Updated on January 5, 2026 by Editorial Team

Author(s): Michal Zarnecki

Originally published on Towards AI.


Hi! Welcome to the next part of the series on LLM-based application development, this time dedicated to Retrieval-Augmented Generation, or simply RAG.

RAG is a pattern that very quickly became the foundation of many LLM-based applications.
Why? Because it solves one of the biggest weaknesses of language models: limited knowledge.

Imagine a model that can write code, translate text, and answer questions beautifully — but it has no access to your internal company documentation and doesn’t know the latest information from the internet. The model’s knowledge ends at the moment it was trained.

With RAG, we don’t need to train a new model from scratch to “teach it” new things. We simply connect it to an external knowledge source.

How does RAG work?

Very simply — the pipeline consists of two main steps:

  1. Retrieval: we search for fragments of knowledge that match the user’s question.
  2. Generation: the LLM receives the question together with the retrieved context and generates an answer.

This mirrors how humans work: if I don’t know something, I don’t invent it — I look it up in sources, and then write a sensible answer based on what I found.

Where RAG works well

RAG is useful in many scenarios, for example:

  • a chatbot answering employee questions based on internal documents,
  • a legal assistant responding to questions about regulations,
  • a recommendation system for e-commerce,
  • analysis of financial reports or scientific data.

The key building blocks of a RAG pipeline

A critical aspect of RAG is the data source. We can load PDF documents, text files, web pages, or data from a SQL database. LangChain provides ready-made loaders such as PyPDFLoader, TextLoader, or WebBaseLoader.

1) Load

First, we load the documents.
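
To make this concrete, here is a minimal loading sketch using LangChain's built-in loaders. The file names report.pdf and notes.txt are hypothetical placeholders, and PyPDFLoader additionally requires the pypdf package:

from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Hypothetical local files: adjust the paths to your own data
pdf_docs = PyPDFLoader("report.pdf").load()  # one Document per PDF page
txt_docs = TextLoader("notes.txt").load()    # one Document for the whole file

docs = pdf_docs + txt_docs
print(f"Loaded {len(docs)} documents")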

2) Split

Large documents are too long for the model to handle effectively. That’s why we split them into smaller chunks — for example 500–1000 characters — with a small overlap between chunks.
The overlap helps keep chunks coherent and improves semantic matching.
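
A tiny sketch of what the splitter does; the sample text and chunk sizes are arbitrary, chosen small so the overlap is visible in the printed chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text = (
    "RAG has two steps. First we retrieve the chunks that match the question. "
    "Then the LLM generates an answer based on those chunks."
)

splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=20)
for chunk in splitter.split_text(text):
    print(repr(chunk))  # consecutive chunks share some text at the boundary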

3) Embed

Next come embeddings. Each chunk is converted into a vector of numbers that represents its meaning.
You can use OpenAIEmbeddings, CohereEmbeddings, or free Hugging Face models.
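
A quick sketch of what an embedding looks like in practice (assuming OPENAI_API_KEY is set in the environment; the printed dimensionality depends on the model):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vector = embeddings.embed_query("What is RAG?")
print(len(vector))   # dimensionality of the vector, e.g., 1536 for OpenAI's default
print(vector[:5])    # the first few coordinates of the meaning vector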

4) Store

Then we store all embeddings in a vector store — a vector database. This can be:

  • a local FAISS index,
  • Qdrant running in Docker,
  • or managed cloud services like Pinecone or Weaviate.

The vector store lets us find the most similar chunks for a given user question.
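
For example, once the vectorstore from the notebook below is built, you can query it directly, without a retriever. Note that for FAISS the score is a raw distance, so lower means more similar:

results = vectorstore.similarity_search_with_score("What is a retriever?", k=2)
for doc, score in results:
    print(f"{score:.3f}", doc.page_content[:60])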

5) Retrieve

Next is the retriever. This module takes the user question, turns it into an embedding, and searches the vector store for the nearest chunks.
Those retrieved chunks become context for the model.

6) Generate

Finally we use the LLM. Here we build the prompt: the user’s question, the context returned by the retriever, and additional instructions like:

“Answer only based on the provided context. If the answer isn’t there — say you don’t know.”

Only now does the model generate the final response.

The whole pipeline in six words

You can describe the full RAG pipeline in six words:

load → split → embed → store → retrieve → generate

Alright — let’s move to the notebook.

Install libraries

!pip install langchain-community langchain-openai langchain-text-splitters faiss-cpu python-dotenv

Building vector store (FAISS) and retriever

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY from a .env file

# Source documents (plain strings for demonstration)
docs = [
    "LangChain is a framework for working with LLM.",
    "RAG combines context retrieval with answer generation.",
    "FAISS is a library for storing and searching embeddings.",
    "Retriever is used to find the most similar documents to the user's queries. The retriever can return a variable number of matching documents, specified in the k parameter. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance, MMR."
]

# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
split = splitter.create_documents(docs)

print(f"Number of chunks: {len(split)}")

# Embed the chunks and store them in a FAISS index
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(split, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

query = "Why use a retriever?"

context = retriever.invoke(query)
print("Retrieved chunks:")
for i, c in enumerate(context, 1):
    print(f"{i}.", c.page_content)

output:

Number of chunks: 7
Retrieved chunks:
1. Retriever is used to find the most similar documents to the user's queries. The retriever can return
2. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance,

Simple chain RAG (prompt + context + LLM)

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Give precise answers based solely on CONTEXT. If there is no data, say you don't know."),
    ("system", "CONTEXT:\n{context}"),
    ("user", "{question}")
])

# The retriever fills {context}; RunnablePassthrough forwards the raw question
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is FAISS and what is it for?"))

output:

FAISS is a library for storing and searching embeddings.

Example RAG — full program

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Source documents
docs = [
    "LangChain is a framework for working with LLM.",
    "RAG combines context matching with answer generation.",
    "FAISS is a library for storing and retrieving embeddings."
]

# Split
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
splits = splitter.create_documents(docs)

# Embeddings + vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(splits, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# RAG prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context below:\n{context}"),
    ("user", "{question}")
])

# Pipeline: the chain is invoked with a dict, so we pull the question out of it
rag_chain = (
    {
        "context": lambda x: retriever.invoke(x["question"]),
        "question": lambda x: x["question"]
    }
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke({"question": "What is FAISS?"}))

output:

FAISS is a library for storing and retrieving embeddings, which are numerical representations of data, often used in machine learning and information retrieval tasks.

RAG with loop and evaluation

from langchain_core.prompts import ChatPromptTemplate

eval_prompt = ChatPromptTemplate.from_messages([
    ("system", "Evaluate the answer."),
    ("user", "Question: {question}\nAnswer: {answer}\nIs the answer correct? Respond with only 'yes' or 'no'.")
])

def rag_with_eval(question, max_retries):
    for attempt in range(max_retries):
        # Retrieve context and generate a candidate answer
        context = retriever.invoke(question)
        answer = (prompt | llm | StrOutputParser()).invoke({"context": context, "question": question})
        # Ask the LLM to judge the answer
        eval_result = (eval_prompt | llm | StrOutputParser()).invoke({"question": question, "answer": answer})
        print(f"Evaluation result: {eval_result}")
        if "yes" in eval_result.lower():
            return f"✅ Answer approved:\n{answer}"
        print(f"❌ Answer: {answer}\nrejected, retrying...")
    return "Could not get the correct answer."

print(rag_with_eval("What is RAG?", max_retries=3))

output:

Evaluation result: Yes.
✅ Answer approved:
RAG stands for Retrieval-Augmented Generation. It combines context matching with answer generation, allowing for more accurate and contextually relevant responses by retrieving information from a knowledge base before generating an answer.

That’s all for this part dedicated to Retrieval-Augmented Generation (RAG). In the next article we will build intuition for how vector databases, embeddings, and semantic search work.

See the next chapter.

See the previous chapter.

See the full code from this article in the GitHub repository.


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.