
LLM & AI Agent Applications with LangChain and LangGraph — Part 20: Retrieval-Augmented Generation (RAG)

Last Updated on January 5, 2026 by Editorial Team

Author(s): Michal Zarnecki

Originally published on Towards AI.


Hi! Welcome to the next part of the series on LLM-based application development, this time dedicated to Retrieval-Augmented Generation, or simply RAG.

RAG is a pattern that very quickly became the foundation of many LLM-based applications.
Why? Because it solves one of the biggest weaknesses of language models: limited knowledge.

Imagine a model that can write code, translate text, and answer questions beautifully — but it has no access to your internal company documentation and doesn’t know the latest information from the internet. The model’s knowledge ends at the moment it was trained.

With RAG, we don’t need to train a new model from scratch to “teach it” new things. We simply connect it to an external knowledge source.

How does RAG work?

Very simply — the pipeline consists of two main steps:

  1. Retrieval: we search for fragments of knowledge that match the user’s question.
  2. Generation: the LLM receives the question together with the retrieved context and generates an answer.

This mirrors how humans work: if I don’t know something, I don’t invent it — I look it up in sources, and then write a sensible answer based on what I found.

Where RAG works well

RAG is useful in many scenarios, for example:

  • a chatbot answering employee questions based on internal documents,
  • a legal assistant responding to questions about regulations,
  • a recommendation system for e-commerce,
  • analysis of financial reports or scientific data.

The key building blocks of a RAG pipeline

A critical aspect of RAG is the data source. We can load PDF documents, text files, web pages, or data from a SQL database. LangChain provides ready-made loaders such as PyPDFLoader, TextLoader, or WebBaseLoader.

1) Load

First, we load the documents.
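
To make this concrete, here is a minimal loading sketch using LangChain's built-in loaders. The file names report.pdf and notes.txt are hypothetical placeholders, and PyPDFLoader additionally requires the pypdf package:

from langchain_community.document_loaders import PyPDFLoader, TextLoader

# Hypothetical local files: adjust the paths to your own data
pdf_docs = PyPDFLoader("report.pdf").load()  # one Document per PDF page
txt_docs = TextLoader("notes.txt").load()    # one Document for the whole file

docs = pdf_docs + txt_docs
print(f"Loaded {len(docs)} documents")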

2) Split

Large documents are too long for the model to handle effectively. That’s why we split them into smaller chunks — for example 500–1000 characters — with a small overlap between chunks.
The overlap helps keep chunks coherent and improves semantic matching.
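
A tiny sketch of what the splitter does; the sample text and chunk sizes are arbitrary, chosen small so the overlap is visible in the printed chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text = (
    "RAG has two steps. First we retrieve the chunks that match the question. "
    "Then the LLM generates an answer based on those chunks."
)

splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=20)
for chunk in splitter.split_text(text):
    print(repr(chunk))  # consecutive chunks share some text at the boundary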

3) Embed

Next come embeddings. Each chunk is converted into a vector of numbers that represents its meaning.
You can use OpenAIEmbeddings, CohereEmbeddings, or free Hugging Face models.
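
A quick sketch of what an embedding looks like in practice (assuming OPENAI_API_KEY is set in the environment; the printed dimensionality depends on the model):

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vector = embeddings.embed_query("What is RAG?")
print(len(vector))   # dimensionality of the vector, e.g., 1536 for OpenAI's default
print(vector[:5])    # the first few coordinates of the meaning vector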

4) Store

Then we store all embeddings in a vector store — a vector database. This can be:

  • a local FAISS index,
  • Qdrant running in Docker,
  • or managed cloud services like Pinecone or Weaviate.

The vector store lets us find the most similar chunks for a given user question.
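
For example, once the vectorstore from the notebook below is built, you can query it directly, without a retriever. Note that for FAISS the score is a raw distance, so lower means more similar:

results = vectorstore.similarity_search_with_score("What is a retriever?", k=2)
for doc, score in results:
    print(f"{score:.3f}", doc.page_content[:60])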

5) Retrieve

Next is the retriever. This module takes the user question, turns it into an embedding, and searches the vector store for the nearest chunks.
Those retrieved chunks become context for the model.

6) Generate

Finally we use the LLM. Here we build the prompt: the user’s question, the context returned by the retriever, and additional instructions like:

“Answer only based on the provided context. If the answer isn’t there — say you don’t know.”

Only now does the model generate the final response.

The whole pipeline in six words

You can describe the full RAG pipeline in six words:

load → split → embed → store → retrieve → generate

Alright — let’s move to the notebook.

Install libraries

!pip install langchain-community langchain-openai langchain-text-splitters faiss-cpu python-dotenv

Building vector store (FAISS) and retriever

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY from a .env file

# Source documents (plain strings for demonstration)
docs = [
    "LangChain is a framework for working with LLM.",
    "RAG combines context retrieval with answer generation.",
    "FAISS is a library for storing and searching embeddings.",
    "Retriever is used to find the most similar documents to the user's queries. The retriever can return a variable number of matching documents, specified in the k parameter. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance, MMR."
]

# Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
split = splitter.create_documents(docs)

print(f"Number of chunks: {len(split)}")

# Embed the chunks and store them in a FAISS index
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(split, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

query = "Why use a retriever?"

context = retriever.invoke(query)
print("Retrieved chunks:")
for i, c in enumerate(context, 1):
    print(f"{i}.", c.page_content)

output:

Number of chunks: 7
Retrieved chunks:
1. Retriever is used to find the most similar documents to the user's queries. The retriever can return
2. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance,

Simple chain RAG (prompt + context + LLM)

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Give precise answers based solely on CONTEXT. If there is no data, say you don't know."),
    ("system", "CONTEXT:\n{context}"),
    ("user", "{question}")
])

# The retriever fills {context}; RunnablePassthrough forwards the raw question
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is FAISS and what is it for?"))

output:

FAISS is a library for storing and searching embeddings.

Example RAG — full program

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Source documents
docs = [
    "LangChain is a framework for working with LLM.",
    "RAG combines context matching with answer generation.",
    "FAISS is a library for storing and retrieving embeddings."
]

# Split
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
splits = splitter.create_documents(docs)

# Embeddings + vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(splits, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# RAG prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context below:\n{context}"),
    ("user", "{question}")
])

# Pipeline: the chain is invoked with a dict, so we pull the question out of it
rag_chain = (
    {
        "context": lambda x: retriever.invoke(x["question"]),
        "question": lambda x: x["question"]
    }
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke({"question": "What is FAISS?"}))

output:

FAISS is a library for storing and retrieving embeddings, which are numerical representations of data, often used in machine learning and information retrieval tasks.

RAG with loop and evaluation

from langchain_core.prompts import ChatPromptTemplate

eval_prompt = ChatPromptTemplate.from_messages([
    ("system", "Evaluate the answer."),
    ("user", "Question: {question}\nAnswer: {answer}\nIs the answer correct? Respond with only 'yes' or 'no'.")
])

def rag_with_eval(question, max_retries):
    for attempt in range(max_retries):
        # Retrieve context and generate a candidate answer
        context = retriever.invoke(question)
        answer = (prompt | llm | StrOutputParser()).invoke({"context": context, "question": question})
        # Ask the LLM to judge the answer
        eval_result = (eval_prompt | llm | StrOutputParser()).invoke({"question": question, "answer": answer})
        print(f"Evaluation result: {eval_result}")
        if "yes" in eval_result.lower():
            return f"✅ Answer approved:\n{answer}"
        print(f"❌ Answer: {answer}\nrejected, retrying...")
    return "Could not get the correct answer."

print(rag_with_eval("What is RAG?", max_retries=3))

output:

Evaluation result: Yes.
✅ Answer approved:
RAG stands for Retrieval-Augmented Generation. It combines context matching with answer generation, allowing for more accurate and contextually relevant responses by retrieving information from a knowledge base before generating an answer.

That’s all for this part dedicated to Retrieval-Augmented Generation (RAG). In the next article we will build intuition for how vector databases, embeddings, and semantic search work.

See the next chapter.

See the previous chapter.

See the full code from this article in the GitHub repository.


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.