
LLM & AI Agent Applications with LangChain and LangGraph — Part 22: Building a RAG Chatbot in Streamlit

Last Updated on January 5, 2026 by Editorial Team

Author(s): Michal Zarnecki

Originally published on Towards AI.


Hi! In this chapter we’ll build a simple but fully working chatbot application based on RAG. It will load content from a few files containing website text and answer user questions in the context of that data. You can find the fully functional application from this article in the GitHub repository.

In previous parts I already explained what RAG is, how semantic search works, and what vector databases are — so now we’ll focus on how to connect these pieces into one coherent system.

💡 Let’s first see how the application works and what we want to achieve.

💡 Then we’ll start with the general idea and the Streamlit skeleton.

💡 And finally, we’ll look at the folder and file structure of the project.

What this app does

The flow is straightforward:

  1. We load the content of a few web pages — previously downloaded and saved as .txt files in the data directory.
  2. We split those texts into logical chunks.
  3. We build a vector store from those chunks.
  4. When the user asks a question, we retrieve the best-matching fragments, feed them to the model, and generate an answer based on that context.

1) Loading documents

The first piece is document loading, handled by a DocumentLoader class.

It scans the data directory, opens all .txt files, and then—using RecursiveCharacterTextSplitter—splits them into chunks of around 1000 characters with a 200-character overlap.

This matters a lot: the overlap prevents the context from being cut mid-sentence and gives the retriever precise “pieces” to match against the question.

Each chunk is stored as a Document object, with the source filename saved in metadata.

from pathlib import Path

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter


class DocumentLoader:
    def __init__(self):
        self.documents_path = ''

    def load_chunks(self) -> list[Document]:
        """Load and chunk text documents from the data directory.

        Returns:
            list[Document]: A list of Document objects
        """
        docs = []
        files = []
        data_path = Path("data")

        for file in data_path.iterdir():
            if file.suffix == ".txt":
                with open(file) as f:
                    docs.append(f.read())
                files.append(file.name)

        # Split documents and store in vector db
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )

        splits = []
        for i, doc in enumerate(docs):
            for chunk in text_splitter.split_text(doc):
                splits.append(Document(page_content=chunk, metadata={"source": files[i]}))

        return splits
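
Used on its own, the loader makes for a quick sanity check (a sketch, assuming a data directory with .txt files next to the script):

chunks = DocumentLoader().load_chunks()
# Expect the chunk count and the source file of the first chunk
print(len(chunks), chunks[0].metadata)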

2) Building the vector store

🧠 The second step is creating the vector database.

Here we use a very lightweight and fast option: DocArrayInMemorySearch.

Embeddings are generated locally using BAAI/bge-small-en-v1.5—a small but strong model that performs really well for semantic matching in English.

On top of this, we build a retriever that — on each query — returns the two best chunks (k=2, fetch_k=4).

# load source documents as chunks
chunks = self.document_loader.load_chunks()
vectordb = DocArrayInMemorySearch.from_documents(chunks, self.embedding_model)
self.retriever = vectordb.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2, 'fetch_k': 4}
)
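
The snippet above assumes self.embedding_model was created earlier. A minimal sketch of that setup, assuming the langchain-huggingface integration package is installed:

from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_huggingface import HuggingFaceEmbeddings

# Inside the chatbot's __init__ (attribute name matches the snippet above).
# The BAAI/bge-small-en-v1.5 model is downloaded on first use and runs locally.
self.embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")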

3) The RAG chain

The third element is the RAG chain itself.

When the user types a question, we run a pipeline built using LangChain’s declarative approach — LCEL.

The flow looks like this:

  • First, we attach the retriever output to the query (the chunks we found).
  • Then we format those chunks into a single context string and pass it into a prompt (ChatPromptTemplate) that tells the model to answer only using the context.
  • Next, the request goes to GPT-4o with streaming=True, so the answer is generated live, in real time.
  • Finally, we parse the output using a simple StrOutputParser.

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory

def prepare_rag_chain(self) -> RunnableWithMessageHistory:
    """Prepares the RAG chain

    Returns:
        RunnableWithMessageHistory: The RAG chain
    """
    # Retrieval chain - direct retrieval without contextualization
    retrieval_chain = RunnablePassthrough.assign(
        context=lambda x: self.retriever.invoke(x["input"])
    ) | RunnablePassthrough.assign(
        formatted_context=lambda x: utils.format_docs(x["context"])
    )

    # Complete RAG chain using LCEL
    rag_chain = (
        retrieval_chain
        | RunnablePassthrough.assign(answer=(
            {
                "context": itemgetter("formatted_context"),
                "input": itemgetter("input"),
                "chat_history": itemgetter("chat_history")
            }
            | self.chat_prompt
            | self.llm
            | StrOutputParser())
        )
    )
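
The chain references utils.format_docs, self.chat_prompt, and self.llm, which live elsewhere in the project. Here is a minimal sketch of what they might look like; the exact prompt wording is an assumption, not copied from the repository:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer only using the following context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# streaming=True lets the UI render tokens as they arrive
llm = ChatOpenAI(model="gpt-4o", streaming=True)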

4) Chat history

The fourth piece is chat history.

We use RunnableWithMessageHistory, which under the hood relies on ChatMessageHistory. That means the model has access to previous questions and answers—so we can run a multi-turn conversation, not just isolated single prompts.

The full history is also synchronized with Streamlit state, so the user sees the entire conversation on screen.

    # Wrap with message history (RunnableWithMessageHistory comes from
    # langchain_core.runnables.history, ChatMessageHistory from
    # langchain_community.chat_message_histories)
    conversational_rag_chain = RunnableWithMessageHistory(
        rag_chain,
        self.get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    return conversational_rag_chain

def get_session_history(self, session_id: str) -> BaseChatMessageHistory:
    """Retrieves the chat message history for a specific session

    Args:
        session_id (str): The session ID

    Returns:
        BaseChatMessageHistory: The chat message history for the session
    """
    if session_id not in self.store:
        self.store[session_id] = ChatMessageHistory()
        # Initialize with existing messages from streamlit session
        if "messages" in st.session_state:
            for msg in st.session_state["messages"]:
                if msg["role"] == "user":
                    self.store[session_id].add_message(HumanMessage(content=msg["content"]))
                elif msg["role"] == "assistant":
                    self.store[session_id].add_message(AIMessage(content=msg["content"]))
    return self.store[session_id]
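
For a quick test outside Streamlit, the wrapped chain can be invoked directly. The session_id in the config is what RunnableWithMessageHistory passes to get_session_history; the question below is just an illustration:

result = conversational_rag_chain.invoke(
    {"input": "What topics do the loaded pages cover?"},
    config={"configurable": {"session_id": "default_session"}},
)
print(result["answer"])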

5) The Streamlit UI

And finally — the Streamlit interface.

We use:

  • st.chat_input for the user prompt,
  • st.chat_message for rendering messages.

Because the model streams tokens, the response appears gradually — exactly like in ChatGPT.

After generation finishes, the app also shows sources — the text fragments that the answer was based on. For each document we display the filename and a preview of content inside an expandable popover.

Here is a very simple chatbot application template in Streamlit:

import streamlit as st

st.set_page_config(page_title="Simple Chat", page_icon="💬")
st.header('Simple Streamlit Chat')
st.write('A basic chat interface demonstrating Streamlit chat components.')


def initialize_chat_history():
    """Initialize chat history in session state"""
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {"role": "assistant", "content": "Hello! How can I help you today?"}
        ]


def display_chat_history():
    """Display all messages from chat history"""
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])


def handle_user_input(user_query: str):
    """Handle user input and generate response"""
    # Display user message
    with st.chat_message("user"):
        st.write(user_query)

    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": user_query})

    # Simulate a simple response (replace this with your own logic)
    response = f"You said: '{user_query}'. This is a simple echo response!"

    # Display assistant response
    with st.chat_message("assistant"):
        st.write(response)

    # Add assistant response to history
    st.session_state.messages.append({"role": "assistant", "content": response})


def main():
    # Initialize chat history
    initialize_chat_history()

    # Display existing chat history
    display_chat_history()

    # Chat input at the bottom
    user_query = st.chat_input(placeholder="Type your message here...")

    if user_query:
        handle_user_input(user_query)


if __name__ == "__main__":
    main()
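
To try the template locally, save it as a file of your choosing (for example app.py) and launch it with streamlit run app.py. Streamlit re-executes the whole script on every interaction, which is exactly why the chat history has to live in st.session_state rather than in a local variable.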

Application

Below is the application code that wraps everything together and reuses the previously introduced pieces.

@utils.enable_chat_history
def main(self):
    user_query = st.chat_input(placeholder="Ask for information from documents")

    if user_query:
        # Get or initialize the session history before building the chain
        session_history = self.get_session_history("default_session")

        rag_chain = self.prepare_rag_chain()

        utils.display_msg(user_query, 'user')

        # Add user message to LangChain history
        session_history.add_message(HumanMessage(content=user_query))

        with st.spinner('Preparing response...'):
            with st.chat_message("assistant"):
                response_placeholder = st.empty()
                response_text = ""
                context_docs = None

                # Stream the response chunks
                for chunk in rag_chain.stream(
                    {"input": user_query},
                    config={"configurable": {"session_id": "default_session"}}
                ):
                    # Capture the answer chunks for streaming
                    if "answer" in chunk:
                        # For streaming, answer will come in parts
                        if isinstance(chunk["answer"], str):
                            response_text += chunk["answer"]
                            response_placeholder.markdown(response_text)
                    # Capture context documents for references
                    if "context" in chunk:
                        context_docs = chunk["context"]

                # Store the complete response in both places
                st.session_state.messages.append({"role": "assistant", "content": response_text})
                # Add assistant message to LangChain history
                session_history.add_message(AIMessage(content=response_text))

        utils.print_qa(CustomDocChatbot, user_query, response_text)

        # Show references if available
        if context_docs:
            for doc in context_docs:
                filename = os.path.basename(doc.metadata['source'])
                ref_title = f":blue[Source document: {filename}]"
                with st.popover(ref_title):
                    st.caption(doc.page_content)


if __name__ == "__main__":
    obj = CustomDocChatbot()
    obj.main()

Improvements

If you want to take this mini-app one step further, there are a few very practical improvements worth doing:

  • Move from the in-memory index to an external vector database (for example Qdrant, Weaviate, or Pinecone) so the knowledge base persists across restarts and scales beyond a few files; a sketch of this swap follows after the list.
  • Separate document loading and synchronization into a dedicated script: one job that crawls/updates sources, splits content, builds embeddings, and refreshes the vector store, so the chatbot stays lightweight and only focuses on retrieval + answering.
  • Expand the prompting layer: add multiple prompts and conditional logic (e.g., different instructions for factual Q&A vs. summarization, a stricter “answer only from context” mode when confidence is low, fallback prompts when retrieval returns weak matches, and guardrails around output format).
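
As an illustration of the first improvement, here is a minimal sketch of swapping DocArrayInMemorySearch for Qdrant via the langchain-qdrant integration package. The URL and collection name are placeholder assumptions, and a Qdrant instance must already be running:

from langchain_qdrant import QdrantVectorStore

# Build (or update) a persistent collection instead of the in-memory index.
# 'chunks' and 'self.embedding_model' are the same objects used earlier.
vectordb = QdrantVectorStore.from_documents(
    chunks,
    self.embedding_model,
    url="http://localhost:6333",      # placeholder Qdrant endpoint
    collection_name="website_docs",   # placeholder collection name
)
self.retriever = vectordb.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)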

Summary

Everything runs inside a single class (CustomDocChatbot) and just a few helper files.

There’s no external backend, no database, no React frontend — just Python, LangChain, and Streamlit.

And that’s the point: with a surprisingly small amount of code, you can build your own fully working domain-aware chatbot, grounded in your own sources.

That’s all in this part dedicated to building a RAG chatbot. In the next article we will finally start building graphs with the LangGraph framework and later combine them with an LLM API to build agentic workflows.

See the next chapter

See the previous chapter

See the full code from this article in the GitHub repository


