
LLM & AI Agent Applications with LangChain and LangGraph — Part 22: Building a RAG Chatbot in Streamlit

Last Updated on January 5, 2026 by Editorial Team

Author(s): Michal Zarnecki

Originally published on Towards AI.


Hi! In this chapter we’ll build a simple but fully working chatbot application based on RAG. It will load content from a few files containing website text and answer user questions in the context of that data. You can find the fully functional application from this article in the GitHub repository.

In previous parts I already explained what RAG is, how semantic search works, and what vector databases are — so now we’ll focus on how to connect these pieces into one coherent system.

💡 Let’s first see how the application works and what we want to achieve.

💡 Then we’ll start with the general idea and the Streamlit skeleton.

💡 And finally, we’ll look at the folder and file structure of the project.

What this app does

The flow is straightforward:

  1. We load the content of a few web pages — previously downloaded and saved as .txt files in the data directory.
  2. We split those texts into logical chunks.
  3. We build a vector store from those chunks.
  4. When the user asks a question, we retrieve the best-matching fragments, feed them to the model, and generate an answer based on that context.

1) Loading documents

The first piece is document loading, handled by a DocumentLoader class.

It scans the data directory, opens all .txt files, and then—using RecursiveCharacterTextSplitter—splits them into chunks of around 1000 characters with a 200-character overlap.

This matters a lot: the overlap prevents the context from being cut mid-sentence and gives the retriever precise “pieces” to match against the question.

Each chunk is stored as a Document object, with the source filename saved in metadata.

from pathlib import Path

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter


class DocumentLoader:
    def __init__(self):
        self.documents_path = ''

    def load_chunks(self) -> list[Document]:
        """Load and chunk text documents from the data directory.

        Returns:
            list[Document]: A list of Document objects
        """
        docs = []
        files = []
        data_path = Path("data")

        for file in data_path.iterdir():
            if file.suffix == ".txt":
                with open(file) as f:
                    docs.append(f.read())
                files.append(file.name)

        # Split documents and store in vector db
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )

        splits = []
        for i, doc in enumerate(docs):
            for chunk in text_splitter.split_text(doc):
                splits.append(Document(page_content=chunk, metadata={"source": files[i]}))

        return splits
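
Used on its own, the loader makes for a quick sanity check (a sketch, assuming a data directory with .txt files next to the script):

chunks = DocumentLoader().load_chunks()
# Expect the chunk count and the source file of the first chunk
print(len(chunks), chunks[0].metadata)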

2) Building the vector store

🧠 The second step is creating the vector database.

Here we use a very lightweight and fast option: DocArrayInMemorySearch.

Embeddings are generated locally using BAAI/bge-small-en-v1.5—a small but strong model that performs really well for semantic matching in English.

On top of this, we build a retriever that — on each query — returns the two best chunks (k=2, fetch_k=4).

# load source documents as chunks
chunks = self.document_loader.load_chunks()
vectordb = DocArrayInMemorySearch.from_documents(chunks, self.embedding_model)
self.retriever = vectordb.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2, 'fetch_k': 4}
)
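
The snippet above assumes self.embedding_model was created earlier. A minimal sketch of that setup, assuming the langchain-huggingface integration package is installed:

from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_huggingface import HuggingFaceEmbeddings

# Inside the chatbot's __init__ (attribute name matches the snippet above).
# The BAAI/bge-small-en-v1.5 model is downloaded on first use and runs locally.
self.embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")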

3) The RAG chain

The third element is the RAG chain itself.

When the user types a question, we run a pipeline built using LangChain’s declarative approach — LCEL.

The flow looks like this:

  • First, we attach the retriever output to the query (the chunks we found).
  • Then we format those chunks into a single context string and pass it into a prompt (ChatPromptTemplate) that tells the model to answer only using the context.
  • Next, the request goes to GPT-4o with streaming=True, so the answer is generated live, in real time.
  • Finally, we parse the output using a simple StrOutputParser.

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables.history import RunnableWithMessageHistory

def prepare_rag_chain(self) -> RunnableWithMessageHistory:
    """Prepares the RAG chain

    Returns:
        RunnableWithMessageHistory: The RAG chain
    """
    # Retrieval chain - direct retrieval without contextualization
    retrieval_chain = RunnablePassthrough.assign(
        context=lambda x: self.retriever.invoke(x["input"])
    ) | RunnablePassthrough.assign(
        formatted_context=lambda x: utils.format_docs(x["context"])
    )

    # Complete RAG chain using LCEL
    rag_chain = (
        retrieval_chain
        | RunnablePassthrough.assign(answer=(
            {
                "context": itemgetter("formatted_context"),
                "input": itemgetter("input"),
                "chat_history": itemgetter("chat_history")
            }
            | self.chat_prompt
            | self.llm
            | StrOutputParser())
        )
    )
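
The chain references utils.format_docs, self.chat_prompt, and self.llm, which live elsewhere in the project. Here is a minimal sketch of what they might look like; the exact prompt wording is an assumption, not copied from the repository:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer only using the following context:\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# streaming=True lets the UI render tokens as they arrive
llm = ChatOpenAI(model="gpt-4o", streaming=True)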

4) Chat history

The fourth piece is chat history.

We use RunnableWithMessageHistory, which under the hood relies on ChatMessageHistory. That means the model has access to previous questions and answers—so we can run a multi-turn conversation, not just isolated single prompts.

The full history is also synchronized with Streamlit state, so the user sees the entire conversation on screen.

    # Wrap with message history (RunnableWithMessageHistory comes from
    # langchain_core.runnables.history, ChatMessageHistory from
    # langchain_community.chat_message_histories)
    conversational_rag_chain = RunnableWithMessageHistory(
        rag_chain,
        self.get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    return conversational_rag_chain

def get_session_history(self, session_id: str) -> BaseChatMessageHistory:
    """Retrieves the chat message history for a specific session

    Args:
        session_id (str): The session ID

    Returns:
        BaseChatMessageHistory: The chat message history for the session
    """
    if session_id not in self.store:
        self.store[session_id] = ChatMessageHistory()
        # Initialize with existing messages from streamlit session
        if "messages" in st.session_state:
            for msg in st.session_state["messages"]:
                if msg["role"] == "user":
                    self.store[session_id].add_message(HumanMessage(content=msg["content"]))
                elif msg["role"] == "assistant":
                    self.store[session_id].add_message(AIMessage(content=msg["content"]))
    return self.store[session_id]
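
For a quick test outside Streamlit, the wrapped chain can be invoked directly. The session_id in the config is what RunnableWithMessageHistory passes to get_session_history; the question below is just an illustration:

result = conversational_rag_chain.invoke(
    {"input": "What topics do the loaded pages cover?"},
    config={"configurable": {"session_id": "default_session"}},
)
print(result["answer"])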

5) The Streamlit UI

And finally — the Streamlit interface.

We use:

  • st.chat_input for the user prompt,
  • st.chat_message for rendering messages.

Because the model streams tokens, the response appears gradually — exactly like in ChatGPT.

After generation finishes, the app also shows sources — the text fragments that the answer was based on. For each document we display the filename and a preview of content inside an expandable popover.

Here is a very simple chatbot application template in Streamlit:

import streamlit as st

st.set_page_config(page_title="Simple Chat", page_icon="💬")
st.header('Simple Streamlit Chat')
st.write('A basic chat interface demonstrating Streamlit chat components.')


def initialize_chat_history():
    """Initialize chat history in session state"""
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {"role": "assistant", "content": "Hello! How can I help you today?"}
        ]


def display_chat_history():
    """Display all messages from chat history"""
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])


def handle_user_input(user_query: str):
    """Handle user input and generate response"""
    # Display user message
    with st.chat_message("user"):
        st.write(user_query)

    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": user_query})

    # Simulate a simple response (replace this with your own logic)
    response = f"You said: '{user_query}'. This is a simple echo response!"

    # Display assistant response
    with st.chat_message("assistant"):
        st.write(response)

    # Add assistant response to history
    st.session_state.messages.append({"role": "assistant", "content": response})


def main():
    # Initialize chat history
    initialize_chat_history()

    # Display existing chat history
    display_chat_history()

    # Chat input at the bottom
    user_query = st.chat_input(placeholder="Type your message here...")

    if user_query:
        handle_user_input(user_query)


if __name__ == "__main__":
    main()
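
To try the template locally, save it as a file of your choosing (for example app.py) and launch it with streamlit run app.py. Streamlit re-executes the whole script on every interaction, which is exactly why the chat history has to live in st.session_state rather than in a local variable.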

Application

Below is the application code that wraps everything together and reuses the previously introduced pieces.

@utils.enable_chat_history
def main(self):
    user_query = st.chat_input(placeholder="Ask for information from documents")

    if user_query:
        # Get or initialize the session history before building the chain
        session_history = self.get_session_history("default_session")

        rag_chain = self.prepare_rag_chain()

        utils.display_msg(user_query, 'user')

        # Add user message to LangChain history
        session_history.add_message(HumanMessage(content=user_query))

        with st.spinner('Preparing response...'):
            with st.chat_message("assistant"):
                response_placeholder = st.empty()
                response_text = ""
                context_docs = None

                # Stream the response chunks
                for chunk in rag_chain.stream(
                    {"input": user_query},
                    config={"configurable": {"session_id": "default_session"}}
                ):
                    # Capture the answer chunks for streaming
                    if "answer" in chunk:
                        # For streaming, answer will come in parts
                        if isinstance(chunk["answer"], str):
                            response_text += chunk["answer"]
                            response_placeholder.markdown(response_text)
                    # Capture context documents for references
                    if "context" in chunk:
                        context_docs = chunk["context"]

                # Store the complete response in both places
                st.session_state.messages.append({"role": "assistant", "content": response_text})
                # Add assistant message to LangChain history
                session_history.add_message(AIMessage(content=response_text))

        utils.print_qa(CustomDocChatbot, user_query, response_text)

        # Show references if available
        if context_docs:
            for doc in context_docs:
                filename = os.path.basename(doc.metadata['source'])
                ref_title = f":blue[Source document: {filename}]"
                with st.popover(ref_title):
                    st.caption(doc.page_content)


if __name__ == "__main__":
    obj = CustomDocChatbot()
    obj.main()

Improvements

If you want to take this mini-app one step further, there are a few very practical improvements worth doing:

  • Move from the in-memory index to an external vector database (for example Qdrant, Weaviate, or Pinecone) so the knowledge base persists across restarts and scales beyond a few files; a sketch of this swap follows after the list.
  • Separate document loading and synchronization into a dedicated script: one job that crawls/updates sources, splits content, builds embeddings, and refreshes the vector store, so the chatbot stays lightweight and only focuses on retrieval + answering.
  • Expand the prompting layer: add multiple prompts and conditional logic (e.g., different instructions for factual Q&A vs. summarization, a stricter “answer only from context” mode when confidence is low, fallback prompts when retrieval returns weak matches, and guardrails around output format).
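
As an illustration of the first improvement, here is a minimal sketch of swapping DocArrayInMemorySearch for Qdrant via the langchain-qdrant integration package. The URL and collection name are placeholder assumptions, and a Qdrant instance must already be running:

from langchain_qdrant import QdrantVectorStore

# Build (or update) a persistent collection instead of the in-memory index.
# 'chunks' and 'self.embedding_model' are the same objects used earlier.
vectordb = QdrantVectorStore.from_documents(
    chunks,
    self.embedding_model,
    url="http://localhost:6333",      # placeholder Qdrant endpoint
    collection_name="website_docs",   # placeholder collection name
)
self.retriever = vectordb.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 2}
)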

Summary

Everything runs inside a single class (CustomDocChatbot) and just a few helper files.

There’s no external backend, no database, no React frontend — just Python, LangChain, and Streamlit.

And that’s the point: with a surprisingly small amount of code, you can build your own fully working domain-aware chatbot, grounded in your own sources.

That’s all in this part dedicated to building a RAG chatbot. In the next article we will finally start building graphs with the LangGraph framework and later combine them with an LLM API to build agentic workflows.

See the next chapter

See the previous chapter

See the full code from this article in the GitHub repository


