LLM & AI Agent Applications with LangChain and LangGraph — Part 9 — Conversation Memory
Last Updated on January 2, 2026 by Editorial Team
Author(s): Michał Zarnecki
Originally published on Towards AI.

Welcome back to another article on LLM-driven application development.
In this part of the course we’ll look at memory in LangChain — in other words, how to make sure that the assistant you’re building doesn’t treat every message as a brand-new conversation.
Memory in LangChain is not a single function or switch you flip. It’s more like a family of patterns that all aim to do the same thing: inject context from earlier turns into the prompt for the next model call. Thanks to that, you can build dialogues that feel natural instead of robotic.
A simple example:
- In one message the user says: “My name is Michał.”
- A few turns later they ask: “What’s my name?”
With memory in place, the assistant can answer “Your name is Michał,” because that information was stored and passed along, not forgotten.
Of course, as soon as we start carrying previous messages forward, we run into two practical constraints: token cost and context window limits. The more history you include, the larger each request becomes. That means it costs more and you’re more likely to hit the model’s context limit.
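To get a feel for the trade-off, here is a minimal sketch of keeping history under a budget with trim_messages from langchain_core (available in recent versions; the budget of two messages is an arbitrary example):
from langchain_core.messages import AIMessage, HumanMessage, trim_messages
messages = [
    HumanMessage("My name is Michał."),
    AIMessage("Nice to meet you, Michał!"),
    HumanMessage("What's my name?"),
]
# keep only the newest messages that fit the budget; token_counter=len simply
# counts messages - swap in a chat model to count real tokens instead
trimmed = trim_messages(messages, max_tokens=2, strategy="last", token_counter=len)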
In the notebook that follows this article we’ll walk through three key approaches to memory:
1. Full history: the simplest form of memory
The most direct approach is what I’ll call History.
You simply keep the full list of messages in the conversation and attach them all to the prompt on each call. This gives you the highest fidelity: the model sees every detail you’ve shared so far. For short conversations this works extremely well, and it’s a good way to get started.
The downside is obvious: as the dialogue grows, so does the prompt. Cost increases, and at some point you’ll hit the context window limit. For longer sessions that can become a real problem.
So: full history is great for short, detailed conversations; less ideal for long-running chats or heavy-traffic systems.
2. Memory components: RunnableWithMessageHistory
The second approach is what I would call the practical LangChain version of memory.
Instead of manually concatenating strings and pushing them into prompts, you use dedicated components such as RunnableWithMessageHistory and InMemoryChatMessageHistory.
The pattern looks like this:
- In your prompt, you leave a placeholder like MessagesPlaceholder("history").
- You wrap your chain or runnable with RunnableWithMessageHistory.
- You identify each session with an ID (for example, a user ID or session token).
LangChain then handles the rest. For each session ID it stores the incoming and outgoing messages, and whenever you call the runnable again, it automatically injects the right history into the history placeholder.
The nice thing about this pattern is that it’s easy to extend. InMemoryChatMessageHistory is a good default for local tests and simple applications. Later, you can swap it for a persistent storage backend – a database, Redis, or any custom store – without rewriting the rest of your logic.
You get a convenient session mechanism that can evolve from “in-process toy” to “production-grade memory” when you’re ready.
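For example, swapping the in-memory store for Redis changes only the history factory. A sketch, assuming the langchain-community package is installed and a Redis instance runs locally:
from langchain_community.chat_message_histories import RedisChatMessageHistory
def get_history(session_id: str):
    # same role as the in-memory store, but entries survive process restarts
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")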
3. Summary-based memory: compressing long conversations
The third approach is Summary.
At first glance it looks similar to the previous pattern: you still maintain some representation of the previous dialogue and inject it into the system context. The difference is that you don’t keep the full conversation. Instead, you store and update a compressed summary.
The idea is simple:
- after a few exchanges, you generate a concise summary of what has happened so far
- you store this summary instead of the raw messages
- at the next step, you give the model the latest messages plus this summary
This trades detail for scalability. You dramatically reduce the number of tokens used for history, which makes it easier to support long-running conversations without blowing up cost or hitting context limits.
In the notebook we’ll build a simple chain that, every few turns, updates a textual summary of the dialogue and places it into the system message. The assistant then uses that summary as its “memory” of the session.
What to pay attention to in practice
When you start using memory in real applications, a few things are worth keeping in mind.
Cost and context window.
Full history gives you the best fidelity but also the highest cost. For anything non-trivial, summary-based approaches or hybrid strategies become necessary. Always keep an eye on how many tokens you send per request.
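A simple habit that helps: measure the history before you send it. A sketch using the token counter built into ChatOpenAI (treat the number as an approximation):
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini")
history = [HumanMessage("My name is Michał."), AIMessage("Nice to meet you!")]
# approximate number of tokens this history adds to every request
print(llm.get_num_tokens_from_messages(history))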
Quality trade-offs.
Summarisation is not free. If you compress too aggressively or update the summary too rarely, you will lose important details. It’s worth experimenting with how often you recompute the summary and what level of detail you keep.
Security and privacy.
Everything you put into the prompt is sent to the model. If your application deals with personal data or sensitive content, you should filter and sanitise the history before including it. In some cases you might decide not to store certain fields at all.
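As a naive illustration (a real system should use a proper PII-detection library), you might redact obvious patterns before a message is stored:
import re
def redact(text: str) -> str:
    # mask e-mail addresses and phone-like numbers before storing the message
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{3}[- ]?\d{3}[- ]?\d{3,4}\b", "[PHONE]", text)
    return text
print(redact("Write to jan@example.com or call 555-123-4567."))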
Persistence.
In-memory session history disappears when the process restarts. For production systems, you usually want memory that survives crashes, deployments and auto-scaling. That means moving from simple in-process stores to an external database or cache that keeps chat history reliably.
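A low-friction starting point is a file-backed store such as SQLChatMessageHistory from langchain-community. A sketch (the connection argument has been renamed between releases, so check your installed version):
from langchain_community.chat_message_histories import SQLChatMessageHistory
def get_history(session_id: str):
    # chat history persisted to a local SQLite file instead of process memory
    return SQLChatMessageHistory(session_id=session_id, connection="sqlite:///chat_history.db")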
From this point on we’ll switch over to the notebook and implement each of these approaches:
- full history,
- RunnableWithMessageHistory with in-memory storage,
- and a simple summary-based memory pattern.
You’ll be able to run the examples yourself, tweak them and see how different memory strategies change the behaviour, cost and robustness of your assistant.
Install libraries and load environment variables
!pip install -q langchain-openai python-dotenv
from dotenv import load_dotenv
load_dotenv()
History
History — stores and injects the entire conversation history message by message into the prompt.
from langchain_core.chat_history import InMemoryChatMessageHistory
# message history
history = InMemoryChatMessageHistory()
history.add_user_message("Buenos dias!")
history.add_ai_message("hello!")
history.add_user_message("Whats your name?")
history.add_ai_message("My name is GIGACHAT")
history.messages
output:
[HumanMessage(content='Buenos dias!', additional_kwargs={}, response_metadata={}),
AIMessage(content='hello!', additional_kwargs={}, response_metadata={}),
HumanMessage(content='Whats your name?', additional_kwargs={}, response_metadata={}),
AIMessage(content='My name is GIGACHAT', additional_kwargs={}, response_metadata={})]
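The snippet above only builds the list. To actually use it as memory, you pass the whole list to the model on each call and append the reply, for example (assuming your OpenAI key is loaded):
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# send the full history, then store both the new turn and the reply
history.add_user_message("What language did I greet you in?")
reply = llm.invoke(history.messages)
history.add_ai_message(reply.content)
print(reply.content)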
Memory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# message types: user, assistant, system, function, tool
prompt = ChatPromptTemplate.from_messages([
    ("system", "You have a friendly conversation and remember the context. "
               "Respond in English."),
    MessagesPlaceholder("history"),
    ("user", "{input}"),
])
chain = prompt | llm
store = {}
def get_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
chain_with_memory = RunnableWithMessageHistory(
    chain,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history",
)
sid = "demo-session-123"
resp1 = chain_with_memory.invoke(
    {"input": "Hello. My name is Walter White but everyone calls me Heisenberg."},
    config={"configurable": {"session_id": sid}}
)
print(resp1.content)
resp2 = chain_with_memory.invoke(
    {"input": "Say my name."},
    config={"configurable": {"session_id": sid}}
)
print(resp2.content)
output:
Hello, Heisenberg! That's quite a memorable name. Are you a fan of the show, or is there another reason you go by that name?
Heisenberg! You’ve definitely made a name for yourself. What’s on your mind today?
Summary
Summary memory — instead of the full history, passes a condensed summary of previous conversations to the model, which saves tokens and makes it easier to scale long dialogues.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
summarizer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following conversation briefly:"),
    ("human", "{conversation}")
])
summarizer = summarizer_prompt | llm | StrOutputParser()
conversation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Here is the summary of prior conversation:\n{summary}"),
    MessagesPlaceholder("history"),
    ("human", "{input}")
])
conversation_chain = conversation_prompt | llm | StrOutputParser()
store = {}
summaries = {}
def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

def get_summary(session_id: str) -> str:
    if session_id not in summaries:
        summaries[session_id] = "No previous conversation."
    return summaries[session_id]
def update_summary(session_id: str, threshold: int = 4):
    """Update the summary and clear old messages once the threshold is reached."""
    history = get_history(session_id)
    if len(history.messages) >= threshold:
        current_summary = summaries.get(session_id, "")
        if current_summary and current_summary != "No previous conversation.":
            conversation_text = f"Previous summary: {current_summary}\n\nRecent messages: {history.messages}"
        else:
            conversation_text = str(history.messages)
        new_summary = summarizer.invoke({"conversation": conversation_text})
        summaries[session_id] = new_summary
        history.clear()
chain_with_memory = RunnableWithMessageHistory(
    conversation_chain,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history"
)
sid = "demo-summary"
cfg = {"configurable": {"session_id": sid}}
summary = get_summary(sid)
response1 = chain_with_memory.invoke({"input": "Hello! My name is Michał.", "summary": summary}, cfg)
print(response1)
# Update summary periodically (e.g., after every 2-3 exchanges)
update_summary(sid, threshold=4)
summary = get_summary(sid)
response2 = chain_with_memory.invoke({"input": "What is my name?", "summary": summary}, cfg)
print(response2)
update_summary(sid, threshold=4)
output:
Hello, Michał! How can I assist you today?
Your name is Michał. How can I help you today?
That’s all for this chapter. In the next part I’ll cover “chains” in LangChain, the concept that gave the library its name.
see next chapter
see previous chapter
see the full code from this article in the GitHub repository