LLM & AI Agent Applications with LangChain and LangGraph — Part 9 — Conversation Memory
Last Updated on January 2, 2026 by Editorial Team
Author(s): Michał Zarnecki
Originally published on Towards AI.

Welcome back to another article on LLM-driven application development.
In this part of the course we’ll look at memory in LangChain — in other words, how to make sure that the assistant you’re building doesn’t treat every message as a brand-new conversation.
Memory in LangChain is not a single function or switch you flip. It’s more like a family of patterns that all aim to do the same thing: inject context from earlier turns into the prompt for the next model call. Thanks to that, you can build dialogues that feel natural instead of robotic.
A simple example:
- In one message the user says: “My name is Michał.”
- A few turns later they ask: “What’s my name?”
With memory in place, the assistant can answer “Your name is Michał,” because that information was stored and passed along, not forgotten.
Of course, as soon as we start carrying previous messages forward, we run into two practical constraints: token cost and context window limits. The more history you include, the larger each request becomes. That means it costs more and you’re more likely to hit the model’s context limit.
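To get a feel for the trade-off, here is a minimal sketch of keeping history under a budget with trim_messages from langchain_core (available in recent versions; the budget of two messages is an arbitrary example):
from langchain_core.messages import AIMessage, HumanMessage, trim_messages
messages = [
    HumanMessage("My name is Michał."),
    AIMessage("Nice to meet you, Michał!"),
    HumanMessage("What's my name?"),
]
# keep only the newest messages that fit the budget; token_counter=len simply
# counts messages - swap in a chat model to count real tokens instead
trimmed = trim_messages(messages, max_tokens=2, strategy="last", token_counter=len)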
In the notebook that follows this article we’ll walk through three key approaches to memory:
1. Full history: the simplest form of memory
The most direct approach is what I’ll call History.
You simply keep the full list of messages in the conversation and attach them all to the prompt on each call. This gives you the highest fidelity: the model sees every detail you’ve shared so far. For short conversations this works extremely well, and it’s a good way to get started.
The downside is obvious: as the dialogue grows, so does the prompt. Cost increases, and at some point you’ll hit the context window limit. For longer sessions that can become a real problem.
So: full history is great for short, detailed conversations; less ideal for long-running chats or heavy-traffic systems.
2. Memory components: RunnableWithMessageHistory
The second approach is what I would call the practical LangChain version of memory.
Instead of manually concatenating strings and pushing them into prompts, you use dedicated components such as RunnableWithMessageHistory and InMemoryChatMessageHistory.
The pattern looks like this:
- In your prompt, you leave a placeholder like MessagesPlaceholder("history").
- You wrap your chain or runnable with RunnableWithMessageHistory.
- You identify each session with an ID (for example, a user ID or session token).
LangChain then handles the rest. For each session ID it stores the incoming and outgoing messages, and whenever you call the runnable again, it automatically injects the right history into the history placeholder.
The nice thing about this pattern is that it’s easy to extend. InMemoryChatMessageHistory is a good default for local tests and simple applications. Later, you can swap it for a persistent storage backend – a database, Redis, or any custom store – without rewriting the rest of your logic.
You get a convenient session mechanism that can evolve from “in-process toy” to “production-grade memory” when you’re ready.
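For example, swapping the in-memory store for Redis changes only the history factory. A sketch, assuming the langchain-community package is installed and a Redis instance runs locally:
from langchain_community.chat_message_histories import RedisChatMessageHistory
def get_history(session_id: str):
    # same role as the in-memory store, but entries survive process restarts
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")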
3. Summary-based memory: compressing long conversations
The third approach is Summary.
At first glance it looks similar to the previous pattern: you still maintain some representation of the previous dialogue and inject it into the system context. The difference is that you don’t keep the full conversation. Instead, you store and update a compressed summary.
The idea is simple:
- after a few exchanges, you generate a concise summary of what has happened so far
- you store this summary instead of the raw messages
- at the next step, you give the model the latest messages plus this summary
This trades detail for scalability. You dramatically reduce the number of tokens used for history, which makes it easier to support long-running conversations without blowing up cost or hitting context limits.
In the notebook we’ll build a simple chain that, every few turns, updates a textual summary of the dialogue and places it into the system message. The assistant then uses that summary as its “memory” of the session.
What to pay attention to in practice
When you start using memory in real applications, a few things are worth keeping in mind.
Cost and context window.
Full history gives you the best fidelity but also the highest cost. For anything non-trivial, summary-based approaches or hybrid strategies become necessary. Always keep an eye on how many tokens you send per request.
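A simple habit that helps: measure the history before you send it. A sketch using the token counter built into ChatOpenAI (treat the number as an approximation):
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
llm = ChatOpenAI(model="gpt-4o-mini")
history = [HumanMessage("My name is Michał."), AIMessage("Nice to meet you!")]
# approximate number of tokens this history adds to every request
print(llm.get_num_tokens_from_messages(history))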
Quality trade-offs.
Summarisation is not free. If you compress too aggressively or update the summary too rarely, you will lose important details. It’s worth experimenting with how often you recompute the summary and what level of detail you keep.
Security and privacy.
Everything you put into the prompt is sent to the model. If your application deals with personal data or sensitive content, you should filter and sanitise the history before including it. In some cases you might decide not to store certain fields at all.
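As a naive illustration (a real system should use a proper PII-detection library), you might redact obvious patterns before a message is stored:
import re
def redact(text: str) -> str:
    # mask e-mail addresses and phone-like numbers before storing the message
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{3}[- ]?\d{3}[- ]?\d{3,4}\b", "[PHONE]", text)
    return text
print(redact("Write to jan@example.com or call 555-123-4567."))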
Persistence.
In-memory session history disappears when the process restarts. For production systems, you usually want memory that survives crashes, deployments and auto-scaling. That means moving from simple in-process stores to an external database or cache that keeps chat history reliably.
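A low-friction starting point is a file-backed store such as SQLChatMessageHistory from langchain-community. A sketch (the connection argument has been renamed between releases, so check your installed version):
from langchain_community.chat_message_histories import SQLChatMessageHistory
def get_history(session_id: str):
    # chat history persisted to a local SQLite file instead of process memory
    return SQLChatMessageHistory(session_id=session_id, connection="sqlite:///chat_history.db")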
From this point on we’ll switch over to the notebook and implement each of these approaches:
- full history,
- RunnableWithMessageHistory with in-memory storage,
- and a simple summary-based memory pattern.
You’ll be able to run the examples yourself, tweak them and see how different memory strategies change the behaviour, cost and robustness of your assistant.
Install libraries and load environment variables
!pip install -q langchain-openai python-dotenv
from dotenv import load_dotenv
load_dotenv()
History
History — stores and injects the entire conversation history message by message into the prompt.
from langchain_core.chat_history import InMemoryChatMessageHistory
# message history
history = InMemoryChatMessageHistory()
history.add_user_message("Buenos dias!")
history.add_ai_message("hello!")
history.add_user_message("Whats your name?")
history.add_ai_message("My name is GIGACHAT")
history.messages
output:
[HumanMessage(content='Buenos dias!', additional_kwargs={}, response_metadata={}),
AIMessage(content='hello!', additional_kwargs={}, response_metadata={}),
HumanMessage(content='Whats your name?', additional_kwargs={}, response_metadata={}),
AIMessage(content='My name is GIGACHAT', additional_kwargs={}, response_metadata={})]
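The snippet above only builds the list. To actually use it as memory, you pass the whole list to the model on each call and append the reply, for example (assuming your OpenAI key is loaded):
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# send the full history, then store both the new turn and the reply
history.add_user_message("What language did I greet you in?")
reply = llm.invoke(history.messages)
history.add_ai_message(reply.content)
print(reply.content)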
Memory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# message types: user, assistant, system, function, tool
prompt = ChatPromptTemplate.from_messages([
    ("system", "You have a friendly conversation and remember the context. "
               "Respond in English."),
    MessagesPlaceholder("history"),
    ("user", "{input}"),
])
chain = prompt | llm
store = {}
def get_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]
chain_with_memory = RunnableWithMessageHistory(
    chain,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history",
)
sid = "demo-session-123"
resp1 = chain_with_memory.invoke(
    {"input": "Hello. My name is Walter White but everyone calls me Heisenberg."},
    config={"configurable": {"session_id": sid}}
)
print(resp1.content)
resp2 = chain_with_memory.invoke(
    {"input": "Say my name."},
    config={"configurable": {"session_id": sid}}
)
print(resp2.content)
output:
Hello, Heisenberg! That's quite a memorable name. Are you a fan of the show, or is there another reason you go by that name?
Heisenberg! You’ve definitely made a name for yourself. What’s on your mind today?
Summary
Summary memory — instead of the full history, passes a condensed summary of previous conversations to the model, which saves tokens and makes it easier to scale long dialogues.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
summarizer_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize the following conversation briefly:"),
    ("human", "{conversation}")
])
summarizer = summarizer_prompt | llm | StrOutputParser()
conversation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Here is the summary of prior conversation:\n{summary}"),
    MessagesPlaceholder("history"),
    ("human", "{input}")
])
conversation_chain = conversation_prompt | llm | StrOutputParser()
store = {}
summaries = {}
def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

def get_summary(session_id: str) -> str:
    if session_id not in summaries:
        summaries[session_id] = "No previous conversation."
    return summaries[session_id]
def update_summary(session_id: str, threshold: int = 4):
    """Update the summary and clear old messages once the threshold is reached."""
    history = get_history(session_id)
    if len(history.messages) >= threshold:
        current_summary = summaries.get(session_id, "")
        if current_summary and current_summary != "No previous conversation.":
            conversation_text = f"Previous summary: {current_summary}\n\nRecent messages: {history.messages}"
        else:
            conversation_text = str(history.messages)
        new_summary = summarizer.invoke({"conversation": conversation_text})
        summaries[session_id] = new_summary
        history.clear()
chain_with_memory = RunnableWithMessageHistory(
    conversation_chain,
    get_session_history=get_history,
    input_messages_key="input",
    history_messages_key="history"
)
sid = "demo-summary"
cfg = {"configurable": {"session_id": sid}}
summary = get_summary(sid)
response1 = chain_with_memory.invoke({"input": "Hello! My name is Michał.", "summary": summary}, cfg)
print(response1)
# Update summary periodically (e.g., after every 2-3 exchanges)
update_summary(sid, threshold=4)
summary = get_summary(sid)
response2 = chain_with_memory.invoke({"input": "What is my name?", "summary": summary}, cfg)
print(response2)
update_summary(sid, threshold=4)
output:
Hello, Michał! How can I assist you today?
Your name is Michał. How can I help you today?
That’s all for this chapter. In the next part I’ll cover “chains” in LangChain, the concept that gave the library its name.
see next chapter
see previous chapter
see the full code from this article in the GitHub repository