
Long-Term vs Short-Term Memory for AI Agents: A Practical Guide Without the Hype

Last Updated on April 10, 2026 by Editorial Team

Author(s): Andrii Tkachuk

Originally published on Towards AI.

Over the past year, memory has become one of the most overused — and misunderstood — concepts in AI agent design.

Before I start, I want to add a few words of context: most of us building AI agents today didn’t start as “AI engineers”. We come from backend engineering, data engineering, or data science.

That background shapes how we think about systems: scalability, reliability, clear lifecycles, and predictable failure modes.

And when we bring LLMs and agents into production, we still care about the same things:

  • we don’t want state explosions
  • we don’t want hidden coupling
  • and we definitely don’t want to create systems that make life harder for backend engineers and architects down the line.

This article is written from that mindset, not “what sounds impressive in demos”, but what leads to a reasonable trade-off between AI capabilities, backend architecture, and long-term system health.

You hear phrases like long-term memory, short-term memory, context engineering, persistent agents, and stateful conversations everywhere.
But if you look closely at most real implementations, many teams either:

  • don’t actually use memory at all, or
  • use it in ways that introduce serious scalability and reliability issues.

This article aims to cut through the hype and explain, in practical terms, how memory for AI agents actually works, which approaches exist today, and what trade-offs they come with.

Photo by dianne clifford on Unsplash

Before we start!🦾

If this piece gives you something practical you can take into your own system:
👏 leave 50 claps (yes, you can!) — Medium’s algorithm favors this, increasing visibility to others who then discover the article.
🔔 Follow me on Medium and LinkedIn for more deep dives into agentic systems, LLM architecture, and production-grade AI engineering.

First, Let’s Define the Terms Clearly

Long-Term Memory (LTM)

Long-term memory is anything that persists across sessions, restarts, and disconnections. It covers the agent’s past behaviors and observations that need to be retained and recalled over an extended period; in practice this often means an external vector store with fast, scalable retrieval that supplies relevant information to the agent as needed.

Typical characteristics:

  • Stored in databases, object storage, or vector stores
  • Survives process restarts
  • Not necessarily injected into the model on every request

Common forms of LTM:

  • Full chat history stored in a relational database
  • Events or messages stored in an append-only log
  • Vector embeddings of conversations or summaries
  • User preferences, profiles, or behavioral facts

Think of long-term memory as durable knowledge, not working context.
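As a toy illustration of durable storage that is not injected on every request, here is a minimal append-only message log backed by SQLite. The schema and function names are my own, illustrative choices, not a prescribed design:

```python
import sqlite3

def open_ltm(path=":memory:"):
    # Append-only message log: survives process restarts when
    # backed by a file instead of ":memory:".
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "  user_id TEXT, role TEXT, content TEXT,"
        "  ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def append_message(conn, user_id, role, content):
    conn.execute(
        "INSERT INTO messages (user_id, role, content) VALUES (?, ?, ?)",
        (user_id, role, content),
    )
    conn.commit()

def load_last_messages(conn, user_id, limit=20):
    # Retrieval is explicit and bounded; nothing is auto-injected
    # into the model on every request.
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE user_id = ?"
        " ORDER BY rowid DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return list(reversed(rows))
```

The same shape applies whether the backing store is Postgres, an event log, or a vector database: durable writes, bounded and deliberate reads.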

Short-Term Memory (STM) / Working Memory

Short-term memory (often called working memory or execution state) holds context about the agent’s current situation. It is typically realized through in-context learning, which keeps it short and finite due to context-window constraints. Short-term memory is:

  • Ephemeral
  • Session-scoped
  • Typically stored in RAM
  • Used during active interaction

In practice, what we call “short-term memory” in agents usually combines:

  • conversational state (messages)
  • execution state (tool outputs, intermediate results)
  • control flow metadata

Short-term memory exists to reduce overhead and improve reasoning continuity, not to replace persistence.

Approach #1 — The Legacy Stateless Approach (Still Very Common)

The most widespread approach today is actually stateless.

How it works

For every user request:

  1. Fetch chat history from a persistent data store
  2. Truncate or limit it
  3. Inject it into the prompt
  4. Run the agent
  5. Repeat on the next request

In simplified code:

history = db.load_last_messages(user_id, limit=20)
prompt = build_prompt(history, user_message)
response = llm(prompt)

Pros

  • Extremely simple
  • Easy to reason about
  • No RAM management concerns
  • Works well in serverless environments

Cons

  • Database is hit on every request
  • Context is always injected, even when not needed
  • Hard limits must be enforced aggressively
  • Becomes expensive and slow at scale

This approach does not use short-term memory at all.
Each request is fully independent.

Approach #2 — Short-Term Memory via In-Memory State (LangGraph-Style)

A more advanced approach introduces explicit short-term memory.

This is the model used by frameworks like LangGraph.

Core idea

  • Load long-term memory once
  • Keep a mutable state object in RAM
  • Update it as messages arrive
  • Use it throughout the agent flow
  • Dispose of it when the session ends

Conceptually:

class ChatState(TypedDict):
    user_id: str
    messages: list[dict]

Typical flow (e.g., with WebSockets or Socket.IO)

Socket.IO is one of the most common and well-known frameworks for building chat-based applications.

On connect

  • Load chat history from the database
  • Store it in an in-memory state object

On each message

  • Read state from RAM
  • Update messages
  • Run the agent

On disconnect

  • Optionally persist summary
  • Remove state from memory
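The connect / message / disconnect lifecycle above can be sketched as follows. This is a framework-agnostic toy, not Socket.IO’s actual API; the `load_history`, `run_agent`, and `save_summary` callables are hypothetical stand-ins for your persistence layer and agent runner:

```python
# sid -> in-memory session state (the STM)
sessions: dict[str, dict] = {}

def on_connect(sid: str, user_id: str, load_history) -> None:
    # Load long-term memory once and keep it in RAM for the session.
    sessions[sid] = {"user_id": user_id, "messages": load_history(user_id)}

def on_message(sid: str, text: str, run_agent) -> str:
    # Read and mutate state in RAM; no database call per message.
    state = sessions[sid]
    state["messages"].append({"role": "user", "content": text})
    reply = run_agent(state["messages"])
    state["messages"].append({"role": "assistant", "content": reply})
    return reply

def on_disconnect(sid: str, save_summary) -> None:
    # Optionally persist a summary, then free the RAM.
    state = sessions.pop(sid, None)
    if state is not None:
        save_summary(state["user_id"], state["messages"])
```

Note that everything the cons list warns about lives in this sketch too: `sessions` grows with concurrent users, and a missed disconnect event leaks state unless you add TTL-based garbage collection.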

Pros

  • No database calls on every message
  • Much faster per interaction
  • Natural conversational continuity
  • Clean separation between LTM and STM

Cons (and they are important)

RAM usage grows with:

  • number of concurrent users
  • length of conversations

Requires:

  • strict size limits
  • trimming or summarization
  • TTL / garbage collection

Socket-based systems have edge cases:

  • dropped connections
  • multiple tabs per user
  • missing disconnect events

This approach can be production-ready, but only if memory management is treated as a first-class concern.

Context Variables: What They Are (and What They Are Not)

Many implementations add context variables (for example, ContextVar in Python) to avoid passing state through every function.

This is useful — but limited.

Context variables:

  • ✔️ Improve code readability
  • ✔️ Allow access to state “from anywhere” in the execution flow
  • ❌ Do NOT persist state across events
  • ❌ Do NOT replace an in-memory store

They are an access pattern, not a memory strategy.

What context variables are good for

  • Avoiding passing state through dozens of function calls
  • Accessing the current execution state inside deep agent logic
  • Improving code readability

state = get_current_state()
state["messages"].append(new_message)
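This access pattern maps directly onto Python’s standard-library `contextvars` module. A minimal sketch (the `handle_message` and `deep_agent_logic` names are illustrative, not a framework API):

```python
from contextvars import ContextVar

# Holds the session state for the current execution context.
_current_state: ContextVar[dict] = ContextVar("current_state")

def get_current_state() -> dict:
    # Any function in the call chain can read the state without
    # having it threaded through every signature.
    return _current_state.get()

def handle_message(state: dict, new_message: dict) -> None:
    # Bind the state at the entry point of the event handler.
    _current_state.set(state)
    deep_agent_logic(new_message)

def deep_agent_logic(new_message: dict) -> None:
    # Nested agent logic reaches the state "from anywhere".
    get_current_state()["messages"].append(new_message)
```

Notice the limitation the list above describes: the `ContextVar` binding lives only for the current execution context. The state object itself must still be owned by something else, such as the in-memory session store from Approach #2.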

What they do not do

  • They do not persist memory across events
  • They do not replace an in-memory store
  • They do not solve session lifecycle problems

Context variables are a convenience layer, not a memory system.

Approach #3 — Memory as a Tool (The New Emerging Pattern)

A newer and increasingly popular approach is Memory as a Tool.

Before dismissing this approach as “too complex”, I would strongly recommend trying it at least once.

Even if you don’t end up using Memory as a Tool in production, it forces you to think differently about agent design:

  • how and when an agent decides to fetch information
  • how tool invocation is triggered intentionally rather than implicitly
  • and how much context is actually needed to solve a task.

In practice, this approach is one of the best ways to truly understand:

  • how tool usage works
  • how to guide an agent to call the right tool at the right time
  • and how modern reasoning-based agents operate internally.

Many engineers are familiar with prompts and APIs, but far fewer have hands-on experience with ReAct-style loops or explicit reasoning-driven tool calls. Trying this pattern — even in a small prototype — helps close that gap.

I’ll link a few articles below that go deeper into ReAct, reasoning models, and tool-based agents for those who want to explore this further.

Building AI Agents: Reasoning, Tools, and the Role of MCP

In the era of advanced language models, it’s tempting to treat every LLM as interchangeable — a black box that takes a…

medium.com

Designing AI Agents Like Microservices: A Practical Mental Model for Modern Architectures

If you’ve spent years building microservice architectures and are now staring at the “multi-agent AI” hype wondering…

medium.com

Core idea

Instead of automatically injecting memory, you expose memory retrieval as a tool:

Tool: retrieve_chat_history(user_id, chat_id, offset=0, limit=10)

The agent decides:

  • whether it needs past context
  • how much of it
  • when to fetch more

How it works

  1. Agent receives a user message
  2. Agent reasons whether history is required
  3. Agent calls the retrieval tool
  4. Tool returns paginated history
  5. Agent may repeat if needed
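A stripped-down sketch of that loop, assuming a hypothetical `llm_step` callable that returns either a retrieve action or a final answer (this is not any specific framework’s API):

```python
def retrieve_chat_history(store, user_id, chat_id, offset=0, limit=10):
    # The tool is the boundary: the agent only sees what this
    # function chooses to return, one page at a time.
    history = store.get((user_id, chat_id), [])
    return history[offset:offset + limit]

def agent_turn(llm_step, store, user_id, chat_id, user_message, max_steps=5):
    # Hypothetical ReAct-style loop: the model decides whether it
    # needs history, and the tool fetches it on demand.
    observations = []
    for _ in range(max_steps):
        action = llm_step(user_message, observations)
        if action["type"] == "retrieve":
            page = retrieve_chat_history(
                store, user_id, chat_id,
                offset=action.get("offset", 0),
                limit=action.get("limit", 10),
            )
            observations.append(page)
        else:  # action["type"] == "answer"
            return action["text"]
    return "Step budget exhausted."
```

The `max_steps` cap matters: without it, a weak model that keeps requesting history turns the latency con above into an unbounded loop.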

Pros

  • Minimal context injection
  • Lower token usage
  • Memory is fetched only when relevant
  • Excellent fit for modern reasoning models

Cons

  • Requires stronger models
  • Harder to debug
  • More complex agent logic
  • Latency depends on retrieval calls

Critical requirement: Reasoning + ReAct loop

This pattern assumes:

  • a reasoning-capable model
  • multi-step planning
  • tool invocation loops
  • reflection on observations

Without a ReAct-style agent:

  • the model may never request memory
  • or request it too late
  • or request irrelevant parts

This is not a drop-in replacement for traditional memory injection.

When it works best

  • Strong reasoning models
  • Well-designed ReAct prompts
  • Non-critical memory requirements
  • Systems tolerant to occasional mistakes

When it’s risky

  • If history is always required
  • If correctness is critical
  • If missing context causes failure
  • If the model is weak or unstable

With modern multi-step reasoning models, this approach is becoming increasingly viable — and often superior for large-scale systems.

Many readers will immediately raise security risks at this point, and they will be absolutely right, so let’s work through them.

Memory used as a tool does introduce security considerations, but they are not fundamentally different from those of any other tool.

Prompt injection can attempt to trigger a memory read or write, but it cannot access anything beyond what the memory tool explicitly allows. The real security boundary is not the prompt, but the tool implementation.

This is why memory must:

  • be accessed only via a tool (never auto-injected)
  • be strictly scoped (user_id / tenant_id / namespace)
  • validate both read and write operations
  • explicitly control what can be recalled and how much (there should always be such a wrapper around the raw store)

In practice, prompt injection cannot recall another user’s memory unless there is already a bug in the isolation or authorization logic. The larger risk is memory poisoning (writing bad memory), which should be mitigated by validation and write policies inside the tool wrapper. That, however, is a separate topic; if there is interest, I can share my observations, experience, and thoughts on it in another article.
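Those scoping rules amount to a thin wrapper around the raw store. A minimal, illustrative sketch (the class and method names are my own, not a library API):

```python
class ScopedMemoryTool:
    # Illustrative wrapper: the authenticated identity is fixed at
    # construction time, so a prompt-injected user_id is ignored.
    MAX_RECALL = 20

    def __init__(self, store, user_id, tenant_id):
        self._store = store
        self._key = (tenant_id, user_id)  # strict scope

    def read(self, limit=10):
        # Cap how much can be recalled in a single call.
        limit = min(limit, self.MAX_RECALL)
        return self._store.get(self._key, [])[-limit:]

    def write(self, item):
        # Validate writes to mitigate memory poisoning.
        if not isinstance(item, dict) or "content" not in item:
            raise ValueError("rejected memory write")
        self._store.setdefault(self._key, []).append(item)
```

The key design choice is that the model never supplies the scope: the wrapper is constructed from the authenticated session, so the prompt can only influence *what* is read or written within that scope, never *whose* memory it touches.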

Choosing the Right Approach

There is no single “correct” memory strategy.

In most cases, use hybrid memory: integrating short-term and long-term memory improves an agent’s long-range reasoning and its ability to accumulate experience.

In practice, hybrid architectures work best:

  • Long-term memory for durability
  • Short-term memory for performance
  • Tool-based retrieval for flexibility

Final Thoughts

Memory is not magic.

Most problems attributed to “lack of memory” are actually:

  • poor context selection
  • uncontrolled state growth
  • missing lifecycle management

Before adding complexity, ask:

  • Does the agent really need this context?
  • For how long?
  • Who is responsible for cleaning it up?

If you answer those questions honestly, the right architecture usually becomes obvious.

And that’s a wrap! If you’ve read this far, it probably means you found this article useful or insightful. If that’s the case, consider leaving a few claps or sharing it with your team, please. Thanks for reading! 🚀


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.