
Long-Term vs Short-Term Memory for AI Agents: A Practical Guide Without the Hype

Last Updated on April 10, 2026 by Editorial Team

Author(s): Andrii Tkachuk

Originally published on Towards AI.

Over the past year, memory has become one of the most overused — and misunderstood — concepts in AI agent design.

Before I start, I want to add a few words of context: most of us building AI agents today didn’t start as “AI engineers”. We come from backend engineering, data engineering, or data science.

That background shapes how we think about systems: scalability, reliability, clear lifecycles, and predictable failure modes.

And when we bring LLMs and agents into production, we still care about the same things:

  • we don’t want state explosions
  • we don’t want hidden coupling
  • and we definitely don’t want to create systems that make life harder for backend engineers and architects down the line.

This article is written from that mindset, not “what sounds impressive in demos”, but what leads to a reasonable trade-off between AI capabilities, backend architecture, and long-term system health.

You hear phrases like long-term memory, short-term memory, context engineering, persistent agents, and stateful conversations everywhere.
But if you look closely at most real implementations, many teams either:

  • don’t actually use memory at all, or
  • use it in ways that introduce serious scalability and reliability issues.

This article aims to cut through the hype and explain, in practical terms, how memory for AI agents actually works, which approaches exist today, and what trade-offs they come with.

Photo by dianne clifford on Unsplash

Before we start!🦾

If this piece gives you something practical you can take into your own system:
👏 leave 50 claps (yes, you can!) — Medium’s algorithm favors this, increasing visibility to others who then discover the article.
🔔 Follow me on Medium and LinkedIn for more deep dives into agentic systems, LLM architecture, and production-grade AI engineering.

First, Let’s Define the Terms Clearly

Long-Term Memory (LTM)

Long-term memory is anything that persists across sessions, restarts, and disconnections. It covers the agent’s past behaviors and observations that need to be retained and recalled over an extended period; in practice this often means an external vector store with fast, scalable retrieval that supplies relevant information to the agent as needed.

Typical characteristics:

  • Stored in databases, object storage, or vector stores
  • Survives process restarts
  • Not necessarily injected into the model on every request

Common forms of LTM:

  • Full chat history stored in a relational database
  • Events or messages stored in an append-only log
  • Vector embeddings of conversations or summaries
  • User preferences, profiles, or behavioral facts

Think of long-term memory as durable knowledge, not working context.
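As a toy illustration of durable storage that is not injected on every request, here is a minimal append-only message log backed by SQLite. The schema and function names are my own, illustrative choices, not a prescribed design:

```python
import sqlite3

def open_ltm(path=":memory:"):
    # Append-only message log: survives process restarts when
    # backed by a file instead of ":memory:".
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "  user_id TEXT, role TEXT, content TEXT,"
        "  ts DATETIME DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def append_message(conn, user_id, role, content):
    conn.execute(
        "INSERT INTO messages (user_id, role, content) VALUES (?, ?, ?)",
        (user_id, role, content),
    )
    conn.commit()

def load_last_messages(conn, user_id, limit=20):
    # Retrieval is explicit and bounded; nothing is auto-injected
    # into the model on every request.
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE user_id = ?"
        " ORDER BY rowid DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return list(reversed(rows))
```

The same shape applies whether the backing store is Postgres, an event log, or a vector database: durable writes, bounded and deliberate reads.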

Short-Term Memory (STM) / Working Memory

Short-term memory (often called working memory or execution state) holds context about the agent’s current situation. It is typically realized through in-context learning, which keeps it short and finite due to context-window constraints. Short-term memory is:

  • Ephemeral
  • Session-scoped
  • Typically stored in RAM
  • Used during active interaction

In practice, what we call “short-term memory” in agents usually combines:

  • conversational state (messages)
  • execution state (tool outputs, intermediate results)
  • control flow metadata

Short-term memory exists to reduce overhead and improve reasoning continuity, not to replace persistence.

Approach #1 — The Legacy Stateless Approach (Still Very Common)

The most widespread approach today is actually stateless.

How it works

For every user request:

  1. Fetch chat history from a persistent data store
  2. Truncate or limit it
  3. Inject it into the prompt
  4. Run the agent
  5. Repeat on the next request

In simplified code:

history = db.load_last_messages(user_id, limit=20)
prompt = build_prompt(history, user_message)
response = llm(prompt)

Pros

  • Extremely simple
  • Easy to reason about
  • No RAM management concerns
  • Works well in serverless environments

Cons

  • Database is hit on every request
  • Context is always injected, even when not needed
  • Hard limits must be enforced aggressively
  • Becomes expensive and slow at scale

This approach does not use short-term memory at all.
Each request is fully independent.

Approach #2 — Short-Term Memory via In-Memory State (LangGraph-Style)

A more advanced approach introduces explicit short-term memory.

This is the model used by frameworks like LangGraph.

Core idea

  • Load long-term memory once
  • Keep a mutable state object in RAM
  • Update it as messages arrive
  • Use it throughout the agent flow
  • Dispose of it when the session ends

Conceptually:

class ChatState(TypedDict):
    user_id: str
    messages: list[dict]

Typical flow (e.g., with WebSockets or Socket.IO)

Socket.IO is one of the most common and well-known frameworks for building chat-based applications.

On connect

  • Load chat history from the database
  • Store it in an in-memory state object

On each message

  • Read state from RAM
  • Update messages
  • Run the agent

On disconnect

  • Optionally persist summary
  • Remove state from memory
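The connect / message / disconnect lifecycle above can be sketched as follows. This is a framework-agnostic toy, not Socket.IO’s actual API; the `load_history`, `run_agent`, and `save_summary` callables are hypothetical stand-ins for your persistence layer and agent runner:

```python
# sid -> in-memory session state (the STM)
sessions: dict[str, dict] = {}

def on_connect(sid: str, user_id: str, load_history) -> None:
    # Load long-term memory once and keep it in RAM for the session.
    sessions[sid] = {"user_id": user_id, "messages": load_history(user_id)}

def on_message(sid: str, text: str, run_agent) -> str:
    # Read and mutate state in RAM; no database call per message.
    state = sessions[sid]
    state["messages"].append({"role": "user", "content": text})
    reply = run_agent(state["messages"])
    state["messages"].append({"role": "assistant", "content": reply})
    return reply

def on_disconnect(sid: str, save_summary) -> None:
    # Optionally persist a summary, then free the RAM.
    state = sessions.pop(sid, None)
    if state is not None:
        save_summary(state["user_id"], state["messages"])
```

Note that everything the cons list warns about lives in this sketch too: `sessions` grows with concurrent users, and a missed disconnect event leaks state unless you add TTL-based garbage collection.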

Pros

  • No database calls on every message
  • Much faster per interaction
  • Natural conversational continuity
  • Clean separation between LTM and STM

Cons (and they are important)

RAM usage grows with:

  • number of concurrent users
  • length of conversations

Requires:

  • strict size limits
  • trimming or summarization
  • TTL / garbage collection

Socket-based systems have edge cases:

  • dropped connections
  • multiple tabs per user
  • missing disconnect events

This approach can be production-ready, but only if memory management is treated as a first-class concern.

Context Variables: What They Are (and What They Are Not)

Many implementations add context variables (for example, ContextVar in Python) to avoid passing state through every function.

This is useful — but limited.

Context variables:

  • ✔️ Improve code readability
  • ✔️ Allow access to state “from anywhere” in the execution flow
  • ❌ Do NOT persist state across events
  • ❌ Do NOT replace an in-memory store

They are an access pattern, not a memory strategy.

What context variables are good for

  • Avoiding passing state through dozens of function calls
  • Accessing the current execution state inside deep agent logic
  • Improving code readability

state = get_current_state()
state["messages"].append(new_message)
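This access pattern maps directly onto Python’s standard-library `contextvars` module. A minimal sketch (the `handle_message` and `deep_agent_logic` names are illustrative, not a framework API):

```python
from contextvars import ContextVar

# Holds the session state for the current execution context.
_current_state: ContextVar[dict] = ContextVar("current_state")

def get_current_state() -> dict:
    # Any function in the call chain can read the state without
    # having it threaded through every signature.
    return _current_state.get()

def handle_message(state: dict, new_message: dict) -> None:
    # Bind the state at the entry point of the event handler.
    _current_state.set(state)
    deep_agent_logic(new_message)

def deep_agent_logic(new_message: dict) -> None:
    # Nested agent logic reaches the state "from anywhere".
    get_current_state()["messages"].append(new_message)
```

Notice the limitation the list above describes: the `ContextVar` binding lives only for the current execution context. The state object itself must still be owned by something else, such as the in-memory session store from Approach #2.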

What they do not do

  • They do not persist memory across events
  • They do not replace an in-memory store
  • They do not solve session lifecycle problems

Context variables are a convenience layer, not a memory system.

Approach #3 — Memory as a Tool (The New Emerging Pattern)

A newer and increasingly popular approach is Memory as a Tool.

Before dismissing this approach as “too complex”, I would strongly recommend trying it at least once.

Even if you don’t end up using Memory as a Tool in production, it forces you to think differently about agent design:

  • how and when an agent decides to fetch information
  • how tool invocation is triggered intentionally rather than implicitly
  • and how much context is actually needed to solve a task.

In practice, this approach is one of the best ways to truly understand:

  • how tool usage works
  • how to guide an agent to call the right tool at the right time
  • and how modern reasoning-based agents operate internally.

Many engineers are familiar with prompts and APIs, but far fewer have hands-on experience with ReAct-style loops or explicit reasoning-driven tool calls. Trying this pattern — even in a small prototype — helps close that gap.

I’ll link a few articles below that go deeper into ReAct, reasoning models, and tool-based agents for those who want to explore this further.

Building AI Agents: Reasoning, Tools, and the Role of MCP

In the era of advanced language models, it’s tempting to treat every LLM as interchangeable — a black box that takes a…

medium.com

Designing AI Agents Like Microservices: A Practical Mental Model for Modern Architectures

If you’ve spent years building microservice architectures and are now staring at the “multi-agent AI” hype wondering…

medium.com

Core idea

Instead of automatically injecting memory, you expose memory retrieval as a tool:

Tool: retrieve_chat_history(user_id, chat_id, offset=0, limit=10)

The agent decides:

  • whether it needs past context
  • how much of it
  • when to fetch more

How it works

  1. Agent receives a user message
  2. Agent reasons whether history is required
  3. Agent calls the retrieval tool
  4. Tool returns paginated history
  5. Agent may repeat if needed
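A stripped-down sketch of that loop, assuming a hypothetical `llm_step` callable that returns either a retrieve action or a final answer (this is not any specific framework’s API):

```python
def retrieve_chat_history(store, user_id, chat_id, offset=0, limit=10):
    # The tool is the boundary: the agent only sees what this
    # function chooses to return, one page at a time.
    history = store.get((user_id, chat_id), [])
    return history[offset:offset + limit]

def agent_turn(llm_step, store, user_id, chat_id, user_message, max_steps=5):
    # Hypothetical ReAct-style loop: the model decides whether it
    # needs history, and the tool fetches it on demand.
    observations = []
    for _ in range(max_steps):
        action = llm_step(user_message, observations)
        if action["type"] == "retrieve":
            page = retrieve_chat_history(
                store, user_id, chat_id,
                offset=action.get("offset", 0),
                limit=action.get("limit", 10),
            )
            observations.append(page)
        else:  # action["type"] == "answer"
            return action["text"]
    return "Step budget exhausted."
```

The `max_steps` cap matters: without it, a weak model that keeps requesting history turns the latency con above into an unbounded loop.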

Pros

  • Minimal context injection
  • Lower token usage
  • Memory is fetched only when relevant
  • Excellent fit for modern reasoning models

Cons

  • Requires stronger models
  • Harder to debug
  • More complex agent logic
  • Latency depends on retrieval calls

Critical requirement: Reasoning + ReAct loop

This pattern assumes:

  • a reasoning-capable model
  • multi-step planning
  • tool invocation loops
  • reflection on observations

Without a ReAct-style agent:

  • the model may never request memory
  • or request it too late
  • or request irrelevant parts

This is not a drop-in replacement for traditional memory injection.

When it works best

  • Strong reasoning models
  • Well-designed ReAct prompts
  • Non-critical memory requirements
  • Systems tolerant to occasional mistakes

When it’s risky

  • If history is always required
  • If correctness is critical
  • If missing context causes failure
  • If the model is weak or unstable

With modern multi-step reasoning models, this approach is becoming increasingly viable — and often superior for large-scale systems.

Many readers will immediately raise security risks at this point, and they will be absolutely right, so let’s work through them.

Memory used as a tool does introduce security considerations, but they are not fundamentally different from those of any other tool.

Prompt injection can attempt to trigger a memory read or write, but it cannot access anything beyond what the memory tool explicitly allows. The real security boundary is not the prompt, but the tool implementation.

This is why memory must:

  • be accessed only via a tool (never auto-injected)
  • be strictly scoped (user_id / tenant_id / namespace)
  • validate both read and write operations
  • explicitly control what can be recalled and how much (there should always be such a wrapper around the raw store)

In practice, prompt injection cannot recall another user’s memory unless there is already a bug in the isolation or authorization logic. The larger risk is memory poisoning (writing bad memory), which should be mitigated by validation and write policies inside the tool wrapper. That, however, is a separate topic; if there is interest, I can share my observations, experience, and thoughts on it in another article.
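Those scoping rules amount to a thin wrapper around the raw store. A minimal, illustrative sketch (the class and method names are my own, not a library API):

```python
class ScopedMemoryTool:
    # Illustrative wrapper: the authenticated identity is fixed at
    # construction time, so a prompt-injected user_id is ignored.
    MAX_RECALL = 20

    def __init__(self, store, user_id, tenant_id):
        self._store = store
        self._key = (tenant_id, user_id)  # strict scope

    def read(self, limit=10):
        # Cap how much can be recalled in a single call.
        limit = min(limit, self.MAX_RECALL)
        return self._store.get(self._key, [])[-limit:]

    def write(self, item):
        # Validate writes to mitigate memory poisoning.
        if not isinstance(item, dict) or "content" not in item:
            raise ValueError("rejected memory write")
        self._store.setdefault(self._key, []).append(item)
```

The key design choice is that the model never supplies the scope: the wrapper is constructed from the authenticated session, so the prompt can only influence *what* is read or written within that scope, never *whose* memory it touches.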

Choosing the Right Approach

There is no single “correct” memory strategy.

In most cases, use hybrid memory: integrating short-term and long-term memory improves an agent’s long-range reasoning and its ability to accumulate experience.

In practice, hybrid architectures work best:

  • Long-term memory for durability
  • Short-term memory for performance
  • Tool-based retrieval for flexibility

Final Thoughts

Memory is not magic.

Most problems attributed to “lack of memory” are actually:

  • poor context selection
  • uncontrolled state growth
  • missing lifecycle management

Before adding complexity, ask:

  • Does the agent really need this context?
  • For how long?
  • Who is responsible for cleaning it up?

If you answer those questions honestly, the right architecture usually becomes obvious.

And that’s a wrap! If you’ve read this far, it probably means you found this article useful or insightful. If that’s the case, consider leaving a few claps or sharing it with your team, please. Thanks for reading! 🚀


Published via Towards AI



Note: Article content contains the views of the contributing authors and not Towards AI.