AI Has No Memory. So I Built One For It.
Last Updated on May 27, 2026 by Editorial Team
Author(s): Priyanka Mali
Originally published on Towards AI.
Day 7 – How AI Memory Actually Works
Here is something that genuinely surprised me when I figured it out.
AI assistants remember what you said three messages ago. They know your name. They follow the thread of a long conversation without losing track.
But here’s the thing — the AI model itself has absolutely no memory.
None. Zero.
Every single time you send a message, the model starts completely fresh. It has no idea who you are, what you said before, or what conversation you are in the middle of.
So how does it appear to remember?
That question sent me down a rabbit hole. And at the end of it, I built something that made the whole thing click: a beginner-friendly chatbot that runs entirely on your machine, with no API keys, no cloud, and no internet connection once the model is downloaded. It uses a tool called Ollama that lets you run powerful open-source AI models locally for free.
Let me show you what I built and what it taught me.
The Illusion of Memory
Imagine you wake up every morning with complete amnesia. No memory of yesterday. No memory of anyone you have ever met.
(I promise this is the weirdest analogy I’ll use today — bear with me, it actually makes perfect sense.)
But before you start your day, someone hands you a printed transcript of every conversation you have ever had.
You read it. Now you know everything. You can continue any conversation as if you never forgot.
That is exactly what happens with AI chatbots.
The model wakes up fresh with every message. But the app hands it a full transcript of the conversation before it answers. The model reads it, understands the context, and replies as if it has been paying attention the whole time.
It has not. It just read the notes.
What Actually Gets Sent to the Model
Most people think this is what happens when they send a message:
User: "What is my name?"
↓
AI Model
↓
"Your name is Priyanka."
Here is what actually happens:
System: You are a helpful assistant.
User: Hi, my name is Priyanka.
Assistant: Hi Priyanka! Nice to meet you. How can I help?
User: What is AI?
Assistant: AI stands for Artificial Intelligence...
User: What is my name? ← your actual new message
↓
AI Model
↓
"Your name is Priyanka."
Every single message includes the entire conversation history from the beginning. The model is not remembering — it is re-reading everything from scratch every single time.
All of that text has to fit inside the context window, which is the maximum amount of text the model can see at once. GPT-4 Turbo has a 128,000-token context window. Claude 3 models have 200,000 tokens. Every message, every reply, every system instruction counts towards that limit.
When you hit the limit? The oldest parts of the conversation get cut off before the model ever sees them, so it appears to forget them. Not because it got tired. Because there is literally no more room.
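To make the growth concrete, here is a minimal sketch of what gets assembled on each turn. The helper name `buildTurnPayload` and the placeholder replies are hypothetical and only for illustration. The point is that the number of messages sent goes up with every exchange, even though the user only typed one new line.

```javascript
// Hypothetical helper: builds the list of messages sent to the model for one turn.
function buildTurnPayload(systemPrompt, history, newUserMessage) {
  return [
    { role: 'system', content: systemPrompt },
    ...history,
    { role: 'user', content: newUserMessage },
  ]
}

const history = []
const turns = ['Hi, my name is Priyanka.', 'What is AI?', 'What is my name?']
const payloadSizes = []

for (const text of turns) {
  const payload = buildTurnPayload('You are a helpful assistant.', history, text)
  payloadSizes.push(payload.length)
  // Record the exchange so the next turn carries it along.
  history.push({ role: 'user', content: text })
  history.push({ role: 'assistant', content: `(reply to: ${text})` })
}

console.log(payloadSizes) // [ 2, 4, 6 ]
```

Each turn sends two more messages than the one before it. That steady growth is what eventually runs into the context window.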
The App I Built to Prove This

I wanted to see this in action myself. So I built a simple chatbot in Node.js that runs completely locally — no API keys, no internet connection, no cloud.
“It’s not perfect — here it confused Mistral the wind with Mistral the AI model. But it remembered my name and city across 16 messages. That’s the point.”
What is Ollama?

Before we get into the code — let me explain the tool that makes all of this possible.
Ollama is an open source tool that lets you download and run large language models directly on your own machine. No account. No subscription. No API key. No data sent to any server.
You install it, pull a model, and it runs locally. That’s it.
Think of it like Docker — but for AI models. Instead of pulling a container image, you pull a language model.
Why this matters:
- Free — no token costs, no rate limits, no monthly bill
- Private — your conversations never leave your machine
- Fast — no network round trip and no waiting for remote servers (actual speed depends on your hardware)
- Educational — you see exactly what’s happening under the hood
Ollama supports many open source models including:
- Llama 3.2 — an open-weight model from Meta, and the one we use here
- Mistral — a powerful French open source model
- Gemma — Google’s open source model
- Phi — Microsoft’s small but capable model
For this project I used Llama 3.2 — a capable conversational model that runs well on a standard laptop.
Here is what the app does:
1. You send a message via a POST request
2. The app loads the full conversation history from a local JSON file
3. It builds a complete prompt — system instructions + all past messages + your new message
4. It sends that full prompt to the local Llama model
5. Gets the reply, saves the updated history, returns the response
The key is step 3. Every single time. The full history. That is the memory trick.
How the Code Works

Let me walk you through the actual code. The full project is on my GitHub — link at the end.
The project structure:
chatbot-app/
├── server.js                  ← API routes (chat, history, reset)
├── src/
│   └── chatbot.js             ← core logic: loads history, builds prompt, calls Ollama, saves history
└── data/
    └── conversation.json      ← your conversation is stored here
chatbot.js — The Brain
This is where all the interesting stuff happens.
Step 1 — The system prompt
const SYSTEM_PROMPT = `You are a helpful, friendly assistant.
You remember everything the user has told you in this conversation.
Be concise but warm. If the user tells you something personal like
their name or preferences, remember and use it naturally.`
This is the personality and instructions for the model. It gets included at the very top of every prompt — before any conversation history. Notice it tells the model to “remember everything the user has told you” — but the model doesn’t actually remember anything. We’re about to fake that for it.
Step 2 — Load the history
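const fs = require('fs')
const path = require('path')

// Assumed location, based on the project structure shown earlier
const CONVERSATION_PATH = path.join(__dirname, '..', 'data', 'conversation.json')

// Read the saved conversation, or start with an empty one
function loadHistory() {
  try {
    if (fs.existsSync(CONVERSATION_PATH)) {
      const raw = fs.readFileSync(CONVERSATION_PATH, 'utf-8')
      // An empty file counts as "no conversation yet"
      if (!raw || raw.trim().length === 0) return { history: [] }
      return JSON.parse(raw)
    }
    return { history: [] }
  } catch (err) {
    throw new Error(`chatbot: failed to load history — ${err.message}`)
  }
}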
Every time a new message comes in, we read conversation.json from disk. This is the entire conversation so far — every user message and every assistant reply, in order.
Step 3 — Build the full prompt (this is the key)
function buildPrompt(history) {
  let prompt = `SYSTEM: ${SYSTEM_PROMPT}\n\n`
  history.forEach(msg => {
    if (msg.role === 'user') {
      prompt += `User: ${msg.content}\n`
    } else {
      prompt += `Assistant: ${msg.content}\n`
    }
  })
  prompt += `Assistant:`
  return prompt
}
This is the memory trick in 10 lines of code.
We take the system prompt, then loop through every single message in history — every user message, every assistant reply — and stitch it all together into one big string. Then we add Assistant: at the end as a cue for the model to continue.
So what the model actually receives looks like this:
SYSTEM: You are a helpful, friendly assistant...
User: Hi, my name is Priyanka.
Assistant: Hi Priyanka! Great to meet you. How can I help?
User: What is AI?
Assistant: AI stands for Artificial Intelligence...
User: What is my name?
Assistant: ← model continues from here
The model reads it all. Responds. Done.
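The project stitches the transcript into one string for Ollama's /api/generate endpoint. Ollama also exposes an /api/chat endpoint that accepts the same transcript as a list of role-tagged messages. The sketch below shows that alternative. It is not what the project does, and `buildChatRequest` and `chatWithMessages` are hypothetical helper names.

```javascript
// Build the JSON body for Ollama's /api/chat endpoint from stored history.
function buildChatRequest(systemPrompt, history, model = 'llama3.2') {
  return {
    model,
    stream: false,
    messages: [{ role: 'system', content: systemPrompt }, ...history],
  }
}

// Hypothetical wrapper: sends the whole history and returns the reply text.
// Needs Ollama running locally; it is defined here but not called.
async function chatWithMessages(systemPrompt, history) {
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(systemPrompt, history)),
  })
  const result = await response.json()
  return result.message.content.trim()
}

const body = buildChatRequest('You are a helpful assistant.', [
  { role: 'user', content: 'Hi, my name is Priyanka.' },
  { role: 'assistant', content: 'Hi Priyanka! Great to meet you.' },
  { role: 'user', content: 'What is my name?' },
])
console.log(body.messages.length) // 4
```

Either way the idea is identical: the full history travels with every request. The only difference is whether the app or the server turns it into the final prompt text.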
Step 4 — Send to Ollama and save
async function chat(userMessage) {
  const data = loadHistory()
  data.history.push({ role: 'user', content: userMessage })

  const fullPrompt = buildPrompt(data.history)

  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2',
      prompt: fullPrompt,
      stream: false
    })
  })

  const result = await response.json()
  const assistantReply = result.response.trim()

  data.history.push({ role: 'assistant', content: assistantReply })
  saveHistory(data)

  return assistantReply
}
Notice http://localhost:11434 — that's Ollama running locally on your machine. No cloud. No API key. The model lives on your computer and we talk to it over a local HTTP request.
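The listing calls saveHistory, and server.js imports getHistory and resetHistory, but none of the three is shown in this post. Here is a minimal sketch of what they could look like. The bodies are assumptions, not the code from the repo, and the file path points at a throwaway temp file so the sketch is safe to run.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Assumption: a throwaway file in the temp directory for this sketch.
// The real project keeps its file at data/conversation.json.
const CONVERSATION_PATH = path.join(
  os.tmpdir(),
  `conversation-sketch-${process.pid}.json`
)

// Write the whole conversation back to disk after every exchange.
function saveHistory(data) {
  fs.mkdirSync(path.dirname(CONVERSATION_PATH), { recursive: true })
  fs.writeFileSync(CONVERSATION_PATH, JSON.stringify(data, null, 2), 'utf-8')
}

// Return just the array of messages, or an empty array if nothing is saved yet.
function getHistory() {
  if (!fs.existsSync(CONVERSATION_PATH)) return []
  const raw = fs.readFileSync(CONVERSATION_PATH, 'utf-8')
  return raw.trim().length === 0 ? [] : JSON.parse(raw).history
}

// Forget everything by overwriting the file with an empty conversation.
function resetHistory() {
  saveHistory({ history: [] })
}

saveHistory({ history: [{ role: 'user', content: 'Hi, my name is Priyanka.' }] })
console.log(getHistory().length) // 1
resetHistory()
console.log(getHistory().length) // 0
```

Rewriting the whole file on every message is fine for a learning project. The "memory" is nothing more than this file being read back in on the next request.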
server.js — The API Layer
const express = require('express')
const { chat, getHistory, resetHistory } = require('./src/chatbot')

const app = express()
app.use(express.json())

// Route 1 — send a message
app.post('/chat', async (req, res) => {
  const { message } = req.body
  // Reject missing, non-string, or blank messages before calling the model
  if (typeof message !== 'string' || message.trim().length === 0) {
    return res.status(400).json({ error: 'No message provided' })
  }
  try {
    const reply = await chat(message)
    res.json({
      message,
      reply,
      totalMessages: getHistory().length
    })
  } catch (err) {
    res.status(500).json({ error: err.message })
  }
})

// Route 2 — get full conversation history
app.get('/history', (req, res) => {
  const history = getHistory()
  if (history.length === 0) {
    return res.json({ message: 'No conversation yet', history: [] })
  }
  res.json({ totalMessages: history.length, history })
})

// Route 3 — reset conversation
app.post('/reset', (req, res) => {
  resetHistory()
  res.json({ message: 'Conversation cleared. Starting fresh!' })
})

app.listen(3000, () => {
  console.log('Chatbot running on http://localhost:3000')
})
Three clean routes. That’s all you need.
- POST /chat — takes a message, returns a reply
- GET /history — returns the full conversation so far
- POST /reset — clears the JSON file and starts fresh
The error handling is worth noticing — if Ollama isn’t running, chat() throws a clear error: "could not reach Ollama — is it running?" Small detail but makes debugging much easier.
How to Run It Yourself

Step 1 — Install Ollama
Download from ollama.com and install it. This is what runs the AI model locally on your machine.
Step 2 — Pull the model
ollama pull llama3.2
This downloads Llama 3.2 — a powerful open source model from Meta. About 2GB. Free.
Step 3 — Clone the repo and install
git clone https://github.com/PriyankaMali-13/AI
cd AI/chatbot-app
npm install
Step 4 — Start the server
node server.js
Server runs on http://localhost:3000
Step 5 — Send your first message
Open Postman or any API client and send:
POST http://localhost:3000/chat
{ "message": "Hi, my name is XYZ" }
Then send a follow up:
POST http://localhost:3000/chat
{ "message": "What is my name?" }
It will know. Not because it remembered. Because you just showed it the transcript.
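If you would rather not open an API client, the same two requests can be sent from a small Node.js script (Node 18 or later, which ships a global fetch). This is a sketch: `sendMessage` and `demo` are hypothetical names, and the `fetchImpl` parameter is only there so the helper can be tried without the server.

```javascript
// Hypothetical client helper for the /chat route shown above.
async function sendMessage(message, fetchImpl = fetch) {
  const response = await fetchImpl('http://localhost:3000/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  })
  const data = await response.json()
  if (!response.ok) throw new Error(data.error)
  return data.reply
}

// Two turns in a row: the second only works because the server replays the first.
async function demo() {
  console.log(await sendMessage('Hi, my name is XYZ'))
  console.log(await sendMessage('What is my name?'))
}

// With the server from Step 4 running, uncomment the next line:
// demo().catch(err => console.error(err.message))
```

Notice that the client sends only the new message each time. All the replaying of history happens on the server side.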
What This Teaches You About LLMs
Building this small app taught me more about how LLMs work than any article I read.
Three things that hit differently after building this:
1. Context is everything
The quality of the AI’s response depends entirely on what you put in the context. Better history management = better responses. This is why prompt engineering matters so much.
2. Memory is an illusion the app creates
ChatGPT, Claude, Gemini, every AI assistant — they all do some version of this. The difference is scale, sophistication, and how they manage what goes in and out of the context window.
3. Local AI is more powerful than most people realise
Running Llama 3.2 on my own machine — free, private, no rate limits — felt like a superpower. For learning and prototyping, you don’t need expensive API calls.
The Limitation — And What Comes Next
This approach works beautifully for short conversations. But it has a ceiling.
Every message adds more tokens to the context. Eventually you hit the model’s limit. At that point the oldest messages start getting cut off — and the “memory” starts to fail.
One common solution to this problem is RAG (Retrieval-Augmented Generation). Instead of stuffing everything into the context, you store the history in a vector database and retrieve only the most relevant parts when needed.
That is coming up in a future post. But first you need to understand what we just built — because RAG is just a smarter version of exactly this.
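Before reaching for RAG, it helps to see what the plain cut-off looks like. The sketch below keeps the newest messages that fit a token budget and drops the rest. It rests on loud assumptions: `estimateTokens` uses a rough rule of about 4 characters per token instead of a real tokenizer, the budget of 15 is tiny on purpose, and `trimHistory` is a hypothetical helper, not part of the project.

```javascript
// Very rough token estimate: about 4 characters per token for English text.
// This is an assumption for illustration, not the model's real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Keep the newest messages that fit in the budget; drop the oldest ones.
function trimHistory(history, maxTokens) {
  const kept = []
  let used = 0
  // Walk backwards so the most recent messages are considered first.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(history[i])
    used += cost
  }
  return kept
}

const history = [
  { role: 'user', content: 'Hi, my name is Priyanka.' },
  { role: 'assistant', content: 'Nice to meet you! How can I help?' },
  { role: 'user', content: 'What is my name?' },
]

const trimmed = trimHistory(history, 15)
console.log(trimmed.length) // 2
console.log(trimmed[0].role) // assistant
```

With that budget, the message that introduced the name is the one that gets dropped, so the model would no longer be able to answer the final question. This is exactly the failure that motivates retrieving relevant history instead of only keeping the most recent.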
What I Learned Today
- AI models have zero memory — they start fresh with every single message
- The illusion of memory comes from the app including full conversation history in every prompt
- This is called context window management — and every chatbot app does some version of it
- The context window has a limit — go over it and the model starts forgetting
- You can run powerful AI models completely locally for free using Ollama
- Building something small teaches you more than reading about it ever will
The Code
Full project on GitHub: github.com/PriyankaMali-13/AI/tree/master/chatbot-app
Clone it, run it, break it, rebuild it. That is the best way to learn.
Written by Priyanka. AI tools were used in the research and writing process. All code, ideas, and opinions are my own.
#365DaysOfAI #NodeJS #Ollama #LLM #Chatbot #AI #LearningInPublic #ArtificialIntelligence
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.