AI Has No Memory. So I Built One For It.

Last Updated on May 27, 2026 by Editorial Team

Author(s): Priyanka Mali

Originally published on Towards AI.

Day 7: How AI Memory Actually Works

Here is something that genuinely surprised me when I figured it out.

AI assistants remember what you said three messages ago. They know your name. They follow the thread of a long conversation without losing track.

But here’s the thing — the AI model itself has absolutely no memory.

None. Zero.

Every single time you send a message, the model starts completely fresh. It has no idea who you are, what you said before, or what conversation you are in the middle of.

So how does it appear to remember?

That question sent me down a rabbit hole. And at the end of it, I built something that made the whole thing click: a beginner-friendly chatbot that runs entirely on your machine, with no API keys, no cloud, and no internet connection once the model is downloaded. It uses a tool called Ollama, which lets you run powerful open source AI models locally for free.

Let me show you what I built and what it taught me.

The Illusion of Memory

The illusion of memory: it's not magic, it's context | Source: Unsplash

Imagine you wake up every morning with complete amnesia. No memory of yesterday. No memory of anyone you have ever met.

(I promise this is the weirdest analogy I’ll use today — bear with me, it actually makes perfect sense.)

But before you start your day, someone hands you a printed transcript of every conversation you have ever had.

You read it. Now you know everything. You can continue any conversation as if you never forgot.

That is exactly what happens with AI chatbots.

The model wakes up fresh with every message. But the app hands it a full transcript of the conversation before it answers. The model reads it, understands the context, and replies as if it has been paying attention the whole time.

It has not. It just read the notes.

What Actually Gets Sent to the Model

Most people think this is what happens when they send a message:

User: "What is my name?"
 ↓
 AI Model
 ↓
"Your name is Priyanka."

Here is what actually happens:

System: You are a helpful assistant.

User: Hi, my name is Priyanka.
Assistant: Hi Priyanka! Nice to meet you. How can I help?
User: What is AI?
Assistant: AI stands for Artificial Intelligence...
User: What is my name? ← your actual new message
 ↓
 AI Model
 ↓
"Your name is Priyanka."

Every single message includes the entire conversation history from the beginning. The model is not remembering — it is re-reading everything from scratch every single time.
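To make the difference concrete, here is a minimal sketch using a toy stand-in for the model. The `toyModel` function is hypothetical and not a real LLM; it is just a stateless function that can only "know" what appears in the text it receives, which is the property that matters here.

```javascript
// A stand-in for the model: a pure function with no state between calls.
// (Hypothetical toy, not a real LLM. It only sees what is in the prompt.)
function toyModel(prompt) {
  const match = prompt.match(/my name is (\w+)/i)
  return match ? `Your name is ${match[1]}.` : "I don't know your name."
}

// Without history: only the new message is sent.
const withoutHistory = toyModel('User: What is my name?')

// With history: the app resends the whole transcript plus the new message.
const transcript = [
  'User: Hi, my name is Priyanka.',
  'Assistant: Hi Priyanka! Nice to meet you. How can I help?',
  'User: What is my name?'
].join('\n')
const withHistory = toyModel(transcript)

console.log(withoutHistory) // I don't know your name.
console.log(withHistory)    // Your name is Priyanka.
```

Same function, same question. The only thing that changed is how much text was handed to it.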

Everything the model sees in one request is its context, and the context window is the maximum amount of text it can see at once. GPT-4 Turbo has a 128,000-token context window, and Claude 3 models offer 200,000 tokens. Every message, every reply, every system instruction counts towards that limit.

When you hit the limit? Something has to go. The app, or the server running the model, usually drops or truncates the oldest parts of the conversation, so the model appears to forget them. Not because it got tired. Because there is literally no more room.
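One simple way an app can handle the limit is to keep only the most recent messages that fit a token budget. The sketch below is an illustration, not what the chatbot in this post does: `estimateTokens` and `trimHistory` are hypothetical helpers, and the "about 4 characters per token" rule is a rough assumption that real tokenizers will not match exactly.

```javascript
// Rough token estimate. ASSUMPTION: ~4 characters per token, a common
// rule of thumb for English text; real tokenizers will differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Keep the most recent messages that fit in the budget, dropping the oldest.
function trimHistory(history, maxTokens) {
  const kept = []
  let used = 0
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(history[i])
    used += cost
  }
  return kept
}

const history = [
  { role: 'user', content: 'Hi, my name is Priyanka.' },
  { role: 'assistant', content: 'Hi Priyanka! Nice to meet you.' },
  { role: 'user', content: 'What is my name?' }
]

// With a tiny budget, the first message (the one with the name) is dropped.
const trimmed = trimHistory(history, 12)
console.log(trimmed.map(m => m.content))
```

With a budget this small, the message that introduced the name no longer fits, which is exactly how a long conversation "forgets" its beginning.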

The App I Built to Prove This

POST /chat in action — the app remembering across messages. | Screenshot: Author’s own

I wanted to see this in action myself. So I built a simple chatbot in Node.js that runs completely locally — no API keys, no internet connection, no cloud.

“It’s not perfect — here it confused Mistral the wind with Mistral the AI model. But it remembered my name and city across 16 messages. That’s the point.”

What is Ollama?

Ollama — the easiest way to run AI models locally. Free, private, no API key needed. | Source: ollama.com

Before we get into the code — let me explain the tool that makes all of this possible.

Ollama is an open source tool that lets you download and run large language models directly on your own machine. No account. No subscription. No API key. No data sent to any server.

You install it, pull a model, and it runs locally. That’s it.

Think of it like Docker — but for AI models. Instead of pulling a container image, you pull a language model.

Why this matters:

Free — no token costs, no rate limits, no monthly bill
Private — your conversations never leave your machine
Fast — no network latency, no waiting for remote servers
Educational — you see exactly what’s happening under the hood

Ollama supports many open source models including:

Llama 3.2: an open source model from Meta, and the one we use here
Mistral: a powerful French open source model
Gemma: Google's open source model
Phi: Microsoft's small but capable model

For this project I used Llama 3.2 — a capable conversational model that runs well on a standard laptop.

Here is what the app does

1. You send a message via a POST request
2. The app loads the full conversation history from a local JSON file
3. It builds a complete prompt: system instructions + all past messages + your new message
4. It sends that full prompt to the local Llama model
5. It gets the reply, saves the updated history, and returns the response

The key is step 3. Every single time. The full history. That is the memory trick.

How the Code Works

The buildPrompt function — where the memory trick actually happens. | Image: AI Generated

Let me walk you through the actual code. The full project is on my GitHub — link at the end.

The project structure:

chatbot-app/
├── server.js ← API routes (chat, history, reset)
├── src/
│ └── chatbot.js ← core logic: loads history, builds prompt, calls Ollama, saves history
└── data/
 └── conversation.json ← your conversation is stored here

chatbot.js — The Brain

This is where all the interesting stuff happens.

Step 1 — The system prompt

const SYSTEM_PROMPT = `You are a helpful, friendly assistant. 
You remember everything the user has told you in this conversation.
Be concise but warm. If the user tells you something personal like 
their name or preferences, remember and use it naturally.`

This is the personality and instructions for the model. It gets included at the very top of every prompt — before any conversation history. Notice it tells the model to “remember everything the user has told you” — but the model doesn’t actually remember anything. We’re about to fake that for it.

Step 2 — Load the history

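const fs = require('fs')
const path = require('path')

// Assumed from the project structure above: data/ sits one level up from src/
const CONVERSATION_PATH = path.join(__dirname, '..', 'data', 'conversation.json')

// Read the saved conversation, or start with an empty history
function loadHistory() {
  try {
    if (fs.existsSync(CONVERSATION_PATH)) {
      const raw = fs.readFileSync(CONVERSATION_PATH, 'utf-8')
      if (!raw || raw.trim().length === 0) return { history: [] }
      return JSON.parse(raw)
    }
    return { history: [] }
  } catch (err) {
    throw new Error(`chatbot: failed to load history — ${err.message}`)
  }
}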

Every time a new message comes in, we read conversation.json from disk. This is the entire conversation so far — every user message and every assistant reply, in order.
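The code in this post also calls `saveHistory`, `getHistory`, and `resetHistory`, which are not shown here. Below is a minimal sketch of what they might look like, assuming they simply mirror `loadHistory`; the versions in the repo may differ. To keep the example self-contained, it writes to a temporary directory instead of `data/conversation.json`.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// ASSUMPTION: for a self-contained demo the file lives in a temp directory.
// In the project layout above it would be data/conversation.json.
const CONVERSATION_PATH = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), 'chatbot-')),
  'conversation.json'
)

// Write the whole history object back to disk after every turn.
function saveHistory(data) {
  fs.writeFileSync(CONVERSATION_PATH, JSON.stringify(data, null, 2), 'utf-8')
}

// Return just the array of messages (empty if nothing is saved yet).
function getHistory() {
  if (!fs.existsSync(CONVERSATION_PATH)) return []
  const raw = fs.readFileSync(CONVERSATION_PATH, 'utf-8')
  return raw.trim() ? JSON.parse(raw).history : []
}

// Clear the conversation by overwriting the file with an empty history.
function resetHistory() {
  saveHistory({ history: [] })
}

saveHistory({ history: [{ role: 'user', content: 'Hi, my name is Priyanka.' }] })
console.log(getHistory().length) // 1
resetHistory()
console.log(getHistory().length) // 0
```

Nothing clever is going on: the "memory" is a JSON file that gets read and rewritten on every turn.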

Step 3 — Build the full prompt (this is the key)

function buildPrompt(history) {
  let prompt = `SYSTEM: ${SYSTEM_PROMPT}\n\n`

  history.forEach(msg => {
    if (msg.role === 'user') {
      prompt += `User: ${msg.content}\n`
    } else {
      prompt += `Assistant: ${msg.content}\n`
    }
  })

  prompt += `Assistant:`
  return prompt
}

This is the memory trick in 10 lines of code.

We take the system prompt, then loop through every single message in history — every user message, every assistant reply — and stitch it all together into one big string. Then we add Assistant: at the end as a cue for the model to continue.

So what the model actually receives looks like this:

SYSTEM: You are a helpful, friendly assistant...

User: Hi, my name is Priyanka.
Assistant: Hi Priyanka! Great to meet you. How can I help?
User: What is AI?
Assistant: AI stands for Artificial Intelligence...
User: What is my name?
Assistant: ← model continues from here

The model reads it all. Responds. Done.

Step 4 — Send to Ollama and save

async function chat(userMessage) {
  const data = loadHistory()

  // Add the new user message, then build the prompt from the full history
  data.history.push({ role: 'user', content: userMessage })
  const fullPrompt = buildPrompt(data.history)

  // Send the full prompt to the local Ollama server
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2',
      prompt: fullPrompt,
      stream: false
    })
  })

  const result = await response.json()
  const assistantReply = result.response.trim()

  // Save the reply so the next request sees it as part of the history
  data.history.push({ role: 'assistant', content: assistantReply })
  saveHistory(data)

  return assistantReply
}

Notice http://localhost:11434 — that's Ollama running locally on your machine. No cloud. No API key. The model lives on your computer and we talk to it over a local HTTP request.
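As a side note, Ollama also exposes an `/api/chat` endpoint that accepts the conversation as a list of role-tagged messages instead of one stitched-together string. The sketch below shows what that variant could look like. It is an alternative to the approach in this post, not the code from the repo, and `buildChatRequest` and `chatViaMessages` are hypothetical names. The memory trick is identical: the full history still goes out with every request.

```javascript
const SYSTEM_PROMPT = 'You are a helpful, friendly assistant.'

// Build the request body for Ollama's /api/chat endpoint.
// The history is sent as structured messages instead of one big string.
function buildChatRequest(history, model = 'llama3.2') {
  return {
    model,
    messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...history],
    stream: false
  }
}

// Hypothetical helper: sends the request and returns the reply text.
// Requires Node 18+ (global fetch) and a running Ollama server.
async function chatViaMessages(history) {
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(history))
  })
  const result = await response.json()
  return result.message.content.trim()
}

const body = buildChatRequest([
  { role: 'user', content: 'Hi, my name is Priyanka.' },
  { role: 'assistant', content: 'Hi Priyanka! Great to meet you.' },
  { role: 'user', content: 'What is my name?' }
])
console.log(body.messages.length) // 4 (system prompt + 3 history messages)
```

Building the prompt by hand, as this post does, has one advantage for learning: you can see exactly what text the model receives.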

server.js — The API Layer

const express = require('express')
const { chat, getHistory, resetHistory } = require('./src/chatbot')

const app = express()
app.use(express.json())

// Route 1: send a message
app.post('/chat', async (req, res) => {
  const { message } = req.body || {}

  // Reject missing, non-string, or blank messages
  if (typeof message !== 'string' || message.trim().length === 0) {
    return res.status(400).json({ error: 'No message provided' })
  }

  try {
    const reply = await chat(message)
    res.json({
      message,
      reply,
      totalMessages: getHistory().length
    })
  } catch (err) {
    res.status(500).json({ error: err.message })
  }
})

// Route 2: get full conversation history
app.get('/history', (req, res) => {
  const history = getHistory()
  if (history.length === 0) {
    return res.json({ message: 'No conversation yet', history: [] })
  }
  res.json({ totalMessages: history.length, history })
})

// Route 3: reset conversation
app.post('/reset', (req, res) => {
  resetHistory()
  res.json({ message: 'Conversation cleared. Starting fresh!' })
})

app.listen(3000, () => {
  console.log('Chatbot running on http://localhost:3000')
})

Three clean routes. That’s all you need.

POST /chat: takes a message, returns a reply
GET /history: returns the full conversation so far
POST /reset: clears the JSON file and starts fresh

The error handling is worth noticing — if Ollama isn’t running, chat() throws a clear error: "could not reach Ollama — is it running?" Small detail but makes debugging much easier.
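The `chat()` excerpt shown earlier does not include that check, so here is a minimal sketch of how the call to Ollama could be wrapped. `callOllama` and its `fetchImpl` parameter are hypothetical, the error wording is only similar to the one quoted above, and the repo's actual handling may look different.

```javascript
// Call Ollama and turn low-level failures into readable errors.
// `fetchImpl` is a hypothetical parameter so the function can be tried
// without a running server; by default it uses Node 18+'s global fetch.
async function callOllama(prompt, fetchImpl = fetch) {
  let response
  try {
    response = await fetchImpl('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'llama3.2', prompt, stream: false })
    })
  } catch (err) {
    // fetch rejects when the connection itself fails (e.g. server not started)
    throw new Error('chatbot: could not reach Ollama, is it running?', { cause: err })
  }
  if (!response.ok) {
    // e.g. the requested model has not been pulled yet
    throw new Error(`chatbot: Ollama returned HTTP ${response.status}`)
  }
  const result = await response.json()
  return result.response.trim()
}

// Simulate Ollama being down with a fetch stand-in that always rejects.
const failingFetch = async () => { throw new TypeError('fetch failed') }
const outcome = callOllama('User: Hi\nAssistant:', failingFetch)
  .catch(err => err.message)
outcome.then(msg => console.log(msg))
```

Because the `/chat` route already catches errors and returns them as JSON, a message like this ends up in the API response instead of a bare stack trace.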

How to Run It Yourself

node server.js — your chatbot is live. | Image: AI Generated

Step 1 — Install Ollama

Download from ollama.com and install it. This is what runs the AI model locally on your machine.

Step 2 — Pull the model

ollama pull llama3.2

This downloads Llama 3.2 — a powerful open source model from Meta. About 2GB. Free.

Step 3 — Clone the repo and install

git clone https://github.com/PriyankaMali-13/AI
cd AI/chatbot-app
npm install

Step 4 — Start the server

node server.js

Server runs on http://localhost:3000

Step 5 — Send your first message

Open Postman or any API client and send:

POST http://localhost:3000/chat
{ "message": "Hi, my name is XYZ" }

Then send a follow up:

POST http://localhost:3000/chat
{ "message": "What is my name?" }

It will know. Not because it remembered. Because you just showed it the transcript.
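If you would rather stay in Node than open an API client, the same two requests can be sent from a small script. This is a sketch under a few assumptions: `sendMessage` is a hypothetical helper, it expects the server from this post to be running on port 3000, and it relies on the global `fetch` available in Node 18 and later.

```javascript
// Send one message to the chatbot's POST /chat route and return the reply.
// `fetchImpl` is a hypothetical parameter (defaults to Node 18+ global fetch)
// that makes the function easy to try without a running server.
async function sendMessage(message, fetchImpl = fetch) {
  const res = await fetchImpl('http://localhost:3000/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message })
  })
  const data = await res.json()
  if (!res.ok) throw new Error(data.error)
  return data.reply
}

// With the server running, two calls in a row show the "memory" at work:
//   sendMessage('Hi, my name is XYZ')
//     .then(() => sendMessage('What is my name?'))
//     .then(reply => console.log(reply))
```

Each call is a completely separate HTTP request. The only link between them is the history file on the server.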

What This Teaches You About LLMs

Building this small app taught me more about how LLMs work than any article I read.

Three things that hit differently after building this:

1. Context is everything

The quality of the AI’s response depends entirely on what you put in the context. Better history management = better responses. This is why prompt engineering matters so much.

2. Memory is an illusion the app creates

ChatGPT, Claude, Gemini, every AI assistant: they all do some version of this. The difference is scale, sophistication, and how they manage what goes in and out of the context window.

3. Local AI is more powerful than most people realise

Running Llama 3.2 on my own machine (free, private, no rate limits) felt like a superpower. For learning and prototyping, you don’t need expensive API calls.

The Limitation — And What Comes Next

This approach works beautifully for short conversations. But it has a ceiling.

Every message adds more tokens to the context. Eventually you hit the model’s limit. At that point the oldest messages start getting cut off — and the “memory” starts to fail.

One common answer to this problem is RAG (Retrieval-Augmented Generation). Instead of stuffing everything into the context, you store the history in a vector database and retrieve only the most relevant parts when needed.
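To give a feel for the retrieval idea, here is a deliberately simplified sketch. It is not real RAG: counting shared words stands in for embedding similarity, and a plain array stands in for a vector database. All function names are hypothetical.

```javascript
// Turn text into a set of lowercase words.
// ASSUMPTION: word overlap stands in for real embeddings from a vector database.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || [])
}

// Score = number of words shared between the query and the message.
function overlap(a, b) {
  let shared = 0
  for (const word of a) if (b.has(word)) shared++
  return shared
}

// Return the k past messages most relevant to the new question.
function retrieveRelevant(history, query, k = 2) {
  const q = tokenize(query)
  return history
    .map(msg => ({ msg, score: overlap(q, tokenize(msg.content)) }))
    .filter(item => item.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(item => item.msg)
}

const history = [
  { role: 'user', content: 'Hi, my name is Priyanka.' },
  { role: 'user', content: 'My favourite language is JavaScript.' },
  { role: 'user', content: 'What is AI?' }
]
const relevant = retrieveRelevant(history, 'What is my name?', 1)
console.log(relevant[0].content) // Hi, my name is Priyanka.
```

Only the retrieved messages would go into the prompt, so the context stays small no matter how long the conversation gets.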

That is coming up in a future post. But first you need to understand what we just built — because RAG is just a smarter version of exactly this.

What I Learned Today

AI models have zero memory — they start fresh with every single message
The illusion of memory comes from the app including full conversation history in every prompt
This is called context window management — and every chatbot app does some version of it
The context window has a limit — go over it and the model starts forgetting
You can run powerful AI models completely locally for free using Ollama
Building something small teaches you more than reading about it ever will

The Code

Full project on GitHub: github.com/PriyankaMali-13/AI/tree/master/chatbot-app

Clone it, run it, break it, rebuild it. That is the best way to learn.

Written by Priyanka. AI tools were used in the research and writing process. All code, ideas, and opinions are my own.

#365DaysOfAI #NodeJS #Ollama #LLM #Chatbot #AI #LearningInPublic #ArtificialIntelligence
