Stop Your AI From Lying. Build RAG.

Last Updated on May 29, 2026 by Editorial Team

Author(s): Priyanka Mali

Originally published on Towards AI.

Day 9: No vector database server. No paid API. Just Ollama and a JSON file.

A few weeks ago, a colleague stopped me mid-conversation.

“You keep learning all this AI stuff,” she said. “But tell me — if I ask your chatbot what your favourite colour is, will it know?”

I went quiet.

I knew how to build a chatbot. I had built one. But her question exposed a gap I hadn’t fully thought through. The chatbot I built could talk. It could remember our conversation. But it only knew what the AI model already knew: whatever public text it was trained on up to a certain date, and absolutely nothing about me.

That question sent me digging. And what I found was RAG.

First — Why Do LLMs Fail at Personal Questions?

LLMs know everything on the internet, and nothing about you. | Photo: Unsplash

Large language models are trained on massive amounts of public text. Books, websites, Wikipedia, research papers. They are incredibly good at answering general questions.

Ask an LLM why the sky is blue — perfect answer.

Ask it what you ate for breakfast — it will either say “I don’t know” or worse, confidently make something up. That confident wrong answer is what we call hallucination. And it happens because the model has never seen your personal data. It is just guessing based on patterns.

This is the core problem:

LLMs are frozen in time — trained up to a cutoff date
They know nothing about your private documents
They cannot access your company data
They make up answers when they don’t know — and sound confident doing it

This is exactly what my colleague was pointing at. And for anyone building real AI products, this is one of the first walls you hit.

RAG is the solution.

What is RAG?

RAG stands for Retrieval-Augmented Generation.

I know — the name sounds intimidating. But once you break it down, it is one of the most elegant ideas in AI.

Let me start with just the G — Generation.

This is what LLMs already do. You ask a question, the model generates an answer. That part you already understand.

Now add R and A — Retrieval Augmented.

Before the model generates an answer, you first retrieve the most relevant information from your own data. Then you augment the model’s input with that information. The model now generates an answer based on what you gave it — not just what it was trained on.

Think of it like this. Imagine you are a new employee on your first day. A customer calls and asks about the refund policy. You don’t know it yet — you just started. So you quickly search the company handbook, find the relevant section, read it, and answer the customer based on what you just read.

That is RAG. You are the LLM. The handbook is your document. The search is retrieval. The answer you give is generation — augmented by what you retrieved.

How RAG Actually Works — Step by Step

The RAG pipeline — from document to answer. | Image: AI Generated

Here is the full flow — broken into simple steps.

Step 1 — Ingest your document

You paste your document into the system. This could be a PDF, a text file, a company policy — anything with text.

Step 2 — Chunk it

The document gets split into smaller pieces called chunks. Not the whole document at once — just paragraphs or sections. Why? Because you don’t want to send the entire document to the LLM every time. You only want to send the relevant part.

Step 3 — Convert chunks to vectors (embeddings)

Each chunk gets converted into a list of numbers called a vector or embedding. These numbers capture the meaning of the text — not just the words.

A key thing to know: in our app, each chunk becomes exactly 768 numbers regardless of how long or short the text is. That fixed size is what makes comparison possible.

Step 4 — Store in a vector database

These vectors get saved — in our case to a simple JSON file. This is your vector database. Each entry stores the original chunk text and its corresponding vector.
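A minimal sketch of that storage layer, assuming the file holds a JSON array of `{ text, embedding }` objects. The field names and the helpers `saveEntries` and `loadEntries` are hypothetical; the app's own vectorDB.js may structure things differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical file shape:
// [{ "text": "chunk text...", "embedding": [0.12, -0.03, ...] }, ...]

function saveEntries(dbPath, entries) {
  // Create the parent folder (e.g. data/) if it does not exist yet
  fs.mkdirSync(path.dirname(dbPath), { recursive: true });
  fs.writeFileSync(dbPath, JSON.stringify(entries));
}

function loadEntries(dbPath) {
  // A missing file means nothing has been ingested yet
  if (!fs.existsSync(dbPath)) return [];
  return JSON.parse(fs.readFileSync(dbPath, 'utf8'));
}
```

Rewriting the whole array on every ingest is the simplest possible design. It works for a handful of documents, but every read parses the full file, which is one reason larger systems move to a dedicated vector database.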

Step 5 — User asks a question

When someone asks a question, the same embedding process runs on the question. The question becomes a vector too.

Step 6 — Find the closest match (cosine similarity)

The system compares the question vector against all the stored chunk vectors. It finds the chunks whose meaning is closest to the question. This is called cosine similarity — a mathematical way of measuring how similar two vectors are.
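For reference, this is the standard definition. For two vectors A and B with n components (n = 768 in this app), cosine similarity is their dot product divided by the product of their lengths:

```latex
\cos(\mathbf{A}, \mathbf{B})
  = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
```

A tiny worked example with A = (1, 1) and B = (1, 0): the dot product is 1, the lengths are √2 and 1, so the similarity is 1/√2 ≈ 0.707. Because it depends only on the angle between the vectors, not their lengths, a short chunk and a long chunk can score equally well.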

Step 7 — Send to the LLM

The most relevant chunks get sent to the LLM along with the question. The LLM reads those chunks and generates an answer based on them — not from its general training.

Step 8 — You get your answer

Grounded. Specific. From your data.
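To make steps 2 to 6 concrete, here is a toy sketch that runs fully offline. It plainly swaps the real embedding model for a hypothetical word-count "embedding" over a tiny fixed vocabulary, chunks by sentence rather than by word count, and keeps the store in memory rather than in a JSON file; only the shape of the pipeline carries over. Step 7, the LLM call, is left out.

```javascript
// Toy stand-in for an embedding model (hypothetical): counts how often each
// vocabulary word appears. Like a real model, it returns a fixed-length
// vector no matter how long the input text is.
const VOCAB = ['colour', 'favourite', 'green', 'refund', 'policy', 'days'];

function toyEmbed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map((v) => words.filter((w) => w === v).length);
}

function cosine(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return magA === 0 || magB === 0 ? 0 : dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Steps 2-4: chunk (one sentence per chunk here), embed, store in memory
function ingest(doc) {
  return doc
    .split(/(?<=\.)\s+/)
    .filter((s) => s.trim().length > 0)
    .map((text) => ({ text, embedding: toyEmbed(text) }));
}

// Steps 5-6: embed the question, return the best-matching chunk
function retrieve(store, question) {
  const q = toyEmbed(question);
  let best = null;
  let bestScore = -Infinity;
  for (const entry of store) {
    const score = cosine(q, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best;
}

const store = ingest('Refunds are accepted within 30 days. My favourite colour is green.');
console.log(retrieve(store, 'What is my favourite colour?').text);
// prints "My favourite colour is green."
```

A real embedding model captures meaning beyond exact word matches (it would also connect "refunds" to "refund"), which is the reason to use one instead of counting words.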

The App I Built

Asked my RAG app ‘What is my favourite colour?’ — it found the answer from my document. | Screenshot: Author’s own

After understanding RAG conceptually I sat down and built one.

No paid APIs. No external vector database. No cloud. It runs entirely on my machine using Ollama, the same tool I used on Day 7.

Here is what it does:

You paste a document via POST /ingest
It chunks the document, converts the chunks to vectors, and saves them to db.json
You ask a question via POST /ask
It finds the most relevant chunks using cosine similarity
Sends them to Llama 3.2 with a custom prompt
Returns an answer strictly from your document

The project structure:

rag-app/
├── server.js          ← all API routes
├── src/
│   ├── chunker.js     ← splits doc into chunks
│   ├── embedder.js    ← converts text to vectors via Ollama
│   ├── vectorDB.js    ← stores vectors, does cosine similarity
│   ├── retriever.js   ← combines chunker + embedder + vectorDB
│   └── llm.js         ← prompt engineering + gets answer from llama3.2
└── data/
    └── db.json        ← your vector database

One server file, five small modules, and one JSON file as the database. That is a complete RAG system.

The Code — What Each File Does

chunker.js — Split the document

Takes your raw text and splits it into smaller chunks of a fixed number of words: small enough to stay focused, large enough to carry context.

Here is the most interesting line in the whole file:

i += chunkSize - overlap

This one line is why the chunker works well. Each chunk overlaps with the previous one by 40 words. Why? Because if a sentence starts at the end of chunk 1 and finishes at the start of chunk 2 — without overlap, that sentence gets cut in half and loses its meaning. The overlap makes sure context never gets lost at chunk boundaries.

Many tutorials skip this detail. It is one of those small decisions that makes a real difference in answer quality.
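A minimal word-based chunker built around that line might look like the sketch below. The chunk size of 200 words is a hypothetical default (the article only states the 40-word overlap), and splitting on whitespace is an assumption about how the real chunker.js counts words.

```javascript
// Split text into overlapping word windows.
// chunkSize = 200 is a hypothetical default; overlap = 40 matches the article.
function chunkText(text, chunkSize = 200, overlap = 40) {
  // Without this check, i would never advance (or would move backwards)
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
    // Stop once this window already reaches the end of the text,
    // so we never emit a trailing chunk made only of overlap
    if (i + chunkSize >= words.length) break;
  }
  return chunks;
}

// 450 words -> windows starting at words 0, 160 and 320
const demo = Array.from({ length: 450 }, (_, n) => `w${n}`).join(' ');
console.log(chunkText(demo).length); // 3
```

The second chunk starts at word 160 while the first ends at word 199, so words 160 to 199 appear in both: exactly the 40-word overlap.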

embedder.js — Convert text to vectors

Calls Ollama’s embedding model to convert any text into a list of 768 numbers. Here is the actual call:

body: JSON.stringify({
  model: 'nomic-embed-text',
  prompt: text
})

One thing worth noticing — this uses nomic-embed-text, not llama3.2. Two completely different models working together. The embedding model converts text to numbers. The LLM generates the final answer. Neither does the other's job.

Many people don’t realise this RAG setup needs two models. This is why.
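Wrapped into a complete function, that call might look like the sketch below. It assumes Ollama is running locally on its default port 11434 and uses Ollama's `/api/embeddings` endpoint, which takes `model` and `prompt` and returns `{ embedding: [...] }`. The function name `embed` is hypothetical, and the built-in `fetch` needs Node 18 or later.

```javascript
// Assumes a local Ollama server on its default port
const OLLAMA_URL = 'http://localhost:11434';

// Convert text to an embedding vector with nomic-embed-text
async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama embedding request failed: ${res.status}`);
  }
  const data = await res.json();
  return data.embedding;
}
```

With the model pulled, `embed('My favourite colour is green.')` should resolve to an array of 768 numbers. Newer Ollama releases also offer `/api/embed`, which takes `input` instead of `prompt` and returns an `embeddings` array.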

vectorDB.js — The cosine similarity

This is the mathematical heart of RAG. When you ask a question, the system compares your question vector against every stored chunk vector. The function that does this comparison is cosine similarity:

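function cosineSimilarity(vecA, vecB) {
  // Dot product and squared magnitudes, computed in one pass
  let dot = 0, magA = 0, magB = 0
  for (let i = 0; i < vecA.length; i++) {
    dot += vecA[i] * vecB[i]
    magA += vecA[i] * vecA[i]
    magB += vecB[i] * vecB[i]
  }
  // A zero vector has no direction; return 0 instead of dividing by zero (NaN)
  if (magA === 0 || magB === 0) return 0
  return dot / (Math.sqrt(magA) * Math.sqrt(magB))
}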

Mathematically it returns a number between -1 and 1. The closer to 1, the more similar the meaning. The top 3 scoring chunks get sent to the LLM.

You don’t need to understand the maths to use it. But it helps to know this is what “finding the closest match” actually means under the hood.
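Picking "the top 3 scoring chunks" is then a score, sort, and slice over every stored entry. A minimal sketch, assuming entries shaped like `{ text, embedding }`; the name `topK` is hypothetical, and the block repeats the article's `cosineSimilarity` (plus a zero-vector guard) so it runs on its own.

```javascript
function cosineSimilarity(vecA, vecB) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < vecA.length; i++) {
    dot += vecA[i] * vecB[i];
    magA += vecA[i] * vecA[i];
    magB += vecB[i] * vecB[i];
  }
  if (magA === 0 || magB === 0) return 0;
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score every stored chunk against the question vector and keep the best k.
// This is a brute-force scan: fine for a JSON file, slow for millions of chunks.
function topK(questionVec, entries, k = 3) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(questionVec, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const entries = [
  { text: 'A', embedding: [1, 0] },
  { text: 'B', embedding: [0, 1] },
  { text: 'C', embedding: [1, 1] },
  { text: 'D', embedding: [-1, 0] },
];
console.log(topK([1, 0], entries).map((e) => e.text)); // [ 'A', 'C', 'B' ]
```

Entry D points the opposite way from the question and scores -1, which is why it drops out of the top 3.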

llm.js — The prompt engineering that prevents hallucination

This is my favourite part of the whole app. The prompt we send to the LLM:

const prompt = `You are a helpful assistant that answers 
questions strictly from the provided document context.

Rules:
1. Answer ONLY from the context provided below
2. If the answer is not in the context, say 
 "I could not find that in the document"
3. Be concise and clear
4. Mention which excerpt your answer came from`

Rule 2 is everything.

Instead of letting the model guess when it doesn’t know, we explicitly tell it to say “I could not find that in the document.” That single instruction does most of the work of reducing hallucination. It is not a hard guarantee, though: the model is instructed, not forced, to stay inside the provided context.

This is prompt engineering doing real, practical work. Not fancy — just precise.
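The rules are only half of the prompt; the retrieved chunks and the question still have to be appended. A minimal sketch of that step, assuming the chunks arrive as an array of strings, are labelled "Excerpt 1", "Excerpt 2" and so on so the model can cite them (rule 4), and are sent to Ollama's `/api/generate` endpoint on its default port with streaming turned off. The names `buildPrompt` and `generateAnswer` and the excerpt labelling are hypothetical.

```javascript
// The article's rules, reproduced as the fixed part of the prompt
const RULES = `You are a helpful assistant that answers
questions strictly from the provided document context.

Rules:
1. Answer ONLY from the context provided below
2. If the answer is not in the context, say
 "I could not find that in the document"
3. Be concise and clear
4. Mention which excerpt your answer came from`;

// Number each retrieved chunk so the model can say which excerpt it used
function buildPrompt(chunks, question) {
  const context = chunks
    .map((chunk, i) => `Excerpt ${i + 1}:\n${chunk}`)
    .join('\n\n');
  return `${RULES}\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

// Assumes a local Ollama server on its default port 11434
async function generateAnswer(chunks, question) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false returns one JSON object instead of a stream of partial lines
    body: JSON.stringify({ model: 'llama3.2', prompt: buildPrompt(chunks, question), stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama generate request failed: ${res.status}`);
  const data = await res.json();
  return data.response;
}

console.log(buildPrompt(['My favourite colour is green.'], 'What is my favourite colour?'));
```

Keeping prompt assembly in its own pure function makes it easy to print and inspect exactly what the model sees, which is usually the first thing to check when an answer looks wrong.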

How to Run It

Step 1 — Install Ollama

Download it from ollama.com.

Step 2 — Pull both models

ollama pull nomic-embed-text
ollama pull llama3.2

Step 3 — Clone and install

git clone https://github.com/PriyankaMali-13/AI
cd AI/rag-app
npm install

Step 4 — Start the server

node server.js

Step 5 — Ingest your document

POST /ingest
{ "text": "your document text here..." }

Step 6 — Ask your question

POST /ask
{ "question": "What is deep learning?" }

Important — always call /ingest before /ask. Otherwise:

{ "error": "No document ingested yet. Please call /ingest first." }
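From JavaScript, the same two calls can be made with `fetch` (Node 18 or later). A minimal sketch: the base URL `http://localhost:3000` is a hypothetical port, since the article does not say which one server.js listens on, and the helper names `postJson` and `ingestThenAsk` are hypothetical too. It assumes both routes respond with JSON.

```javascript
// Hypothetical address; check server.js for the real port
const BASE_URL = 'http://localhost:3000';

async function postJson(route, body) {
  const res = await fetch(`${BASE_URL}${route}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  return res.json();
}

// Ingest first, then ask; surface the server's error message if /ask fails
async function ingestThenAsk(text, question) {
  await postJson('/ingest', { text });
  const result = await postJson('/ask', { question });
  if (result.error) throw new Error(result.error);
  return result;
}
```

With the server running, `ingestThenAsk('My favourite colour is green.', 'What is my favourite colour?').then(console.log)` should print whatever JSON the /ask route returns.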

What Makes This Different From a Regular Chatbot

Going back to my colleague’s question — “will it know my favourite colour?”

With a regular chatbot — no. The model has never seen that information.

With RAG — yes. If you give it a document that contains your preferences, it will find the relevant section and answer from it. The model isn’t guessing. It is reading.

The other big difference: far fewer hallucinations.

When the LLM is told to answer strictly from the provided context, it can say “I could not find that in the document” instead of making something up. That is a fundamental shift in reliability.

This is why so many enterprise and production AI systems use RAG. They need answers from verified sources (their own data), not from whatever the model learned from the internet.

One Thing I Found Surprising

I expected building RAG to be complicated. It’s not.

The hard part is understanding the concepts: chunking, embeddings, cosine similarity. Once you understand those, the code is straightforward. A server file, five small modules, a JSON file, and two Ollama models.

One of the most widely used AI techniques in production today can be built by a beginner in an afternoon.

That genuinely surprised me.

What I Learned Today

RAG = Retrieval-Augmented Generation: give the LLM your data at query time
LLMs hallucinate about your data because they have never seen it; RAG greatly reduces this
The pipeline: ingest → chunk → embed → store → retrieve → generate
Each chunk becomes a fixed 768-dimensional vector (with nomic-embed-text) regardless of text length
Cosine similarity finds the chunks closest in meaning to your question
You don’t need a paid vector database; a JSON file works for learning
Two Ollama models work together: nomic-embed-text for embeddings, llama3.2 for answers
Many enterprise AI systems use RAG; it is a standard approach for grounded, reliable answers

The Code

Full project on GitHub: github.com/PriyankaMali-13/AI/tree/master/rag-app

Clone it, paste a document, ask questions. See RAG working with your own eyes.

I’m Priyanka — backend engineer, chatbot builder, and someone learning AI from first principles. Writing everything down as I go. Follow along if that sounds useful.

#365DaysOfAI #RAG #RetrievalAugmentedGeneration #AI #Ollama #NodeJS #LLM #LearningInPublic


Published via Towards AI

