How I Built a Chatbot Without APIs, GPUs, or Money

Last Updated on January 2, 2026 by Editorial Team

Author(s): Asif Khan

Originally published on Towards AI.

A locally running RAG chatbot built with Ollama, FastAPI, and Chroma, answering questions strictly from uploaded documents with clear source references.

I wanted to create a chatbot, but without spending a single penny.

No paid API keys.
No GPU dependency.
No cloud bills.

The goal was simple:
👉 Understand the logic, concepts, and terms behind a working chatbot, not just make something that “works”.

During my research, I found that Microsoft has a small open-source model called Phi.
It’s lightweight and good enough for learning.

Then I discovered Ollama, which lets you run models locally on your system.
No API keys. No internet dependency. Perfect for experimentation.

In this blog, I’ll walk you through each step I followed to build a working RAG-based chatbot backend, keeping things simple and practical.

I’ll avoid long paragraphs and focus on clarity.

This is Part 1, where we build the backend.

What We Are Building

  • A local chatbot backend
  • Runs completely offline
  • Uses PDFs as knowledge
  • Answers questions only from documents
  • Can say “I don’t know” when the answer is not present

Prerequisites

1. Operating System

  • Windows

2. Python

  • Python 3.10 or above

3. Virtual Environment (PowerShell)

Create a virtual environment by running python -m venv My_Env in PowerShell.

To activate it, run My_Env\Scripts\Activate.ps1.

You should see (My_Env) in your terminal.
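
If you want to double-check the setup from Python itself, here is a minimal sketch. The function name environment_report is my own, not part of the project. It only checks the Python version and whether a virtual environment is active.

```python
import sys


def environment_report():
    """Describe the interpreter that is currently running."""
    return {
        # The article assumes Python 3.10 or above
        "python_ok": sys.version_info >= (3, 10),
        # Inside a venv, sys.prefix points at the venv, not the base install
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If in_virtualenv prints False, the environment is probably not activated in that terminal.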

Next, create this folder structure.

ai_chatbot_project/

├── backend/
│ ├── app/
│ │ ├── __init__.py
│ │ ├── main.py
│ │ ├── llm.py
│ │ ├── rag.py
│ │ ├── schemas.py
│ │
│ ├── data/
│ │ └── uploads/
│ │
│ ├── vectorstore/
│ │
│ ├── .env
│ ├── requirements.txt

├── frontend/
│ ├── streamlit_app.py

└── README.md

requirements.txt (dependency management)

This file lists:

  • All Python packages needed to run the backend

Why it’s important:

  • Reproducible setup
  • Easy onboarding for others
  • Clean environment management

Anyone can recreate the backend with a single command.

fastapi
uvicorn
# Needed by FastAPI to parse file uploads (UploadFile)
python-multipart

# LangChain core + ecosystem
langchain
langchain-core
langchain-community
langchain-text-splitters
langchain-chroma
langchain-ollama

# Vector database
chromadb

# PDF loading
pypdf

# Environment & utilities
python-dotenv
pydantic
requests

# Frontend
streamlit

Install everything as follows.
Make sure you are inside the backend folder, where requirements.txt lives.
Use: cd ai_chatbot_project\backend

pip install -r requirements.txt
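
To confirm the install worked, a small check like this can help. It is only a sketch using the standard library. The helper name missing_packages is hypothetical, and the package list is copied from part of requirements.txt above.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# A subset of the packages listed in requirements.txt
REQUIRED = ["fastapi", "uvicorn", "langchain", "chromadb", "pypdf", "streamlit"]

print("Missing:", missing_packages(REQUIRED))
```

An empty list means every package in REQUIRED was found in the active environment.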

Installing Ollama and Phi

Step 1: Download Ollama from:
👉 https://ollama.com

Download and install Ollama for Windows.

After installation:

  • Ollama runs as a background service
  • You do NOT need to open it manually

Verify Ollama Is Installed:

  • Open a new terminal and run
ollama --version

Step 2: Pull Microsoft Phi Model Locally

ollama pull phi

This downloads the Microsoft Phi model to your machine.

Size:

  • ~2–3 GB
  • One-time download

Wait until it completes.

Step 3: Test Phi Locally (Very Important)

Run:

ollama run phi

Ask something simple:

What is FastAPI?

If you get a response → model works locally.

Exit with: Ctrl + D

Step 4: Pull an Embedding Model (Important for RAG)

Run:

ollama pull nomic-embed-text

Now your system is ready to run models locally.
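
Ollama also exposes a local HTTP endpoint, which is what the Python libraries talk to under the hood. Here is a minimal sketch of calling it directly with the standard library. I am assuming the default address http://localhost:11434 and the /api/generate route. The function names are my own.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="phi"):
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(prompt, model="phi", timeout=120):
    """Send the prompt and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With Ollama running, ask_ollama("What is FastAPI?") should return the same kind of answer you saw in the terminal. Nothing leaves your machine.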

Why We Need RAG

An LLM will try to answer any question, even when the answer is not in your document. In that case it may simply make something up.

That’s dangerous.

So we use RAG (Retrieval Augmented Generation):

  • Load documents
  • Convert text to embeddings
  • Store embeddings in a vector database
  • Retrieve relevant chunks
  • Answer only from retrieved content
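
To make these steps concrete, here is a toy version in plain Python. It is a sketch, not what the project uses: word counts stand in for the real embedding model, and a plain list stands in for the vector database. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Store each chunk next to its embedding (stand-in for a vector DB)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Put only the retrieved chunks in front of the model."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer ONLY using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"
    )


index = build_index([
    "FastAPI is a Python web framework.",
    "Chroma stores embeddings on disk.",
    "Ollama runs models locally.",
])
print(build_prompt("What is FastAPI?", retrieve(index, "What is FastAPI?", k=1)))
```

The real backend follows the same flow, with nomic-embed-text producing the vectors and Chroma storing them.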

Backend Files Explained

1) main.py – Application Entry Point

This is the starting point of the backend.

Responsibilities:

  • Initializes the FastAPI application
  • Exposes API endpoints
  • Acts as the bridge between frontend and backend logic

Key things it handles:

  • PDF upload requests
  • Question-answer requests
  • Request validation and response formatting

Think of main.py as the traffic controller.
It doesn’t do heavy processing, it just routes requests to the right place.

from fastapi import FastAPI, HTTPException, UploadFile, File
from app.llm import llm, rag_answer
from app.schemas import QuestionRequest
from app.rag import ingest_pdf
from pathlib import Path
import shutil

app = FastAPI(title="AI Chatbot Backend")

# Base directory → backend/
BASE_DIR = Path(__file__).resolve().parents[1]
UPLOAD_DIR = BASE_DIR / "data" / "uploads"
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)


@app.get("/")
def home():
    # Simple health check
    return {"status": "Backend is running"}


@app.post("/ask")
def ask_question(payload: QuestionRequest):
    # Plain LLM call, no document context
    try:
        response = llm.invoke(payload.question)
        return {"answer": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.post("/upload-pdf")
def upload_pdf(file: UploadFile = File(...)):
    # Keep only the file name so a crafted path cannot escape UPLOAD_DIR
    file_path = UPLOAD_DIR / Path(file.filename).name

    with open(file_path, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)

    chunks = ingest_pdf(str(file_path))
    return {"message": "PDF ingested successfully", "chunks": chunks}


@app.post("/ask-pdf")
def ask_pdf(payload: QuestionRequest):
    # Answer strictly from the ingested documents
    return rag_answer(
        payload.question,
        payload.chat_history
    )

2) rag.py – Document Ingestion and Retrieval Logic

This file handles everything related to documents.

Responsibilities:

  • Loading PDF files
  • Splitting large text into smaller chunks
  • Creating embeddings from text
  • Storing and retrieving vectors from the vector database

Important design choices here:

  • Uses a dedicated embedding model, not the LLM
  • Uses a persistent vector store, so data survives restarts
  • Uses an explicit collection name to avoid data loss

In simple terms, rag.py is the memory of the chatbot.

from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from pathlib import Path

# Base directory → backend/
BASE_DIR = Path(__file__).resolve().parents[1]

UPLOAD_DIR = BASE_DIR / "data" / "uploads"
VECTOR_DIR = BASE_DIR / "vectorstore"
COLLECTION_NAME = "pdf_documents"

UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
VECTOR_DIR.mkdir(parents=True, exist_ok=True)

embeddings = OllamaEmbeddings(model="nomic-embed-text")


def ingest_pdf(file_path: str):
    loader = PyPDFLoader(file_path)
    documents = loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=150
    )
    chunks = splitter.split_documents(documents)

    vectordb = Chroma(
        collection_name=COLLECTION_NAME,
        persist_directory=str(VECTOR_DIR),
        embedding_function=embeddings
    )

    vectordb.add_documents(chunks)

    return {
        "pages": len({doc.metadata.get("page") for doc in documents}),
        "chunks": len(chunks)
    }


def get_retriever_with_sources():
    vectordb = Chroma(
        collection_name=COLLECTION_NAME,
        persist_directory=str(VECTOR_DIR),
        embedding_function=embeddings
    )
    return vectordb.as_retriever(search_kwargs={"k": 4})
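
To see what chunk_size=800 and chunk_overlap=150 mean in practice, here is a simplified sliding-window splitter. It is only a sketch of the idea. The real RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this toy version ignores. The name split_text is my own.

```python
def split_text(text, chunk_size=800, chunk_overlap=150):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break  # the last window already reached the end of the text
        # Step forward, but keep the tail of the previous chunk
        start += chunk_size - chunk_overlap
    return chunks


sample = "".join(chr(65 + i % 26) for i in range(2000))
pieces = split_text(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence that falls on a chunk boundary still appears whole in at least one chunk, which helps retrieval.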

3) llm.py – Intelligence and Guardrails

This is the most critical file in the backend.

Responsibilities:

  • Calling the language model
  • Building prompts with context
  • Enforcing strict rules on when to answer
  • Deciding when to say “I don’t know”

Key ideas implemented here:

  • The chatbot answers only when the document supports it
  • Weak retrieval results are rejected
  • Sources are shown only if an answer exists

This file ensures the chatbot is honest and trustworthy, not just clever.

import os
from dotenv import load_dotenv
from langchain_ollama import OllamaLLM

from langchain_core.runnables import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from app.rag import get_retriever_with_sources


#for local version
llm = OllamaLLM(model="phi",temperature=0.2)

retriever = get_retriever_with_sources()

# Prompt (from langchain-core)
prompt = PromptTemplate(
    input_variables=["chat_history", "question"],
    template="""
You are a helpful assistant.

Conversation so far:
{chat_history}

User question:
{question}

Answer clearly and concisely:
"""
)


# The template uses chat_history too, so it is listed as an input variable
rag_prompt = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template="""
Answer the question ONLY using the context below.
If the answer is not present, say "I don't know".

Conversation so far:
{chat_history}

Context:
{context}

Question:
{question}

Answer:
"""
)

def rag_answer(question: str, chat_history: str):
    docs = retriever.invoke(question)

    # 1. Too few retrieved chunks → treat retrieval as weak, no answer
    if not docs or len(docs) < 2:
        return {
            "answer": "I don't know. This is not mentioned in the document.",
            "sources": []
        }

    context = "\n\n".join(d.page_content for d in docs)

    answer = llm.invoke(
        rag_prompt.format(
            context=context,
            question=question,
            chat_history=chat_history
        )
    )

    answer_text = answer.strip() if isinstance(answer, str) else str(answer).strip()
    normalized = answer_text.lower()

    # 2. If answer is empty or generic → reject
    # (markers are lowercase because normalized is lowercased)
    if (
        not answer_text
        or "i don't know" in normalized
        or "not mentioned" in normalized
        or "cannot find" in normalized
        or "i'm sorry" in normalized
    ):
        return {
            "answer": "I don't know. This is not mentioned in the document.",
            "sources": []
        }

    # 3. VALID answer → attach sources
    sources = sorted({
        f"{doc.metadata.get('source')} (page {doc.metadata.get('page')})"
        for doc in docs
        if doc.metadata.get("source") is not None
    })

    return {
        "answer": answer_text,
        "sources": sources
    }



rag_chain = (
    {
        "context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
        "question": RunnablePassthrough(),
        # rag_prompt also expects chat_history; this chain has none, so pass an empty string
        "chat_history": lambda _: ""
    }
    | rag_prompt
    | llm
)

# Base chain (RunnableSequence)
base_chain = prompt | llm

# Simple in-memory message store
_store = {}


def get_session_history(session_id: str):
    if session_id not in _store:
        _store[session_id] = InMemoryChatMessageHistory()
    return _store[session_id]


# Chain with memory
chat_chain = RunnableWithMessageHistory(
    base_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
)
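
The guardrail logic in rag_answer can also be pulled out into small helpers, which makes it easier to test without a model running. This is a sketch of that idea, not a change the project requires. The names REFUSAL_MARKERS, is_refusal, format_sources, and guarded_response are hypothetical.

```python
# Lowercase on purpose: answers are lowercased before matching
REFUSAL_MARKERS = ("i don't know", "not mentioned", "cannot find", "i'm sorry")
FALLBACK = "I don't know. This is not mentioned in the document."


def is_refusal(answer_text):
    """True when the answer is empty or contains a refusal phrase."""
    normalized = answer_text.strip().lower()
    return not normalized or any(m in normalized for m in REFUSAL_MARKERS)


def format_sources(metadatas):
    """Turn chunk metadata dicts into unique, sorted source labels."""
    return sorted({
        f"{m.get('source')} (page {m.get('page')})"
        for m in metadatas
        if m.get("source") is not None
    })


def guarded_response(answer_text, metadatas):
    """Attach sources only when the model actually answered."""
    if is_refusal(answer_text):
        return {"answer": FALLBACK, "sources": []}
    return {"answer": answer_text.strip(), "sources": format_sources(metadatas)}


print(guarded_response("I'm sorry, I cannot find that.", [{"source": "a.pdf", "page": 0}]))
```

Keeping the markers in one tuple makes it harder to add a phrase with the wrong casing by accident.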

4) schemas.py – Request and Response Validation

This file defines the shape of data flowing into the backend.

Responsibilities:

  • Validating incoming requests
  • Enforcing required fields
  • Preventing malformed inputs

Why this matters:

  • Avoids silent bugs
  • Makes APIs predictable
  • Helps future scaling and debugging

Think of schemas.py as a contract between frontend and backend.

from pydantic import BaseModel


class QuestionRequest(BaseModel):
    question: str
    # Defaults to empty so the first turn can omit it
    chat_history: str = ""
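
Here is what a request matching this schema looks like from the client side. It is a minimal sketch using only the standard library. I am assuming the server was started from the backend folder with uvicorn app.main:app, which listens on 127.0.0.1:8000 by default. The function name is my own.

```python
import json
import urllib.request

# Assumption: uvicorn is running on its default host and port
BACKEND_URL = "http://127.0.0.1:8000"


def build_ask_pdf_request(question, chat_history=""):
    """Build a POST request whose JSON body matches QuestionRequest."""
    body = json.dumps(
        {"question": question, "chat_history": chat_history}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BACKEND_URL}/ask-pdf",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ask_pdf_request("What is the refund policy?")
print(request.full_url, request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen sends it to the running backend. If question is missing or is not a string, FastAPI rejects the request with a 422 validation error before any of our code runs.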

5) data/uploads/ – Uploaded Files

This folder stores:

  • User-uploaded PDF documents

Why it exists:

  • Keeps raw documents separate from processed data
  • Makes it easier to manage or delete files later

This folder is temporary storage, not intelligence.

6) vectorstore/ – Embedded Knowledge

This folder contains:

  • Vector database files created by ChromaDB

Why this matters:

  • This is where meaning is stored, not text
  • Enables semantic search
  • Persists document understanding across restarts

Deleting this folder resets the chatbot’s knowledge.
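
If you prefer to reset from a script instead of deleting the folder by hand, something like this should work. It is a sketch with a hypothetical helper name. Stop the backend first, since the database files are likely to be locked while it is running.

```python
import shutil
from pathlib import Path


def reset_vectorstore(vector_dir):
    """Delete the vector store folder and recreate it empty."""
    path = Path(vector_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For this project the call would be reset_vectorstore("backend/vectorstore") from the project root. After that, every PDF has to be uploaded again.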

7) .env – Environment Configuration

Used to store:

  • Configuration values
  • Environment-specific settings

Even though we avoided paid APIs initially, keeping this file prepares the project for:

  • Future API keys
  • Deployment environments
  • Secure configuration handling
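
As an example, the model names could be read from the environment instead of being hard-coded. This is only a sketch. The variable names LLM_MODEL, EMBEDDING_MODEL, and RETRIEVER_K are my own invention, and the defaults mirror the values used in the code above.

```python
import os


def load_settings(env=None):
    """Read settings from the environment, falling back to the article's values."""
    env = os.environ if env is None else env
    return {
        "llm_model": env.get("LLM_MODEL", "phi"),
        "embedding_model": env.get("EMBEDDING_MODEL", "nomic-embed-text"),
        "retriever_k": int(env.get("RETRIEVER_K", "4")),
    }


print(load_settings())
```

Calling load_dotenv() from python-dotenv before load_settings() would copy the values in .env into the environment, so the same function works locally and in deployment.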

At this point, we have a fully working backend that runs locally and understands documents.
In Part 2, we’ll first validate this backend by testing the endpoints in Swagger UI, and then move on to building a clean, chat-based frontend with Streamlit.


Published via Towards AI

