
Building an Employee Onboarding Chatbot with RAG, FastAPI, and AI

Last Updated on September 29, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Learn how to build a smart employee onboarding assistant using Retrieval-Augmented Generation (RAG), FastAPI, and LLMs. Step-by-step guide with code, vector embeddings, and conversation memory to create a context-aware AI chatbot for your company handbook.

Image by author

Introduction

Every company faces the same challenge. Employees frequently have questions about policies, leave structures, benefits, and workplace rules. Traditionally, employees are expected to read thick handbooks or dig through lengthy documents to find answers, which can be time-consuming and frustrating.

We wanted to make this easier. Instead of expecting our team to navigate dozens of pages, we aimed to provide a solution where employees can simply ask, "How many annual leave days do I get?" and receive a clear, accurate answer in seconds.

To achieve this, we implemented an AI-powered chatbot that can read the company handbook, retain conversational context, and respond in a friendly, helpful manner. This system connects concepts like language models, embeddings, and vector databases to ensure employees get precise information quickly, improving both productivity and satisfaction.

Innovex Genie — Image by Author

Step 1: Why Large Language Models Alone Are Not Enough

LLMs like ChatGPT are trained on a lot of general text. If you ask them about your company-specific policies, they won’t have the correct answer because that information is private and specific. You could feed the handbook directly into the model, but LLMs have a token limit, meaning they can only process a certain amount of text at once.

So we need a system that can look up the relevant parts of the handbook and feed them to the model. This approach is called Retrieval-Augmented Generation (RAG).

Step 2: What is Retrieval-Augmented Generation (RAG)?

Think of a smart intern who has read hundreds of books but cannot remember everything. You ask them a question, and they first search for the exact paragraph in the relevant book, then summarize it in simple language. That’s RAG.

In our chatbot:

  1. Retriever searches the handbook for relevant passages.
  2. Generator (LLM) reads those passages and answers the user in a friendly manner.

This ensures answers are accurate and grounded in your own company material.

RAG Workflow — Image by Author

Step 3: Preparing the Project

We need a few libraries: FastAPI for the backend, LangChain to connect the LLM and retriever, Groq Llama-3.1 as the language model, and tools for embeddings and vector storage. Install them using:

pip install fastapi uvicorn python-dotenv
pip install langchain langchain-community langchain_groq langchain_huggingface langchain_chroma
pip install pypdf sentence_transformers chromadb

We also create a .env file for storing API keys safely:

GROQ_API_KEY=your_groq_api_key_here

In Python, we load the API key like this:

from dotenv import load_dotenv
import os
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

Step 4: Reading the Handbook with a Loader

PDFs are structured for humans, not computers. To extract plain text, we use PyPDFLoader from LangChain. This reads each page of the PDF and creates a list of documents.

from langchain_community.document_loaders import PyPDFLoader

pdf_files = ["employee_handbook.pdf"]
all_docs = []
for file in pdf_files:
    loader = PyPDFLoader(file)
    docs = loader.load()   # one Document per PDF page
    all_docs.extend(docs)

At this point, all_docs contains all the text from the handbook, split page by page.
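
If you want to confirm the load worked, a quick optional check like the one below (using nothing beyond the all_docs list we just built) prints how many pages were read and a sample of the first one:

print(len(all_docs))                    # number of pages loaded
print(all_docs[0].metadata)             # e.g. {'source': 'employee_handbook.pdf', 'page': 0}
print(all_docs[0].page_content[:200])   # first 200 characters of the first page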

Step 5: Splitting the Text into Chunks

Language models can’t process huge text blocks. Imagine trying to feed a 50-page handbook in one go; it would exceed the model’s token limit.

We split text into chunks with slight overlaps so context isn’t lost between chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(all_docs)

Each chunk is now a manageable piece of text that can be processed by the LLM.
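
A quick, optional check shows how many chunks the splitter produced and what one of them looks like:

print(f"{len(all_docs)} pages -> {len(splits)} chunks")
print(splits[0].page_content[:300])   # preview of the first chunk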

Step 6: Understanding Embeddings

Computers don’t understand words the way humans do. If you search for “vacation days,” the exact phrase might not appear in the handbook, but “annual leave” might. To solve this, we convert each chunk into embeddings: numerical vectors that capture meaning.

from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Two similar sentences will now have embeddings close together in vector space, which allows semantic search.
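
You can see this for yourself by embedding a few phrases and comparing them with cosine similarity. The exact numbers depend on the model, but related phrases should score noticeably higher than unrelated ones:

import numpy as np

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vacation = embeddings.embed_query("vacation days")
leave = embeddings.embed_query("annual leave")
parking = embeddings.embed_query("parking policy")

print(cosine(vacation, leave))     # semantically related, higher score
print(cosine(vacation, parking))   # unrelated, lower score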

Step 7: Storing Embeddings in a Vector Database

So far, we’ve converted each chunk of text into embeddings, numerical representations that capture the meaning of the text. But embeddings by themselves aren’t very useful; you need a way to search them efficiently when a user asks a question.

This is where a vector database comes in. Unlike a traditional database that searches for exact words, a vector database searches for vectors that are close to each other in high-dimensional space. In other words, it finds text chunks that are semantically similar to the user’s query, even if the words don’t exactly match.

For example, if a user asks “How many vacation days do I get?” the database can find a chunk containing “annual leave policy,” because the embeddings for these two sentences are close in meaning.

Here’s how we do it with Chroma:

from langchain_chroma import Chroma
vector_store = Chroma.from_documents(splits, embeddings)
retriever = vector_store.as_retriever()

Now, when a user asks a question, we can quickly find the chunks most relevant to the question.
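
Before wiring in the language model, it is worth testing the retriever on its own. Depending on your LangChain version, the call is retriever.invoke(...) or the older retriever.get_relevant_documents(...):

results = retriever.invoke("How many vacation days do I get?")
for doc in results:
    print(doc.metadata.get("page"), "-", doc.page_content[:120])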

Step 8: Guiding the Language Model with Prompts

A language model like Llama-3.1 is extremely powerful. It can generate text in almost any style, but by itself, it doesn’t know how it should behave for a specific task. If we just feed it a question, it might give long-winded, irrelevant, or even incorrect answers.

This is where prompts come in. A prompt is basically an instruction that tells the model how to respond. In our chatbot, we use a system prompt to set the role of the assistant and define rules for how it should answer.

from langchain_core.prompts import ChatPromptTemplate
system_prompt = """
You are an onboarding assistant - Innovex Genie for new employees at Innovex.
Use the following retrieved context to answer the question.
If you do not know the answer, say you don't know.
Be concise, friendly, and helpful.
Context:
{context}
"""
human_prompt = "{input}"

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", human_prompt),
])

Let’s see what happens here:

  1. System prompt — Think of this as giving the AI its job description. It tells the model that it is an onboarding assistant for Innovex, and it should answer using only the provided context. This prevents it from guessing or hallucinating information that isn’t in the handbook.
  2. Human prompt — This is where the user’s actual question is inserted. For example, if the user asks, “How many annual leave days do I get?” that input goes here.
  3. ChatPromptTemplate — This combines the system instructions and the user question into a single message that the model can process. It ensures the language model understands the context and the role it should play.

By providing these instructions, the chatbot knows how to behave. It will give answers that are concise, relevant, friendly, and grounded in the handbook instead of generic or incorrect responses, and it won’t hallucinate information that isn’t in the retrieved context.
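
To see exactly what the model will receive, you can render the template with sample values. The context string below is made up for illustration; in the real chain it is filled in automatically from the retrieved chunks:

messages = qa_prompt.format_messages(
    context="Employees receive 20 days of annual leave per year.",  # illustrative sample context
    input="How many annual leave days do I get?"
)
for message in messages:
    print(message.type, ":", message.content)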

Step 9: Creating the RAG Chain

At this stage, we have two main components ready: a retriever that can find the most relevant sections of the handbook, and a language model (LLM) that can generate natural, friendly answers. But to make the chatbot actually work, we need to connect these two components in a pipeline. This is what we call a RAG chain.

Think of it like this: the retriever is a librarian who quickly fetches the relevant pages from a large handbook, and the LLM is an assistant who reads those pages and explains the answer in plain language. By combining them, we create a system that can answer questions accurately without the model having to memorize the entire handbook.

In code, this looks like:

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_groq import ChatGroq
llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name="llama-3.1-8b-instant",
    temperature=0.7,
)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(retriever, qa_chain)

Here’s what happens when a user asks a question:

  1. The retriever searches the vector database and fetches the most relevant chunks of the handbook.
  2. These chunks are combined into a single context by create_stuff_documents_chain.
  3. The language model reads the combined context and generates a precise answer for the user.

By building this RAG chain, we ensure that every answer the chatbot gives is grounded in the actual handbook, making it reliable and helpful for new employees.

Now, the bot can answer queries grounded in the handbook.
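
A quick end-to-end test looks like this (the question is just an example; any topic covered by your handbook works):

response = rag_chain.invoke({"input": "What is the probation period for new employees?"})
print(response["answer"])
# response["context"] holds the retrieved chunks, which is handy for debugging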

Step 10: Adding Conversation Memory

Without memory, each question is treated in isolation. To make conversations flow naturally, we add a history-aware retriever. It rewrites follow-up questions into standalone questions using the chat history:

from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
contextual_q_system_prompt = """
Given a chat history and the latest user question,
formulate a standalone question understandable without history.
Do not answer, just reformulate.
"""
contextual_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextual_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextual_q_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)
  • MessagesPlaceholder("chat_history") inserts previous messages:
    This tells the system to automatically include all previous user and AI messages in the prompt sent to the language model. By having access to past conversation turns, the model can understand references, pronouns, or context that a follow-up question might depend on.
  • The LLM rewrites queries before retrieval, keeping the retriever stateless:
    Instead of giving the retriever raw follow-up questions, the LLM first reformulates them into standalone queries. This means the retriever doesn’t need to “remember” anything — it just searches the vector database with a fully-formed, context-aware query. This separation of responsibilities makes the system more reliable and easier to scale.
  • Smooth multi-turn conversations:
    Because the follow-up question is reformulated with context, the chatbot can handle dialogues naturally. For example, after asking about annual leave, a user can follow up with “And how about sick leave?” The chatbot understands the reference and provides an accurate answer, making the conversation feel coherent and human-like.
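
Putting it together, a two-turn exchange with the history-aware chain looks like the sketch below, mirroring the bullet points above. The questions are illustrative, and note that the caller is responsible for appending each turn to the history:

from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

first = rag_chain.invoke({"input": "How many annual leave days do I get?", "chat_history": chat_history})
chat_history.append(HumanMessage(content="How many annual leave days do I get?"))
chat_history.append(AIMessage(content=first["answer"]))

follow_up = rag_chain.invoke({"input": "And how about sick leave?", "chat_history": chat_history})
print(follow_up["answer"])   # the reformulated query understands "how about" refers to leave entitlements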

Step 11: Serving the Chatbot with FastAPI

Finally, we expose the chatbot as an API using FastAPI:

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from langchain_core.messages import HumanMessage, AIMessage
app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class ChatRequest(BaseModel):
    message: str

chat_histories = {}

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    try:
        # A single shared history keeps the demo simple; use per-user keys in production
        if "default" not in chat_histories:
            chat_histories["default"] = []

        chat_history = chat_histories["default"]

        response = rag_chain.invoke({
            "input": request.message,
            "chat_history": chat_history
        })

        chat_history.append(HumanMessage(content=request.message))
        chat_history.append(AIMessage(content=response['answer']))

        return {"response": response['answer']}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Run the server with:

uvicorn app:app --reload --host 0.0.0.0 --port 8000

You can now test the chatbot by sending a POST request:

curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is the leave policy?"}'
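
The endpoint returns JSON of the form {"response": "..."}, so the equivalent call from Python (using the requests library, assuming it is installed) is:

import requests

reply = requests.post(
    "http://localhost:8000/chat",
    json={"message": "And how about sick leave?"},   # follow-up question; the server keeps the history
)
print(reply.json()["response"])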

Step 12: Conclusion

By combining retrieval, embeddings, and generation, we transformed a static handbook into a conversational AI assistant. The system can answer specific employee questions while remembering context, making onboarding much smoother.

Demo — Image by Author

This architecture can also be adapted for customer support, healthcare FAQs, legal documents, or university guides. The key takeaway is that retrieval and generation together make AI practically useful.


Published via Towards AI

