
Building an Employee Onboarding Chatbot with RAG, FastAPI, and AI

Last Updated on September 29, 2025 by Editorial Team

Author(s): Abinaya Subramaniam

Originally published on Towards AI.

Learn how to build a smart employee onboarding assistant using Retrieval-Augmented Generation (RAG), FastAPI, and LLMs. Step-by-step guide with code, vector embeddings, and conversation memory to create a context-aware AI chatbot for your company handbook.

Image by author

Introduction

Every company faces the same challenge. Employees frequently have questions about policies, leave structures, benefits, and workplace rules. Traditionally, employees are expected to read thick handbooks or dig through lengthy documents to find answers, which can be time-consuming and frustrating.

We wanted to make this easier. Instead of expecting our team to navigate dozens of pages, we aimed to provide a solution where employees can simply ask, "How many annual leave days do I get?" and receive a clear, accurate answer in seconds.

To achieve this, we implemented an AI-powered chatbot that can read the company handbook, retain conversational context, and respond in a friendly, helpful manner. This system connects concepts like language models, embeddings, and vector databases to ensure employees get precise information quickly, improving both productivity and satisfaction.

Innovex Genie — Image by Author

Step 1: Why Large Language Models Alone Are Not Enough

LLMs like ChatGPT are trained on a lot of general text. If you ask them about your company-specific policies, they won’t have the correct answer because that information is private and specific. You could feed the handbook directly into the model, but LLMs have a token limit, meaning they can only process a certain amount of text at once.

So we need a system that can look up the relevant parts of the handbook and feed them to the model. This approach is called Retrieval-Augmented Generation (RAG).

Step 2: What is Retrieval-Augmented Generation (RAG)?

Think of a smart intern who has read hundreds of books but cannot remember everything. You ask them a question, and they first search for the exact paragraph in the relevant book, then summarize it in simple language. That’s RAG.

In our chatbot:

  1. Retriever searches the handbook for relevant passages.
  2. Generator (LLM) reads those passages and answers the user in a friendly manner.

This ensures answers are accurate and grounded in your own company material.

RAG Workflow — Image by Author

Step 3: Preparing the Project

We need a few libraries: FastAPI for the backend, LangChain to connect the LLM and retriever, Groq Llama-3.1 as the language model, and tools for embeddings and vector storage. Install them using:

pip install fastapi uvicorn python-dotenv
pip install langchain langchain-community langchain_groq langchain_huggingface langchain_chroma
pip install pypdf sentence_transformers chromadb

We also create a .env file for storing API keys safely:

GROQ_API_KEY=your_groq_api_key_here

In Python, we load the API key like this:

from dotenv import load_dotenv
import os
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

Step 4: Reading the Handbook with a Loader

PDFs are structured for humans, not computers. To extract plain text, we use PyPDFLoader from LangChain. This reads each page of the PDF and creates a list of documents.

from langchain_community.document_loaders import PyPDFLoader

pdf_files = ["employee_handbook.pdf"]
all_docs = []
for file in pdf_files:
    loader = PyPDFLoader(file)
    docs = loader.load()   # one Document per PDF page
    all_docs.extend(docs)

At this point, all_docs contains all the text from the handbook, split page by page.
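
If you want to confirm the load worked, a quick optional check like the one below (using nothing beyond the all_docs list we just built) prints how many pages were read and a sample of the first one:

print(len(all_docs))                    # number of pages loaded
print(all_docs[0].metadata)             # e.g. {'source': 'employee_handbook.pdf', 'page': 0}
print(all_docs[0].page_content[:200])   # first 200 characters of the first page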

Step 5: Splitting the Text into Chunks

Language models can’t process huge text blocks. Imagine trying to feed a 50-page handbook in one go; it would exceed the model’s token limit.

We split text into chunks with slight overlaps so context isn’t lost between chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(all_docs)

Each chunk is now a manageable piece of text that can be processed by the LLM.
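
A quick, optional check shows how many chunks the splitter produced and what one of them looks like:

print(f"{len(all_docs)} pages -> {len(splits)} chunks")
print(splits[0].page_content[:300])   # preview of the first chunk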

Step 6: Understanding Embeddings

Computers don’t understand words the way humans do. If you search for “vacation days,” the exact phrase might not appear in the handbook, but “annual leave” might. To solve this, we convert each chunk into embeddings: numerical vectors that capture meaning.

from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Two similar sentences will now have embeddings close together in vector space, which allows semantic search.
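
You can see this for yourself by embedding a few phrases and comparing them with cosine similarity. The exact numbers depend on the model, but related phrases should score noticeably higher than unrelated ones:

import numpy as np

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vacation = embeddings.embed_query("vacation days")
leave = embeddings.embed_query("annual leave")
parking = embeddings.embed_query("parking policy")

print(cosine(vacation, leave))     # semantically related, higher score
print(cosine(vacation, parking))   # unrelated, lower score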

Step 7: Storing Embeddings in a Vector Database

So far, we’ve converted each chunk of text into embeddings, numerical representations that capture the meaning of the text. But embeddings by themselves aren’t very useful; you need a way to search them efficiently when a user asks a question.

This is where a vector database comes in. Unlike a traditional database that searches for exact words, a vector database searches for vectors that are close to each other in high-dimensional space. In other words, it finds text chunks that are semantically similar to the user’s query, even if the words don’t exactly match.

For example, if a user asks “How many vacation days do I get?” the database can find a chunk containing “annual leave policy,” because the embeddings for these two sentences are close in meaning.

Here’s how we do it with Chroma:

from langchain_chroma import Chroma
vector_store = Chroma.from_documents(splits, embeddings)
retriever = vector_store.as_retriever()

Now, when a user asks a question, we can quickly find the chunks most relevant to the question.
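
Before wiring in the language model, it is worth testing the retriever on its own. Depending on your LangChain version, the call is retriever.invoke(...) or the older retriever.get_relevant_documents(...):

results = retriever.invoke("How many vacation days do I get?")
for doc in results:
    print(doc.metadata.get("page"), "-", doc.page_content[:120])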

Step 8: Guiding the Language Model with Prompts

A language model like Llama-3.1 is extremely powerful. It can generate text in almost any style, but by itself, it doesn’t know how it should behave for a specific task. If we just feed it a question, it might give long-winded, irrelevant, or even incorrect answers.

This is where prompts come in. A prompt is basically an instruction that tells the model how to respond. In our chatbot, we use a system prompt to set the role of the assistant and define rules for how it should answer.

from langchain_core.prompts import ChatPromptTemplate
system_prompt = """
You are an onboarding assistant - Innovex Genie for new employees at Innovex.
Use the following retrieved context to answer the question.
If you do not know the answer, say you don't know.
Be concise, friendly, and helpful.
Context:
{context}
"""
human_prompt = "{input}"

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", human_prompt),
])

Let’s see what happens here:

  1. System prompt — Think of this as giving the AI its job description. It tells the model that it is an onboarding assistant for Innovex, and it should answer using only the provided context. This prevents it from guessing or hallucinating information that isn’t in the handbook.
  2. Human prompt — This is where the user’s actual question is inserted. For example, if the user asks, “How many annual leave days do I get?” that input goes here.
  3. ChatPromptTemplate — This combines the system instructions and the user question into a single message that the model can process. It ensures the language model understands the context and the role it should play.

By providing these instructions, the chatbot knows how to behave. It will give answers that are concise, relevant, friendly, and grounded in the handbook instead of generic or incorrect responses, and it won’t hallucinate information that isn’t in the retrieved context.
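
To see exactly what the model will receive, you can render the template with sample values. The context string below is made up for illustration; in the real chain it is filled in automatically from the retrieved chunks:

messages = qa_prompt.format_messages(
    context="Employees receive 20 days of annual leave per year.",  # illustrative sample context
    input="How many annual leave days do I get?"
)
for message in messages:
    print(message.type, ":", message.content)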

Step 9: Creating the RAG Chain

At this stage, we have two main components ready: a retriever that can find the most relevant sections of the handbook, and a language model (LLM) that can generate natural, friendly answers. But to make the chatbot actually work, we need to connect these two components in a pipeline. This is what we call a RAG chain.

Think of it like this: the retriever is a librarian who quickly fetches the relevant pages from a large handbook, and the LLM is an assistant who reads those pages and explains the answer in plain language. By combining them, we create a system that can answer questions accurately without the model having to memorize the entire handbook.

In code, this looks like:

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_groq import ChatGroq
llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name="llama-3.1-8b-instant",
    temperature=0.7,
)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(retriever, qa_chain)

Here’s what happens when a user asks a question:

  1. The retriever searches the vector database and fetches the most relevant chunks of the handbook.
  2. These chunks are combined into a single context by create_stuff_documents_chain.
  3. The language model reads the combined context and generates a precise answer for the user.

By building this RAG chain, we ensure that every answer the chatbot gives is grounded in the actual handbook, making it reliable and helpful for new employees.

Now, the bot can answer queries grounded in the handbook.
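
A quick end-to-end test looks like this (the question is just an example; any topic covered by your handbook works):

response = rag_chain.invoke({"input": "What is the probation period for new employees?"})
print(response["answer"])
# response["context"] holds the retrieved chunks, which is handy for debugging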

Step 10: Adding Conversation Memory

Without memory, each question is treated in isolation. To make conversations flow naturally, we add a history-aware retriever. It rewrites follow-up questions into standalone questions using the chat history:

from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
contextual_q_system_prompt = """
Given a chat history and the latest user question,
formulate a standalone question understandable without history.
Do not answer, just reformulate.
"""
contextual_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextual_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextual_q_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)
  • MessagesPlaceholder("chat_history") inserts previous messages:
    This tells the system to automatically include all previous user and AI messages in the prompt sent to the language model. By having access to past conversation turns, the model can understand references, pronouns, or context that a follow-up question might depend on.
  • The LLM rewrites queries before retrieval, keeping the retriever stateless:
    Instead of giving the retriever raw follow-up questions, the LLM first reformulates them into standalone queries. This means the retriever doesn’t need to “remember” anything — it just searches the vector database with a fully-formed, context-aware query. This separation of responsibilities makes the system more reliable and easier to scale.
  • Smooth multi-turn conversations:
    Because the follow-up question is reformulated with context, the chatbot can handle dialogues naturally. For example, after asking about annual leave, a user can follow up with “And how about sick leave?” The chatbot understands the reference and provides an accurate answer, making the conversation feel coherent and human-like.
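
Putting it together, a two-turn exchange with the history-aware chain looks like the sketch below, mirroring the bullet points above. The questions are illustrative, and note that the caller is responsible for appending each turn to the history:

from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

first = rag_chain.invoke({"input": "How many annual leave days do I get?", "chat_history": chat_history})
chat_history.append(HumanMessage(content="How many annual leave days do I get?"))
chat_history.append(AIMessage(content=first["answer"]))

follow_up = rag_chain.invoke({"input": "And how about sick leave?", "chat_history": chat_history})
print(follow_up["answer"])   # the reformulated query understands "how about" refers to leave entitlements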

Step 11: Serving the Chatbot with FastAPI

Finally, we expose the chatbot as an API using FastAPI:

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from langchain_core.messages import HumanMessage, AIMessage
app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

class ChatRequest(BaseModel):
    message: str

chat_histories = {}

@app.post("/chat")
async def chat_endpoint(request: ChatRequest):
    try:
        # A single shared history keeps the demo simple; use per-user keys in production
        if "default" not in chat_histories:
            chat_histories["default"] = []

        chat_history = chat_histories["default"]

        response = rag_chain.invoke({
            "input": request.message,
            "chat_history": chat_history
        })

        chat_history.append(HumanMessage(content=request.message))
        chat_history.append(AIMessage(content=response['answer']))

        return {"response": response['answer']}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Run the server with:

uvicorn app:app --reload --host 0.0.0.0 --port 8000

You can now test the chatbot by sending a POST request:

curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message":"What is the leave policy?"}'
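
The endpoint returns JSON of the form {"response": "..."}, so the equivalent call from Python (using the requests library, assuming it is installed) is:

import requests

reply = requests.post(
    "http://localhost:8000/chat",
    json={"message": "And how about sick leave?"},   # follow-up question; the server keeps the history
)
print(reply.json()["response"])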

Step 12: Conclusion

By combining retrieval, embeddings, and generation, we transformed a static handbook into a conversational AI assistant. The system can answer specific employee questions while remembering context, making onboarding much smoother.

Demo — Image by Author

This architecture can also be adapted for customer support, healthcare FAQs, legal documents, or university guides. The key takeaway is that retrieval and generation together make AI practically useful.


Published via Towards AI

