Beyond pre-trained LLMs: Augmenting LLMs through vector databases to create a chatbot on organizational data
Last Updated on September 25, 2025 by Editorial Team
Author(s): Leapfrog Technology
Originally published on Towards AI.
In the ever-evolving realm of AI-driven applications, the power of Large Language Models (LLMs) like OpenAI’s GPT and Meta’s Llama 2 cannot be overstated. In our previous article, we introduced you to the fascinating world of LLMs and the innovative LangChain framework. We demonstrated their utility in a straightforward but impactful use case, showcasing how OpenAI’s GPT LLM can be employed to extract structured information.
Now, as we dive further into the realm of LLMs, we begin to unravel an essential truth — while the potential of these models is immense, they might not always align perfectly with your specific needs straight out of the box. Why, you ask?
The reasons behind this limitation are manifold. First and foremost, tailored outputs may be the need of the hour, with your application requiring a distinct structure or style that the general LLM cannot grasp intuitively. Additionally, the pre-trained LLM might lack the essential context, crucial documents, or industry-specific knowledge that is indispensable for your project’s success. Think of it as a chatbot attempting to answer questions about intricate organizational protocols when it has never encountered the documents outlining those protocols in its training data.
Moreover, specialized vocabulary can prove to be a stumbling block, especially in domains replete with unique terminologies, concepts, and structures. Financial data, medical research papers, or transcripts of company meetings may contain terms and nuances unfamiliar to the LLM’s generic training data, making it stumble when attempting to summarize, respond, or generate content within these domains.
Consider, for example, asking an LLM how to turn off automatic reverse braking on a Volvo XC60. The LLM lacks domain-specific information about that car model; although it has no idea how to disable the feature, it performs its generative task to the best of its ability anyway, producing an answer that sounds grammatically solid but is flatly incorrect. The reason LLMs like ChatGPT feel so capable is that they’ve seen an immense amount of human creative output: entire companies’ worth of open-source code, libraries’ worth of books, lifetimes of conversations, scientific datasets, and so on. Critically, though, this core training data is static and often incomplete for the context you need.
So, how do you bridge this gap and ensure that an LLM aligns seamlessly with your distinct requirements? The answer lies in the realm of customization. You’ll likely need to fine-tune or adapt it to your specific use case. Currently, there exist four prominent methods for this:
- Full Fine-tuning: Comprehensive adjustment of all LLM parameters using task-specific data.
- Parameter-efficient Fine-tuning (PEFT): Strategic modification of select parameters to enhance efficiency in adaptation.
- Prompt Engineering: Precision refinement of model inputs to influence its output.
- Retrieval Augmented Generation (RAG): A potent fusion of prompt engineering and database querying, crafting contextually rich responses that extend beyond the capabilities of standalone LLMs.
While each method deserves its spotlight, our focus in this article will be on the RAG approach. We’ll begin with an introduction to the fine-tuning vs RAG approach, then delve deep into the world of vector databases, understanding their pivotal role in enhancing LLM capabilities through RAG. We’ll also showcase how this knowledge can be harnessed to create a fundamental chatbot tailored for organizational data. Along the way, we’ll illuminate various key concepts and techniques employed in this process. Join us on this insightful journey as we explore the significance of vector databases and their impact in the exciting realm of AI-driven applications.
Optimizing LLMs by Fine-tuning
Fine-tuning is a sophisticated technique that has gained prominence in the world of machine learning, offering a powerful means to enhance the performance of Large Language Models (LLMs). Fine-tuning takes LLMs a step further by customizing them for specific tasks or domains. At its core, fine-tuning involves additional training of an already pre-trained LLM using a smaller, domain-specific, labeled dataset. This process adjusts some or all of the model’s parameters, optimizing its performance for a particular task or set of tasks. Full fine-tuning entails updating all the model parameters, akin to pretraining, albeit on a smaller scale.
Fine-tuning is a comprehensive subject that merits its own dedicated blog post. Nevertheless, in this blog, we will briefly touch upon its applications.
Fine-tuning perpetuates the training process on domain-specific data to refine model capabilities. It finds application across diverse domains:
- Customer service chatbots, fine-tuned on customer feedback and conversation transcripts, gain an improved understanding of sentiment and issue resolution.
- Recommendation systems achieve excellence by fine-tuning with users’ purchase histories, enabling more accurate product recommendations.
- Marketing models, when fine-tuned on voice and tone, generate content that resonates with target audiences.
- Educational models, fine-tuned on curriculum and student assessments, become adept at personalizing lessons and assessing proficiency.
In essence, fine-tuning empowers developers to tailor LLMs to their precise requirements, ingraining their business identity into the model’s framework, resulting in output finely attuned to their niche.
However, it’s essential to bear in mind that even fine-tuned models can confront challenges. They may become susceptible to data shifts over time, necessitating recurrent retraining and monitoring. Additionally, access to high-quality, domain-specific training data remains a prerequisite.
In the next section, we will delve into an approach known as Retrieval Augmented Generation (RAG), which tackles some of these challenges by amalgamating retrieval techniques with LLMs to enhance their capabilities.
Expanding the Context Window: Limitations and the Emergence of Retrieval Augmented Generation (RAG)
In the previous sections, we’ve discussed that the LLMs may lack domain-specific knowledge, access to organization-specific data, and live, up-to-date information. To overcome these limitations, the concept of expanding the context window has gained traction in recent months.
Expanding the context window involves providing more contextual information to LLMs, theoretically allowing them to make more informed responses. Anthropic, for instance, introduced the Claude model with an impressive 100K token context window. OpenAI followed suit, unveiling a 32K token GPT-4 model and a 16K token GPT-3.5 model.
While the idea of an extensive context window may seem like a panacea, it’s important to acknowledge that the approach of “context stuffing” has its drawbacks:
Decreased Answer Quality and Increased Hallucination Risk
As context windows grow, the quality of responses generated by LLMs tends to decrease, and the risk of hallucinations, where the model generates incorrect or fabricated information, increases. Research has shown that LLMs struggle to extract relevant information from excessively large contexts.
Linear Increase in Costs
Handling larger contexts requires more computational resources, and since LLM providers charge per token, longer contexts result in higher costs for each query.
Insufficiency for Organizational Data
Even with a very extensive context window, it might not be enough to provide all the necessary organizational data to an LLM without proper identification of the relevant information.
This is where Retrieval Augmented Generation (RAG) comes into play. RAG offers a solution by seamlessly integrating retrieval systems with LLMs to provide the necessary context and data, mitigating the limitations mentioned above.
The Role of Retrieval Augmented Generation (RAG)
Retrieval systems have been developed and optimized over decades to efficiently extract relevant information on a large scale while reducing costs. The parameters of these systems are adjustable, offering more flexibility compared to LLMs. RAG is the approach that leverages retrieval systems to enhance LLMs’ performance and contextual understanding.
Research indicates that LLMs tend to yield the best results when provided with fewer, highly relevant documents in the context, rather than inundating them with large volumes of unfiltered data. In a recent Stanford paper titled “Lost in the Middle”, researchers demonstrated that even state-of-the-art LLMs struggle to extract valuable information from lengthy and incoherent contexts, especially when critical information is buried within the middle portion of the context.
How Retrieval Augmented Generation (RAG) Works
RAG effectively addresses the limitations of LLMs by providing up-to-date information, domain-specific data, and organizational knowledge. Here’s how RAG works:
Retrieval Component: RAG includes a retrieval mechanism that fetches context-specific data from external databases or documents. This data is relevant to the query being processed.
Generation Component: The retrieved information is combined with the original query, creating an enriched context for the LLM to generate a more accurate response.
The result? RAG allows LLMs to cite their sources, improve auditability, and significantly enhance the accuracy and relevance of their responses.
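At a high level, the retrieve-then-generate loop can be sketched in a few lines of Python. This is only an illustrative sketch: search_vector_db and call_llm are hypothetical stand-ins for whatever retrieval system and LLM client you use (a concrete LangChain implementation follows later in this article).
def answer_with_rag(question, search_vector_db, call_llm, k=3):
    # Retrieval component: fetch the k chunks most relevant to the question
    context_chunks = search_vector_db(question, k)
    context = "\n\n".join(context_chunks)

    # Generation component: enrich the original query with the retrieved context
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)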
Advantages of Retrieval Augmented Generation (RAG)
RAG offers several compelling advantages:
- Minimized Hallucinations: RAG reduces the risk of LLMs generating incorrect or fabricated information.
- Adaptability: It can accommodate dynamic, real-time data, making it ideal for applications requiring up-to-date information.
- Interpretability: RAG enables tracing the source of information used in LLM-generated responses.
- Cost-Effectiveness: Compared to fine-tuning, RAG requires less labeled data and computing resources.
Potential Limitations of RAG
While RAG is a powerful tool, it may not be suitable for all scenarios. In cases where a pre-trained LLM struggles with complex tasks like summarizing financial data or interpreting detailed medical records, fine-tuning the model might be a more effective approach.
In summary, Retrieval Augmented Generation (RAG) is a game-changing technique that combines the strengths of LLMs and retrieval systems to provide richer context and up-to-date information, significantly improving the performance and relevance of LLMs in various applications. It overcomes the limitations of the context window and opens up new possibilities for context-aware, accurate, and informed text generation.
Vector Databases and Semantic Search
In our previous section, we delved into the fascinating world of Retrieval Augmented Generation (RAG) and explored how it augments LLMs with new data. But here’s a question that naturally arises: how do we ensure that we can retrieve precisely the context we need from this vast sea of unstructured information?
Let’s illustrate this with a real-life scenario: Imagine Sarah, a commuter on a train, spots someone wearing an exquisite pair of handcrafted wooden sunglasses adorned with intricate carvings. She’s captivated by the unique design but misses the chance to ask about them before the person departs at the next station. Determined to find these one-of-a-kind sunglasses, Sarah turns to the internet later that day. There’s a catch, though — she doesn’t know the brand or any specific keywords related to those sunglasses. Undeterred, she opens her laptop and enters a search query: “handcrafted wooden sunglasses with intricate carvings.” To her delight, the perfect pair pops up as the second option in the search results. Without hesitation, she places an order, complete with a stylish wooden phone case to match her new eyewear.
This real-life scenario beautifully illustrates the power of semantic search, which enables businesses to guide customers toward taking action, whether it’s making a purchase or finding the information they seek. Achieving such precision and relevance in search results would have been challenging with traditional keyword searches. Enter the unsung hero of this story: vector databases, the driving force behind the success of semantic search.
In the realm of Artificial Intelligence (AI), where we are dealing with vast and complex data, the need for efficient handling and processing becomes paramount. As AI evolves into more advanced applications like image recognition, voice search, and recommendation engines, the nature of data becomes increasingly intricate. This is precisely where vector databases step onto the stage. Unlike traditional databases that store scalar values, vector databases are custom-designed to handle multi-dimensional data points, often referred to as vectors. Imagine these vectors as arrows pointing in specific directions with varying magnitudes in space. In today’s digital age, where AI and machine learning reign supreme, vector databases have emerged as indispensable tools for storing, searching, and analyzing high-dimensional data vectors.
So, what exactly is a vector database? It’s a specialized database that stores information in the form of multi-dimensional vectors, each representing specific characteristics or qualities. The number of dimensions in each vector can vary widely, from just a few to several thousand, depending on the complexity and detail of the data. Various processes, such as machine learning models, word embeddings, or feature extraction techniques, transform data like text, images, audio, and video into these vectors.
The primary advantage of a vector database lies in its ability to swiftly and accurately locate and retrieve data based on vector proximity or similarity. This means you can conduct searches rooted in semantic or contextual relevance, rather than relying solely on exact matches or predetermined criteria, as is the case with conventional databases.
For instance, with a vector database, you can:
- Search for songs that resonate with a specific tune based on melody and rhythm.
- Discover articles that align with another particular article in theme and perspective.
- Identify gadgets that share the characteristics and reviews of a specific device.
Vector databases come equipped with efficient storage, indexing, and querying mechanisms, all meticulously optimized for vector data. In stark contrast, traditional relational databases, designed primarily for tabular data with fixed columns, struggle to efficiently handle vector data due to its high dimensionality and variable-length characteristics.
In the rapidly evolving landscape of AI and data management, vector databases are the unsung champions, enabling us to navigate the complex and intricate world of high-dimensional data with ease and precision. Whether you’re in search of the perfect pair of sunglasses or embarking on a more profound AI-powered journey, vector databases are here to guide you through the maze of data and lead you to your desired destination.
How Vector Databases Work
To grasp how vector databases operate and why they differ from conventional databases, it’s essential to first comprehend the concept of embeddings.
Embeddings: Transforming Data into Meaningful Vectors
Unstructured data, encompassing text, images, and audio, lacks a predefined format, making it challenging for traditional databases to manage. To harness this data effectively for artificial intelligence and machine learning applications, it undergoes a transformation into numerical representations known as embeddings.
Imagine embeddings as unique codes assigned to each item, whether it’s a word, image, or any other data point, capturing its meaning or essence. This process facilitates computer comprehension and comparison of these items in a more efficient and meaningful manner. It’s akin to condensing a complex book into a concise summary while preserving its key points.
Typically, embeddings are generated using specialized neural networks designed for this specific task. For instance, word embeddings convert words into vectors in such a way that words with similar meanings are closer together in the vector space. This transformation empowers algorithms to perceive relationships and similarities between items.
In essence, embeddings act as a bridge, transforming non-numeric data into a format compatible with machine learning models. This enables these models to discern patterns and relationships within the data more effectively.
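As a small sketch of what this looks like in practice, here we use LangChain’s OpenAIEmbeddings (the same embedding family used later in this article) to turn a few made-up sentences into vectors; with ‘text-embedding-ada-002’, each vector has 1,536 dimensions, and the first two sentences, which share a meaning, would typically land closer together than the third.
from langchain.embeddings.openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()  # reads the OPENAI_API_KEY environment variable

# Each text becomes a list of floats encoding its meaning
vectors = embedding_model.embed_documents([
    "Employees must return all company property upon termination.",
    "Staff should hand back laptops and badges when they leave.",
    "The cafeteria serves lunch from 11 am to 2 pm.",
])
print(len(vectors), len(vectors[0]))  # e.g. 3 vectors, 1536 dimensions each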
Building a Vector Database
To create a vector database, the first step is to convert your data into vectors using an embedding model. Each vector represents the meaning of the input data, making it computationally feasible to search for semantically similar items based on their numerical representations.
To enhance the functionality of your vector database, consider incorporating metadata alongside the vectors. This step can significantly enrich the search capabilities and utility of your database. Depending on the specific requirements and capabilities of your chosen vector database solution, you can add various types of metadata to each vector.
One common form of metadata is the source document or page of the vector. This information allows you to trace back the origin of a particular vector, which can be valuable in scenarios where you want to understand the context or provenance of a retrieved item.
Furthermore, you can include custom metadata such as tags and keywords associated with each vector. These additional descriptors provide you with a powerful means to categorize and filter vectors beyond just semantic similarity. Users can perform keyword searches to quickly locate vectors that share specific characteristics or attributes, making it easier to find relevant information within your vector database.
Once you have your vectors and associated metadata, they are inserted into the vector database. This database is engineered to perform high-speed searches for similar matches. Various vector database solutions are available now, each with its unique capabilities.
Applying this procedure to the Volvo user manual, specifically to rectify the hallucination issue around automatic reverse braking on the Volvo XC60 described earlier, leaves us with a comprehensive vector database of embeddings and corresponding metadata for the manual.
One of the notable advantages of vector databases is their ability to support real-time updates. This solves the challenge of maintaining data recency for machine learning models in applications like chatbots. For instance, you can automatically create vectors for new product offerings and update the database whenever you launch a new product, ensuring that your chatbot always provides up-to-date information to customers.
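As a rough sketch of such an update (the paths and file names here are purely illustrative), LangChain’s Chroma wrapper lets you append newly split documents to an existing store without rebuilding it:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Re-open the existing persisted store
vectordb = Chroma(persist_directory="/docs/chroma/", embedding_function=OpenAIEmbeddings())

# Split the new document and append its chunks to the index
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
new_docs = splitter.split_documents(PyPDFLoader("/docs/pdf/new_product_sheet.pdf").load())
vectordb.add_documents(new_docs)  # embeds and indexes the new chunks
vectordb.persist()                # write the updated index to disk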
Semantic Search and Retrieval Augmented Generation (RAG)
Vector databases excel in semantic search use cases, allowing users to query data using natural language. Semantic search involves translating a user’s natural language query into embeddings and utilizing the vector database to search for similar entries.
You send these embeddings to the vector database, which then conducts a “nearest neighbor” search to identify the vectors that best match the user’s intended query. This semantic search process is at the heart of Retrieval Augmented Generation (RAG). Once the vector database retrieves the pertinent results, your application supplies them to the LLM through its context window, triggering the LLM to carry out its generative function. By utilizing the most pertinent facts from the vector database, RAG reduces the likelihood of generating inaccurate or hallucinated responses.
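In code, this nearest-neighbor lookup is exposed directly on the vector store; a minimal sketch, assuming vectordb is a populated LangChain Chroma store like the one built later in this article:
# Embed the query and return the k nearest chunks by vector similarity
results = vectordb.similarity_search("What if I get overpaid?", k=3)
for doc in results:
    print(doc.metadata.get("page"), "-", doc.page_content[:80])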
Semantic search and Retrieval Augmented Generation (RAG) rely heavily on similarity measures within vector databases. These mathematical methods determine how closely two vectors resemble each other in a vector space, enabling efficient query processing. The most commonly used measures are cosine similarity, Euclidean distance, and dot product; a small numerical sketch follows the list below.
- Cosine similarity calculates the cosine of the angle between two vectors, yielding values between -1 and 1. A score of 1 signifies vectors pointing in the same direction, 0 denotes orthogonality, and -1 implies vectors in diametric opposition.
- Euclidean distance measures the straight-line separation between vectors, with 0 indicating identical vectors and larger values indicating increased dissimilarity.
- Dot product quantifies the product of vector magnitudes and cosine of the angle between them, resulting in values ranging from negative infinity to positive infinity. Positive values indicate vectors pointing in the same direction, 0 represents orthogonality, and negative values signify opposing directions.
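Here is the small numerical sketch mentioned above, computing all three measures with NumPy on two toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # same direction as a, twice the magnitude

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0: same direction
euclidean = np.linalg.norm(a - b)                                # ~3.74: they differ in magnitude
dot = np.dot(a, b)                                               # 28.0: positive, same direction

print(cosine, euclidean, dot)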
Choosing the appropriate similarity measure is pivotal, as it significantly impacts the outcomes retrieved from a vector database. Each measure comes with its own set of advantages and limitations, making it essential to select the most suitable one according to the specific use case and requirements, ensuring the precision and relevance of the search results.
Advanced Functionality of Vector Databases
While semantic search is a powerful feature of vector databases, they can offer even more advanced functionality. For instance, some vector databases, like Pinecone, support hybrid search functionality. This approach combines semantic and keyword-based retrieval systems to provide a more nuanced and accurate search experience.
Implementing a Chatbot on Organizational Data using Vector Database and RAG
In the previous sections, we delved into the fundamentals of vector databases and Retrieval Augmented Generation (RAG) to enhance the capabilities of LLMs. Now, let’s explore a practical scenario where we can apply this knowledge to implement a chatbot tailored for an organizational use case.
Scenario
A college has created an extensive Employee Handbook to document its policies and procedures, which is intended to be a valuable resource for its employees. However, the sheer size and complexity of the handbook often pose a challenge for employees trying to find specific answers to their questions. Consequently, employees frequently resort to reaching out to the Human Resources (HR) department for even the most basic inquiries. This not only strains the HR team’s resources but also diverts their attention away from more critical tasks. To address this issue, the HR department is exploring the use of Language Models like ChatGPT to assist employees in navigating the Employee Handbook effectively.
Problem
While Language Models like ChatGPT excel at providing information based on their training data, they lack specific knowledge about the content of the college’s Employee Handbook. Consequently, when employees ask questions related to the handbook, the LLMs may provide generic or even random responses, causing confusion and inefficiency.
In the following two examples, we illustrate instances where specific information from the Employee Handbook is required. Ideally, we anticipate answers sourced directly from the Employee Handbook. However, ChatGPT often generates generic responses that, while generally accurate, lack the specificity found within the contents of the Employee Handbook.
Example 1: Queries related to overpayment to employees
Question: What if I get overpaid?
Related Content from Employee Handbook (Expected Answer):
Answer by ChatGPT:
Example 2: Queries related to not returning items upon termination
Question: What if I do not return items on termination?
Related Content from Employee Handbook (Expected Answer):
Answer by ChatGPT:
Solution
As previously discussed in this article, the solution lies in implementing Retrieval Augmented Generation using vector databases. In this context, the Employee Handbook can be stored within a vector database. When an employee interacts with the chatbot and asks a question, the chatbot will initiate a semantic search operation within the vector database using techniques like similarity search. This search will return relevant sections or excerpts from the Employee Handbook. These retrieved sections then serve as the context for the Language Model to generate precise and contextually appropriate responses to the employee’s queries.
For our implementation, we will utilize Chroma as the vector database. Chroma DB is an open-source vector store designed specifically for storing and retrieving vector embeddings. Some of its key features include:
- Support for various underlying storage options, including DuckDB for standalone usage and ClickHouse for scalability in larger deployments.
- Availability of Software Development Kits (SDKs) for Python and JavaScript/TypeScript, facilitating ease of integration.
- A focus on simplicity, speed, and enabling advanced analysis of vector data.
As of September 2023, Chroma DB provides the option for self-hosted servers. However, it’s worth noting that their roadmap includes plans to offer managed/hosted services in the future.
Furthermore, we will leverage the LangChain framework, which we introduced in a previous blog post, to interact seamlessly with OpenAI LLMs.
Below, you will find actual code snippets that illustrate how to effectively implement this solution using these tools and other techniques. Additional information is provided wherever relevant. The solution comprises two key components: building a vector database containing pertinent documents and executing a semantic search on this database.
Building a Vector Database
Importing the required modules from OpenAI and the LangChain framework, including OpenAI embeddings, text splitting, and PDF document loading, as well as utilities for managing environment variables.
import openai
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
import os
from dotenv import load_dotenv, find_dotenv
Setting up OpenAI API credentials and defining parameters for generating and persisting vector databases.
os.environ['OPENAI_API_KEY'] = '<OPENAI_API_KEY>'
openai.api_key = os.environ['OPENAI_API_KEY']

parent_dir = ""
persist_directory = '/docs/chroma/'
file_to_load = '/docs/pdf/Employee Handbook of a College.pdf'
Defining a function that loads and splits a PDF file, then creates a vector database from the resulting documents.
def create_and_persist_vector_db(file_path, persist_directory, chunk_size=1000, chunk_overlap=150):
    try:
        # Load Documents
        loader = PyPDFLoader(file_path)
        documents = loader.load()

        # Split Documents
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
        docs = text_splitter.split_documents(documents)

        # Define Embedding Model
        embedding = OpenAIEmbeddings()

        # Create Vector Database from Data
        vectordb = Chroma.from_documents(
            documents=docs,
            embedding=embedding,
            persist_directory=persist_directory
        )
        vectordb.persist()

        return vectordb
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return None


handbook_vectordb = create_and_persist_vector_db(parent_dir + file_to_load, persist_directory)
if handbook_vectordb:
    print("Vector database created and persisted successfully.")
Here, we utilize the RecursiveCharacterTextSplitter to break down documents into manageable chunks. These chunks are then inserted into the vector database as individual documents. While LangChain supports a variety of text splitters, we have listed some notable ones below, followed by a short illustrative sketch:
- CharacterTextSplitter: This is the simplest method, splitting based on characters (defaulting to “\n\n”) and measuring chunk length by the number of characters.
- RecursiveCharacterTextSplitter: This text splitter is highly recommended for generic text. It can be customized with a list of characters and attempts to split text in that order until the chunks become sufficiently small. The default list is ["\n\n", "\n", " ", ""]. The primary aim is to maintain the continuity of paragraphs, sentences, and words, as these are often the most semantically related pieces of text.
- Split by Token (e.g., tiktoken): Language models have a token limit, which should not be exceeded. To ensure compliance, it’s advisable to count the number of tokens when splitting text into chunks. Multiple tokenizers are available, and it’s essential to use the same tokenizer as employed by the language model.
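Here is the short illustrative sketch mentioned above; the sample text is made up, and the chunk sizes are deliberately tiny so the behavior is visible:
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

sample = ("Overpayments are recovered from the next salary.\n\n"
          "Employees must report any overpayment to HR within five working days.")

# Character-based: splits on "\n\n" and measures chunks in characters
char_chunks = CharacterTextSplitter(chunk_size=80, chunk_overlap=0).split_text(sample)

# Recursive: tries "\n\n", then "\n", then spaces, keeping paragraphs and sentences together
recursive_chunks = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=0).split_text(sample)

# Token-based: counts tokens with tiktoken so chunks respect model token limits
token_chunks = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=40, chunk_overlap=0
).split_text(sample)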
Furthermore, for document embedding, we have used OpenAIEmbeddings to facilitate storage in the vector database. OpenAI offers a range of embedding models, including ‘text-embedding-ada-002’. In addition, LangChain integrates with various other embedding providers such as Cohere, Hugging Face, Llama, and many more, giving you a diverse array of choices to suit your specific needs.
Moreover, when constructing vectors from the embeddings, we also store metadata, such as the document’s source, which in this context is the page number of the split text from the PDF. We’ll see this in action when querying the database later. In addition to the default metadata, we have the flexibility to incorporate additional metadata, such as a document ID and other relevant tags associated with the document. These supplementary metadata elements are useful for various operations within the vector database, as the sketch below illustrates.
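For example, inside the create_and_persist_vector_db function above, custom metadata could be attached to each chunk between the splitting step and the Chroma.from_documents call; the field names below are illustrative rather than anything Chroma requires:
# docs is the list of split documents produced by the text splitter above
for i, doc in enumerate(docs):
    doc.metadata["doc_id"] = f"employee-handbook-{i}"  # illustrative custom identifier
    doc.metadata["tag"] = "hr-policy"                  # illustrative tag for later filtering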
Semantic Search on Vector DB
Importing modules related to language-based retrieval, memory management, and chat models from the LangChain library.
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
Defining a function that creates a conversational retrieval chatbot chain from a specified language model, retriever, and other parameters, and then initializing the chain with specific settings: OpenAI’s GPT-3.5 Turbo model and the vector database built above for retrieval.
def create_qa(llm_model, temperature, db, search_type, chain_type, k):
    # Define Retriever
    retriever = db.as_retriever(search_type=search_type, search_kwargs={"k": k})

    # Create a chatbot chain. Memory is managed externally.
    qa = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=llm_model, temperature=temperature),
        chain_type=chain_type,
        retriever=retriever,
        return_source_documents=True,
        return_generated_question=True,
    )
    return qa


embeddings = OpenAIEmbeddings()
handbook_vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

llm_model = "gpt-3.5-turbo"
temperature = 0
search_type = "mmr"
chain_type = "stuff"

handbook_qa = create_qa(llm_model, temperature, handbook_vectordb, search_type, chain_type, k=5)
In the code implementation, the Conversational Retriever Chain serves as the foundation for constructing a chatbot that engages in conversations based on retrieved documents.
This chain follows a three-step process:
- Standalone Question Creation: The chat history (comprising a list of messages) and the new question are combined to form a standalone question. This step ensures that the question sent to the retrieval phase carries sufficient context without unnecessary distractions.
- Retrieval: The newly created question is fed into the retriever, which returns relevant documents.
- Response Generation: The retrieved documents are passed to a Language Model (LM). The LM generates a final response using either the new question alone (default behavior) or the original question and chat history.
The Conversational Retriever Chain’s parameterization includes:
- search_type: Determines the retrieval method, which can be similarity search (retrieving the top k documents with the highest similarity scores) or Maximum Marginal Relevance (mmr), which optimizes for relevant yet diverse documents.
- top k documents: Specifies the number of documents to retrieve.
- chain_type: Defines how the chain handles the top k documents obtained from the retrieval step. There are four primary chain_types:
- “Stuff”: All retrieved documents are sent to the Language Model within the same call and context window. However, when dealing with a large number of documents, they may not fit into the context window.
- “Map_Reduce”: Each document is individually processed by the Language Model to obtain an answer, and then all the answers are aggregated to derive the final response.
- “Refine”: Answers are generated from each document, and these answers are used to iteratively refine the response obtained from subsequent documents.
- “Map_Rerank”: Answers from all documents are sent to individual Language Models. These answers are ranked, and the one with the highest probability is selected as the final response.
This approach ensures flexibility in configuring the chatbot’s behavior, retrieval strategy, and response generation, allowing for tailored conversational experiences based on the provided parameters.
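Because these options are plain parameters of the create_qa helper defined above, trying a different retrieval strategy is a small change. For instance, an illustrative variation (not the configuration used in this article) that switches to plain similarity search and a map_reduce chain over more chunks:
handbook_qa_mr = create_qa(
    llm_model="gpt-3.5-turbo",
    temperature=0,
    db=handbook_vectordb,
    search_type="similarity",
    chain_type="map_reduce",
    k=8,
)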
Interacting with the conversational chatbot to ask about the Employee Handbook, using the same questions posed in the “Problem” section.
handbook_qa({"question": "What if I get overpaid?", "chat_history": ""})['answer']
handbook_qa({"question": "What if I do not return items on termination?", "chat_history": ""})['answer']
Output of the chatbot to the questions.
Comparing the results obtained via the vector database with the context and outputs described in the “Problem” section, it becomes evident that the vector-database-backed chatbot consistently provides answers that align with those found in the Employee Handbook. In stark contrast, pre-trained LLMs alone often produce erroneous or inaccurate results in such scenarios.
Fetching the sources of the answers, as returned by the retriever.
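Because the chain was created with return_source_documents=True, each result also carries the retrieved chunks and their metadata; a minimal sketch:
result = handbook_qa({"question": "What if I get overpaid?", "chat_history": ""})

# Each source document includes the page metadata stored when the PDF was split
for doc in result["source_documents"]:
    print(doc.metadata.get("page"), "-", doc.page_content[:100])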
Class to create a Chatbot User Interface (UI) while also implementing mechanisms for preserving and managing chat history.
import panel as pn  # GUI
import param

pn.extension()


class cbfs(param.Parameterized):
    chat_history = param.List([])
    answer = param.String("")
    db_query = param.String("")
    db_response = param.List([])

    def __init__(self, **params):
        super(cbfs, self).__init__(**params)
        self.panels = []
        self.qa = handbook_qa

    def convchain(self, query):
        if not query:
            return pn.WidgetBox(pn.Row('User:', pn.pane.Markdown("", width=600)), scroll=True)
        result = self.qa({"question": query, "chat_history": self.chat_history})
        self.chat_history.extend([(query, result["answer"])])
        self.db_query = result["generated_question"]
        self.answer = result['answer']
        self.panels.extend([
            pn.Row('User:', pn.pane.Markdown(query, width=600)),
            pn.Row('ChatBot:', pn.pane.Markdown(self.answer, width=600, style={'background-color': '#F6F6F6'}))
        ])
        inp.value = ''  # clears loading indicator when cleared
        return pn.WidgetBox(*self.panels, scroll=True)

    def clr_history(self, count=0):
        self.chat_history = []
        self.panels = []
        return
Creating a chatbot and interacting with it.
cb = cbfs()

button_clearhistory = pn.widgets.Button(name="Clear History", button_type='warning')
button_clearhistory.on_click(cb.clr_history)

inp = pn.widgets.TextInput(placeholder='Enter text here…')
conversation = pn.bind(cb.convchain, inp)

tab1 = pn.Column(
    pn.panel(conversation, loading_indicator=True, height=600),
    pn.layout.Divider(),
    pn.Row(inp),
    pn.layout.Divider(),
    pn.Row(button_clearhistory, pn.pane.Markdown("Clears chat history. Can use to start a new topic"))
)

dashboard = pn.Column(
    pn.Row(pn.pane.Markdown('# Chat with PDF')),
    pn.Tabs(('Chat', tab1))
)
dashboard
Output of the Chatbot UI
In this blog, we’ve delved into the limitations of pre-trained LLMs and explored various ways to enhance them, including fine-tuning and Retrieval Augmented Generation (RAG). We’ve also taken a deep dive into RAG and vector databases, discussing how they can seamlessly integrate with organizational data and demonstrated their practical application by building a chatbot within the LangChain framework.
But our journey doesn’t end here. In our next installment of this series, we will venture even further into the realm of vector databases. Specifically, we’ll be examining a range of operations within vector databases using Pinecone. This exploration promises to unlock a wealth of possibilities for applications in this domain. So, be sure to stay tuned for an exciting exploration of the powerful synergy between LLMs, LangChain, and vector databases! The future of data-driven solutions is bright, and we can’t wait to show you more.
Published via Towards AI