

Building a Smart Chatbot with OpenAI and Pinecone: A Simple Guide

Author(s): Abhishek Chaudhary

Originally published on Towards AI.

This article shows you how to build a simple RAG chatbot in Python using Pinecone for the vector database and embedding model, OpenAI for the LLM, and LangChain for the RAG workflow.

Hallucinations

Large Language Model (LLM)-based chatbots, especially those utilizing Generative AI (GenAI), are incredibly powerful tools for answering a broad range of questions. They are trained on vast amounts of text data and are capable of generating coherent and contextually relevant responses. However, a limitation arises when these models are asked questions involving specific or private information that they were not trained on. In such cases, the chatbot may produce responses that are fluent and confident but factually incorrect. This phenomenon, known as "hallucination," is a common challenge in LLM applications.

(Image source: https://circleci.com/blog/llm-hallucinations-ci)

Hallucination occurs because LLMs generate responses based on patterns in their training data rather than retrieving facts from a specific knowledge source. If the model hasn't been trained on the necessary information, especially when dealing with private or domain-specific data, it fills in the gaps with educated guesses. While these answers may appear plausible, they can often be misleading or entirely inaccurate.

How RAG works

(Further reading: https://pub.towardsai.net/rag-explained-key-component-in-large-language-model-llm-e9b8e2083a45)

To address this issue, a framework known as Retrieval-Augmented Generation (RAG) has emerged. RAG enhances the capabilities of LLMs by integrating external knowledge sources, effectively reducing the likelihood of hallucinations. Instead of relying solely on the model's pre-trained knowledge, RAG works by retrieving relevant information from an external database, often a vector database such as Pinecone, containing the latest or private data.

In the RAG framework, when a question is asked, the system first retrieves relevant information from the external database based on semantic similarity. This information is then provided to the LLM, which uses it as context to generate a more accurate and informed response. By augmenting the generative process with real-time access to pertinent data, RAG significantly reduces the risk of hallucination and ensures that the model's answers are grounded in fact.
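
To make this concrete, here is a minimal sketch of the retrieve-augment-generate loop. It is purely illustrative: the hypothetical answer_with_rag helper assumes a vector store exposing a similarity_search method and an LLM exposing invoke, as LangChain provides. The concrete Pinecone/LangChain implementation appears later in this article.

# Minimal RAG loop (illustrative sketch, not the article's final code).
def answer_with_rag(question, vector_store, llm, k=3):
    # 1. Retrieve: find the k chunks most semantically similar to the question.
    chunks = vector_store.similarity_search(question, k=k)
    # 2. Augment: pack the retrieved chunks into the prompt as context.
    context = "\n\n".join(chunk.page_content for chunk in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the LLM answers grounded in the retrieved context.
    return llm.invoke(prompt).content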

For more information about RAG, see https://research.ibm.com/blog/retrieval-augmented-generation-RAG

Prerequisites

Before we begin, ensure you have the following:

- A Pinecone API key (available at app.pinecone.io)
- An OpenAI API key (available at platform.openai.com/api-keys)
- A Python environment with Jupyter

Let's install the required packages:

! pip install \
  "pinecone[grpc]" \
  "langchain-pinecone" \
  "langchain-openai" \
  "langchain-text-splitters" \
  "langchain" \
  "jupyter" \
  "python-dotenv"

Set up environment variables for your Pinecone and OpenAI API keys in a .env file and read them as shown below:

# Content of .env file
PINECONE_API_KEY="<your Pinecone API key>" # available at app.pinecone.io
OPENAI_API_KEY="<your OpenAI API key>" # available at platform.openai.com/api-keys

# Read the keys from .env into Python
import os
from dotenv import load_dotenv

load_dotenv()

PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

Store knowledge in Pinecone

We'll work with a document describing a fictional product called the "WonderVector5000," which is something that Large Language Models (LLMs) have no prior knowledge of. To enable the LLM to effectively answer questions about this product, we'll first need to create a Pinecone index.

Next, we'll use the LangChain framework to break the document into smaller, manageable segments, a process known as "chunking."

After chunking the document, we'll generate vector embeddings for each of these segments using Pinecone's inference capabilities. These vector embeddings capture the semantic meaning of the text and will serve as the key to retrieving accurate information.

Finally, we'll upsert (i.e., insert or update) these vector embeddings into our Pinecone index, making the data readily available for retrieval whenever the LLM needs to access specific information about the WonderVector5000.

By following these steps, we'll create a system where the LLM can refer to this external knowledge source to provide accurate responses, even for topics it wasn't originally trained on. Let's get started:

Create a serverless index in Pinecone for storing the embeddings of your document

Let’s create a serverless index in Pinecone to store the embeddings of our document. When setting up the index, we’ll need to configure two important parameters: the index dimensions and the distance metric. These should be aligned with the specifications of the multilingual-e5-large model, which we'll be using to generate the embeddings.

(Model reference: https://docs.pinecone.io/models/multilingual-e5-large)

The index dimensions define the size of the vector representation produced by the model, while the distance metric determines how the similarity between vectors will be measured.

By ensuring these settings match the model’s characteristics, we’ll enable accurate and efficient retrieval of relevant information from the embeddings stored in our index.
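
As a quick illustration of the distance metric (this snippet is not part of the original walkthrough), cosine similarity compares the direction of two vectors; multilingual-e5-large produces 1024-dimensional embeddings, which is why the index below uses dimension=1024 and metric="cosine":

import numpy as np

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their norms;
    # values near 1 mean the vectors point in similar directions.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two random 1024-dimensional vectors, matching the model's dimensionality.
a = np.random.rand(1024)
b = np.random.rand(1024)
print(cosine_similarity(a, b))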

# Imports
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import os

pc = Pinecone(api_key=PINECONE_API_KEY)

index_name = "docs-rag-chatbot"

# Create the serverless index if it doesn't already exist
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1024,   # matches multilingual-e5-large's output size
        metric="cosine",  # matches the model's similarity metric
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )

# Read the markdown file
with open("WonderVector5000.MD", "r") as f:
    markdown_document = f.read()

markdown_document[:100]
'# The WonderVector5000: A Journey into Absurd Innovation\n\n## Introduction\n\nWelcome to the whimsical '

Divide the document into smaller chunks

Since our document is in Markdown format, we should chunk the content based on its structure to ensure that each segment is semantically coherent. This approach preserves the meaning of each section and makes it easier to retrieve relevant information later. After chunking the content, we’ll use Pinecone Inference to generate embeddings for each chunk. Finally, we’ll upsert these embeddings into our Pinecone index, making them accessible for future retrieval and enhancing the LLM’s ability to answer questions accurately.

# Imports
from langchain_pinecone import PineconeEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import MarkdownHeaderTextSplitter
import os
import time

# Specify which markdown headers to split on
headers_to_split_on = [
    ("##", "Header 2")
]

markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on, strip_headers=False
)

# Split the markdown document
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits[0]
Document(metadata={'Header 2': 'Introduction'}, page_content="# The WonderVector5000: A Journey into Absurd Innovation \n## Introduction \nWelcome to the whimsical world of the WonderVector5000, an astonishing leap into the realms of imaginative technology. This extraordinary device, borne of creative fancy, promises to revolutionize absolutely nothing while dazzling you with its fantastical features. Whether you're a seasoned technophile or just someone looking for a bit of fun, the WonderVector5000 is sure to leave you amused and bemused in equal measure. Let's explore the incredible, albeit entirely fictitious, specifications, setup process, and troubleshooting tips for this marvel of modern nonsense.")

Embed the chunks

First, we need to define the embedding model; then we embed each chunk and upsert the embeddings into the Pinecone index.

# Initialize a LangChain embedding object.
model_name = "multilingual-e5-large"
embeddings = PineconeEmbeddings(
    model=model_name,
    pinecone_api_key=PINECONE_API_KEY
)
embeddings
PineconeEmbeddings(model='multilingual-e5-large', batch_size=96, query_params={'input_type': 'query', 'truncation': 'END'}, document_params={'input_type': 'passage', 'truncation': 'END'}, dimension=1024, show_progress_bar=False, pinecone_api_key=SecretStr('**********'))
# Embed each chunk and upsert the embeddings into your Pinecone index.
docsearch = PineconeVectorStore.from_documents(
    documents=md_header_splits,
    index_name=index_name,
    embedding=embeddings,
    namespace="wondervector5000"
)
docsearch
<langchain_pinecone.vectorstores.PineconeVectorStore at 0x7fd6c3ad4310>

Query Pinecone to view the chunks

Using Pinecone's list and query operations, we can look at one of the records:

index = pc.Index(index_name)
namespace = "wondervector5000"

# index.list yields batches of record IDs; query the first ID of each batch
for ids in index.list(namespace=namespace):
    query = index.query(
        id=ids[0],
        namespace=namespace,
        top_k=1,
        include_values=True,
        include_metadata=True
    )
    print(query)
{'matches': [{'id': '7d593c7b-7580-43ca-a1eb-084a001c27ed',
'metadata': {'Header 2': 'Product overview',
'text': '## Product overview \n'
'The WonderVector5000 is packed with '
'features that defy logic and physics, each '
'designed to sound impressive while '
'maintaining a delightful air of '
'absurdity: \n'
'- Quantum Flibberflabber Engine: The heart '
'of the WonderVector5000, this engine '
'operates on principles of quantum '
'flibberflabber, a phenomenon as mysterious '
"as it is meaningless. It's said to harness "
'the power of improbability to function '
'seamlessly across multiple dimensions. \n'
'- Hyperbolic Singularity Matrix: This '
'component compresses infinite '
'possibilities into a singular hyperbolic '
'state, allowing the device to predict '
'outcomes with 0% accuracy, ensuring every '
'use is a new adventure. \n'
'- Aetherial Flux Capacitor: Drawing energy '
'from the fictional aether, this flux '
'capacitor provides unlimited power by '
'tapping into the boundless reserves of '
'imaginary energy fields. \n'
'- Multi-Dimensional Holo-Interface: '
'Interact with the WonderVector5000 through '
'its holographic interface that projects '
'controls and information in '
'three-and-a-half dimensions, creating a '
"user experience that's simultaneously "
'futuristic and perplexing. \n'
'- Neural Fandango Synchronizer: This '
'advanced feature connects directly to the '
"user's brain waves, converting your "
'deepest thoughts into tangible '
'actionsβ€”albeit with results that are '
'whimsically unpredictable. \n'
'- Chrono-Distortion Field: Manipulate time '
"itself with the WonderVector5000's "
'chrono-distortion field, allowing you to '
'experience moments before they occur or '
'revisit them in a state of temporal flux.'},
'score': 1.0,
'sparse_values': {'indices': [], 'values': []},
'values': [0.030090332,
0.0046539307,
...]}],
'namespace': 'wondervector5000',
'usage': {'read_units': 6}}
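
Beyond fetching records by ID, we can also sanity-check semantic retrieval directly on the vector store. The following sketch (not in the original walkthrough) uses LangChain's similarity_search with a hypothetical test query:

# Retrieve the chunks most similar to a natural-language query.
results = docsearch.similarity_search(
    "What powers the WonderVector5000?", # hypothetical test query
    k=2
)
for doc in results:
    print(doc.metadata, doc.page_content[:80])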

Use the chatbot

Now that our document is stored as embeddings in Pinecone, we can enhance the accuracy of the LLM’s responses by retrieving relevant knowledge from our Pinecone index when we send it questions. This ensures that the LLM has access to the specific information it needs to generate precise answers.

Next, we’ll initialize a LangChain object to interact with the GPT-3.5-turbo LLM. We'll define a few questions about the fictional WonderVector5000 product and send them to the LLM twice: first with relevant knowledge retrieved from Pinecone, and then without any additional knowledge. This will allow us to compare the quality and accuracy of the responses in both scenarios.

# Imports
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Initialize a LangChain object for chatting with the LLM
# without knowledge from Pinecone.
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

# Initialize a LangChain object for chatting with the LLM
# with knowledge from Pinecone.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever()
)

# Define a few questions about the WonderVector5000.
query1 = """What are the first 3 steps for getting started
with the WonderVector5000?"""

query2 = """The Neural Fandango Synchronizer is giving me a
headache. What do I do?"""

# Send each query to the LLM twice, first with relevant knowledge from Pinecone
# and then without any additional knowledge.
print("Query 1\n")
print("Chat with knowledge:")
print(qa.invoke(query1).get("result"))
print("\nChat without knowledge:")
print(llm.invoke(query1).content)
Query 1

Chat with knowledge:
The first three steps for getting started with the WonderVector5000 are:

1. Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging, ensuring to handle with care to avoid disturbing the delicate balance of its components.
2. Initiate the Quantum Flibberflabber Engine: Locate the translucent lever marked "QFE Start" and pull it gently. You should notice a slight shimmer in the air as the engine engages, indicating that quantum flibberflabber is in effect.
3. Calibrate the Hyperbolic Singularity Matrix: Turn the dials labeled "Infinity A" and "Infinity B" until the matrix stabilizes. You'll know it's calibrated correctly when the display shows a single, stable "∞".

Chat without knowledge:
1. Unpack the WonderVector5000 and familiarize yourself with all the components and accessories included in the package.
2. Read the user manual thoroughly to understand the setup process, safety precautions, and operating instructions.
3. Connect the WonderVector5000 to a power source and follow the instructions in the manual to calibrate the device and start using it for your desired applications.
print("\nQuery 2\n")
print("Chat with knowledge:")
print(qa.invoke(query2).get("result"))
print("\nChat without knowledge:")
print(llm.invoke(query2).content)
Query 2

Chat with knowledge:
To address the headache caused by the Neural Fandango Synchronizer, ensure that the headband is correctly positioned and not too tight on your forehead. Additionally, try to relax and focus on simple, calming thoughts to ease the synchronization process.

Chat without knowledge:
If the Neural Fandango Synchronizer is giving you a headache, it is important to stop using it immediately and give yourself a break. Take some time to rest and relax, drink plenty of water, and consider taking over-the-counter pain medication if needed. If the headache persists or worsens, it is recommended to consult a healthcare professional for further evaluation and advice. Additionally, you may want to consider adjusting the settings or usage of the Neural Fandango Synchronizer to see if that helps alleviate the headache.

For each query, you’ll notice a clear difference in the quality of the responses. The first response, which incorporates relevant knowledge from the Pinecone index, provides highly accurate information that closely matches the content of the WonderVector5000 document. On the other hand, the second response, generated without additional knowledge from Pinecone, may sound convincing but tends to be generic and often inaccurate. This contrast highlights the importance of augmenting the LLM with external data for precise and contextually relevant answers.
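
Once you're done experimenting, you may want to clean up. A minimal housekeeping sketch, assuming you no longer need the stored embeddings:

# Delete the serverless index when it is no longer needed.
pc.delete_index(index_name)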

