The LLM Series #4: Mastering Vector and Semantic Search via Azure
Author(s): Muhammad Saad Uddin
Originally published on Towards AI.
Welcome to the fourth article of the LLM Series, where we continue our journey into the cutting-edge realm of large language models (LLMs) and their diverse applications 🌟. In this edition, we're implementing the pillars of vector and semantic search, leveraging the powerful capabilities of Azure. We'll explore how mastering these advanced yet simple search techniques can significantly enhance the functionality and performance of large language models. From improving search accuracy to optimizing query responses, this article will uncover how Microsoft Azure's robust infrastructure is transforming the way we interact with data. Get ready to unlock new levels of efficiency and precision in your AI endeavors. So grab your gear and get ready for a deep dive into the world of vector and semantic search, a place where every query finds its perfect match 🔍✨.
So, let's start with how this article proceeds. Here's a sneak peek so you can jump directly to the sections that interest you the most:
- Importing Libraries and Datasets: We start by importing all the necessary libraries and datasets for our project.
- Creating Embeddings: Next, we use the JSON data to create embeddings.
- Building the Index in Azure: We proceed to build the index in Azure, adding fields for the index.
- Configuring Search: We configure both semantic and vector searches.
- Creating and Uploading the Index: We create the index, upload the data, and verify its completeness.
- Testing: We conduct tests to ensure both semantic and vector search work as expected.
- Using GPT-4 for User Queries: Finally, we use GPT-4 to answer user queries in the best possible way.
Importing Libraries and Dataset:
We begin by importing the following packages. I've pinned the versions of some packages that were current at the time of writing: the Azure packages have changed significantly, and newer versions may behave differently from the ones used here.
#version used for this article
azure-core==1.29.3
azure-search-documents==11.4.0b8
import os
import re
import json
from openai import AzureOpenAI
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt
import tiktoken
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import Vector
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
    SemanticSettings,
    VectorSearch,
    HnswVectorSearchAlgorithmConfiguration,
)
Before moving forward, let me explain some of the imports, like tenacity, which we will use while creating embeddings.
tenacity is used to handle retries with exponential backoff when generating embeddings. This is useful in scenarios where API calls or processing steps might fail intermittently and a retry mechanism is needed to improve robustness (a minimal usage sketch follows the component list below). More info here
Components:
- `retry`: A decorator to mark a function for automatic retries on failure.
- `wait_random_exponential`: Specifies an exponential backoff strategy for retry attempts, with randomized wait times to prevent thundering herd problems.
- `stop_after_attempt`: Limits the number of retry attempts to avoid infinite loops.
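To make this concrete, here is a minimal sketch of the retry pattern (the function name here is illustrative; the actual decorator we use appears in the embedding function later):

```python
from tenacity import retry, wait_random_exponential, stop_after_attempt

# Retry up to 6 times, waiting a randomized 1-20 seconds between attempts
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def flaky_api_call():
    ...  # any call that may fail intermittently
```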
Next, let's talk briefly about the Azure core and search imports.
These imports are used to interact with Azure Cognitive Search for semantic and vector search operations. Azure Cognitive Search provides capabilities for building advanced search solutions with features like full-text search, faceted navigation, and, more recently, vector search for embedding-based retrieval. More info on Azure
Components:
- `AzureKeyCredential`: Manages the API key for authenticating requests to Azure services.
- `SearchClient`: Allows querying an index and performing document operations like uploading, merging, or deleting documents.
- `SearchIndexClient`: Manages search indexes, including creating, updating, or deleting indexes.
- `Vector` and the `SearchIndex`-related classes: Define the structure and configuration of search indexes, including fields for vector embeddings and settings for semantic configurations.
Indexes:
- `SearchIndex`: Represents the configuration of a search index; used to define and create a new index or update an existing one.
- `SearchField`: Used to define fields that can be searched, filtered, or used for full-text search.
- `SearchFieldDataType`: Used to specify the type of data a field holds (e.g., String, Int32, Double).
- `SimpleField`: A simplified version of `SearchField` for fields that do not need to be searchable or facetable.
- `SearchableField`: A type of `SearchField` specifically for fields that should be searchable and tokenized; used to define fields that will be included in full-text search operations.
- `SemanticConfiguration` and `SemanticSettings`: Configure semantic search capabilities for the index.
- `PrioritizedFields`: Used to specify which fields should be given more importance in semantic search results.
- `SemanticField`: Used to identify specific fields that contribute to the semantic ranking of search results.
- `VectorSearch`: Configures vector search capabilities for the index using embeddings.
- `HnswVectorSearchAlgorithmConfiguration`: Configuration for the HNSW (Hierarchical Navigable Small World) algorithm used in vector search.
We use the Wikipedia dataset for this project, which is available through the Hugging Face datasets package.
#!pip install datasets
from datasets import load_dataset

dataset = load_dataset("wikipedia", "20220301.en")
Each article in the dataset is a record of the following shape:
{'id': '1',
 'url': 'https://en.wikipedia.org/wiki/DATA_science',
 'title': 'Data Science',
 'text': 'Data science is an interdisciplinary academic field...'}
Some stats for the "20220301.en" version of the dataset:
- Size of downloaded dataset files: 11.69 GB
- Size of the generated dataset: 20.28 GB
- Total amount of disk used: 31.96 GB
Data Fields
The data fields are the same among all configurations:
- `id` (`str`): ID of the article.
- `url` (`str`): URL of the article.
- `title` (`str`): Title of the article.
- `text` (`str`): Text content of the article.
Since the dataset is too big for our purposes, we will select 1,000 random articles from it, as sketched below.
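The article doesn't show its sampling code, so here is a minimal sketch of one way to do it, assuming we call the resulting list `wiki_data` (the name used in the snippets that follow):

```python
# Shuffle the train split and keep 1,000 articles as plain dicts
train = dataset["train"]
wiki_data = [dict(row) for row in train.shuffle(seed=42).select(range(1000))]
```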
Creating Embeddings:
After importing the necessary libraries and obtaining the required data, we define a function to create embeddings for the text of each Wikipedia article. In this example, we focus solely on the text field, but for a more complex application involving thousands of documents, I would advise creating embeddings for the titles and relevant metadata as well. Additionally, in some cases, breaking the data into smaller chunks can be an effective strategy for managing large datasets. (I will share my findings on chunking best practices and strategies in an upcoming write-up.)
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def generate_embeddings(text: str):
    client = AzureOpenAI(
        api_key="# Azure OpenAI api key",
        azure_endpoint="# Azure OpenAI endpoint",
        api_version="# Azure OpenAI version"
    )
    response = client.embeddings.create(
        input=text, model="text-embedding-ada-002")
    embeddings = response.data[0].embedding
    return embeddings
The `generate_embeddings` function takes text as input and calls the Azure OpenAI API with the "text-embedding-ada-002" model to obtain embeddings. Given that our data is in JSON format, we will extract the text for each row, create embeddings, and store them as a new key-value pair under 'textVector'. I suggest saving the resulting JSON file so that if you need to update the index or reuse it for a new application, you won't have to generate the embeddings again.
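As a quick sanity check (a sketch, not from the original walkthrough): ada-002 embeddings have 1,536 dimensions, which is why the index field we define later uses `vector_search_dimensions=1536`.

```python
# ada-002 returns 1,536-dimensional vectors
vec = generate_embeddings("hello world")
print(len(vec))  # 1536
```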
# Generate embeddings for the text field
for item in wiki_data:
    content = item['text']
    content_embeddings = generate_embeddings(content)
    item['textVector'] = content_embeddings

# Output embeddings to wiki_vector.json file
with open(r'C:\wiki_docs\wiki_vector.json', "w") as f:
    json.dump(wiki_data, f)
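One note: we imported tiktoken earlier but haven't used it yet. A plausible use, sketched here as an assumption rather than the article's own code, is to make sure each article fits within the embedding model's 8,191-token input limit before calling the API:

```python
encoding = tiktoken.encoding_for_model("text-embedding-ada-002")

def truncate_to_limit(text: str, max_tokens: int = 8191) -> str:
    # Truncate overly long articles to the embedding model's token limit
    tokens = encoding.encode(text)
    return encoding.decode(tokens[:max_tokens]) if len(tokens) > max_tokens else text
```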
Building the Index in Azure:
Now we grab the credentials for our Azure Cognitive Search resource from the Azure portal:
service_endpoint = "Azure Search Service Endpoint"
key = "Azure Search Service key"
credential = AzureKeyCredential(key)
In the step below, we build a search index in Azure Cognitive Search. An index in Azure Cognitive Search is similar to a database table but optimized for search operations: it defines the structure of the data and the search capabilities available.
# Create a search index
index_client = SearchIndexClient(
    endpoint=service_endpoint, credential=credential)
fields = [
    SimpleField(name="ID", type=SearchFieldDataType.String, key=True,
                sortable=True, filterable=True, facetable=True),
    # URL and title are stored (retrievable) so later queries can select them,
    # but they are excluded from full-text search
    SearchableField(name="URL", type=SearchFieldDataType.String, searchable=False),
    SearchableField(name="title", type=SearchFieldDataType.String, searchable=False),
    SearchableField(name="text", type=SearchFieldDataType.String,
                    searchable=True, facetable=True),
    SearchField(name="textVector",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536,
                vector_search_configuration="wiki-vector-config"),
]
This code initializes a client to interact with Azure Cognitive Search and defines the schema for the search index, specifying each field and its properties. The fields include an ID, URL, title, text content, and a vector field for embeddings. The configuration allows for sorting, filtering, faceting, and vector-based search operations. This setup enables complex search functionality, including traditional keyword-based searches and modern vector-based searches, providing a flexible and powerful search solution.
Configuring Search:
In this step, we configure vector and semantic search. First, we configure vector search for the Azure Cognitive Search index using the HNSW algorithm, setting the parameters that control how the HNSW graph is constructed and searched and balancing search accuracy against performance. The configuration is named "wiki-vector-config" and uses cosine similarity as the metric for comparing vectors. This setup enables efficient and accurate nearest-neighbor searches based on vector embeddings.
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="wiki-vector-config",
            kind="hnsw",
            parameters={
                "m": 10,
                "efConstruction": 400,
                "efSearch": 500,
                "metric": "cosine"
            }
        )
    ]
)
A little deep dive into the HNSW algorithm configuration might be useful here:
Parameters:
- `m`: `10` – Controls the number of bi-directional links created for each element during the construction of the graph.
- `efConstruction`: `400` – Affects the quality and performance during the construction phase of the graph. A higher value leads to a better-quality graph but slower construction.
- `efSearch`: `500` – Affects search performance. A higher value increases search accuracy but also increases search time.
- `metric`: `"cosine"` – Specifies the distance metric used to compare vectors. Here, the cosine similarity metric is used, which measures the cosine of the angle between two vectors (see the short sketch after this list).
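For intuition, cosine similarity can be computed in a few lines of plain Python (an illustrative sketch; Azure computes this internally during vector search):

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    # cos(theta) = (u . v) / (|u| * |v|); 1.0 means identical direction
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```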
Next, we define our semantic configuration. Semantic search enhances traditional search by understanding the context and meaning of the terms used, providing more relevant and accurate results.
semantic_config = SemanticConfiguration(
    name="wiki-semantic-config",
    prioritized_fields=PrioritizedFields(
        title_field=SemanticField(field_name="title"),
        prioritized_content_fields=[SemanticField(field_name="text")],
    )
)
# Create the semantic settings with the configuration
semantic_settings = SemanticSettings(configurations=[semantic_config])
This setup allows the search index to leverage semantic search capabilities, providing more contextually relevant search results by prioritizing specific fields.
Combining vector and semantic search in Azure Cognitive Search allows leveraging the strengths of both methods to enhance search accuracy and relevance. Vector search captures the semantic meaning of documents, enabling the retrieval of contextually similar content even without exact keyword matches. Semantic search improves the precision of results by understanding the context and prioritizing important fields within the documents. This dual approach ensures a comprehensive and effective search experience, handling natural language queries more intuitively and accurately.
Creating and Uploading the Index:
Here, we assign a name to the search index and verify its creation.
# Create the search index with the semantic settings
index_name = "wiki-search"
index = SearchIndex(name=index_name, fields=fields,
                    vector_search=vector_search, semantic_settings=semantic_settings)
result = index_client.create_or_update_index(index)
print(f'{result.name} created')
wiki-search created
Next, we upload our JSON data containing the embeddings, where each row corresponds to a document. Since each document already carries its vector representation, the index can serve efficient and accurate vector search as soon as the upload completes.
search_client = SearchClient(endpoint=service_endpoint, index_name=index_name, credential=credential, connection_verify=False)
result = search_client.upload_documents(wiki_data)
print(f"Uploaded {len(wiki_data)} documents")
Uploaded 1000 documents
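The outline promised we would verify completeness; one simple way to do that is sketched below, using the SDK's per-document upload results and document count (with the caveat that newly uploaded documents can take a few seconds to be reflected in the count):

```python
# Check that every upload succeeded and the index holds all 1,000 documents
assert all(r.succeeded for r in result)
print(search_client.get_document_count())  # expect 1000
```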
Testing:
It is time to test the performance of semantic and vector search. Initially, we will conduct tests using a purely vector-based search approach to evaluate its effectiveness.
# Pure Vector Search
query = "What is data science"

search_client = SearchClient(service_endpoint, index_name, credential=credential, connection_verify=False)
vector = Vector(value=generate_embeddings(query), k=2, fields="textVector")

results = search_client.search(
    search_text=None,
    vectors=[vector],
    select=["URL", "title", "text"],
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"URL: {result['URL']}\n")
    print(f"summary: {result['text']}\n")
Title: "Data science"
Score: 0.8798066
URL: "https://en.wikipedia.org/wiki/Data_science"
summary: "Data science is an interdisciplinary academic field that uses......"
Title: "Master in Data Science"
Score: 0.8018056
URL: "https://en.wikipedia.org/wiki/Master_in_Data_Science"
summary: "A Master of Science in Data Science is an interdisciplinary degree program designed......"
As you can see, we are already achieving quite good results with this setup. Now, let's try a hybrid semantic vector search to potentially enhance search performance even further.
# Hybrid Semantic Vector Search
query = "Define Artificial Intelligence (A.I.)"

search_client = SearchClient(service_endpoint, index_name, credential=credential, connection_verify=False)
vector = Vector(value=generate_embeddings(query), k=3, fields="textVector")

results = search_client.search(
    search_text=query,
    vectors=[vector],
    select=["URL", "title", "text"],
    query_type="semantic",
    query_answer="extractive",
    query_caption="extractive|highlight-true",
    query_language="en-us",
    semantic_configuration_name="wiki-semantic-config",
    top=3
)

for result in results:
    print(f"Title: {result['title']}")
    print(f"Score: {result['@search.score']}")
    print(f"URL: {result['URL']}\n")
    print(f"summary: {result['text']}\n")
Title: "Artificial intelligence"
Score: 0.8297456
URL: "https://en.wikipedia.org/wiki/Artificial_intelligence"
summary: "Artificial intelligence (AI), in its broadest sense, is intelligence exhibited......"
Title: "Artificial general intelligence"
Score: 0.7797136
URL: "https://en.wikipedia.org/wiki/Artificial_general_intelligence"
summary: "Artificial general intelligence (AGI) is a type of artificial intelligence (AI) that......"
Title: "A.I._Artificial_Intelligence"
Score: 0.7317047
URL: "https://en.wikipedia.org/wiki/A.I._Artificial_Intelligence"
summary: "A.I. Artificial Intelligence (or simply A.I.) is a 2001 American science fiction film......"
So, what we achieve here is a collection of relevant Wikipedia pages based on our question. However, we still need to manually review these pages to find the precise answer. What if we could automate this process using the power of LLMs? By integrating LLMs, we can extract specific answers directly from the retrieved documents, providing more precise and concise responses to user queries. This not only enhances the efficiency of our search system but also significantly improves the user experience by delivering targeted information with minimal effort.
But let's also take a look at the semantic search results, since we enabled semantic answers for this query:
semantic_answers = results.get_answers()
for answer in semantic_answers:
    print(f"Semantic Answer highlights: {answer.highlights}")
    print(f"Semantic Answer Text: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")
Semantic Answer highlights: Artificial intelligence is intelligence exhibited by machines, particularly computer systems.
Semantic Answer Text: Title: "Artificial intelligence" URL: "https://en.wikipedia.org/wiki/Artificial_intelligence" summary: "Artificial intelligence (AI), in its broadest sense....."
Semantic Answer Score: 0.813421875
The results seem good but are not exactly what we're looking for. To optimize the results and enhance the user experience, let's explore how we can improve them using the powerful capabilities of LLMs.
Using GPT-4 for User Queries:
To use the full potential of LLMs, we create the function `get_LLM_response`. It takes `system_prompt`, `context`, and `ques` as parameters, along with an optional `engine` parameter that defaults to `"gpt-4"` (the name of the Azure OpenAI deployment). It uses these inputs to call the Azure OpenAI API and returns a response grounded in the given `context`.
system_prompt = '''You are a Wiki Doc Assistant'''

def get_LLM_response(system_prompt, context, ques, engine="gpt-4"):
    client = AzureOpenAI(
        api_key="Key",
        api_version="version",
        azure_endpoint="Endpoint"
    )
    response = client.chat.completions.create(
        model=engine,  # the Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": f"{system_prompt}"},
            {"role": "user", "content": f"here's the data for context: {context} "
                                        f"and here is the user question: {ques}"}
        ]
    )
    return response.choices[0].message.content
We will now test it with the output of our second example, the hybrid search:
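The original doesn't show how `context` and `ques` are assembled, so here is a minimal sketch under the assumption that we concatenate the text of the hybrid-search hits (note that the `results` iterator can only be consumed once, so the hits would need to be collected into a list, called `hits` here, beforehand):

```python
ques = "Define Artificial Intelligence (A.I.)"
# Assume `hits` is the list of result dicts from the hybrid search above
context = "\n\n".join(hit["text"] for hit in hits)
```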
print(get_LLM_response(system_prompt, context, ques, engine="gpt-4"))
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems.
In a broader sense, it's the intelligence exhibited by machines or software.
It's a field of research within computer science that is devoted to creating and studying methods
and software that enable machines to understand their environment.
These machines, often referred to as AIs, use learning and intelligence to take actions that optimize their chances of achieving set goals.
As you can see, the results from the LLM-based response are effective, precise, and meet user expectations. This is why using LLMs is important on top of vector and/or semantic searches. The integration of LLMs enhances the search process by not only retrieving relevant documents but also summarizing and extracting precise information.
The summary of what we did here is as follows: we used Wikipedia text to create embeddings, stored them in an Azure search index, and utilized both vector and semantic search to retrieve relevant text. Finally, we employed an LLM (GPT-4 in our case) to refine and optimize the output, ensuring the best user experience. This combined strategy leverages the strengths of each component, resulting in highly relevant and precise search results. By integrating the power of Wikipediaβs comprehensive information, Azureβs efficient search index, advanced vector search methodologies, and the capabilities of GPT-4, we were able to create a robust and user-friendly RAG solution. This RAG solution not only fetches relevant data but also refines it for optimal readability and relevance, thereby significantly enhancing the overall user experience.
That's it for today, but rest assured, our journey is far from over! In the next chapter of the LLM series, we will develop a RAG application with an in-memory index for faster prototyping. If this guide has sparked your curiosity and you are keen on exploring more intriguing projects within this LLM series, make sure to follow me. With each new project, I promise a journey filled with learning, creativity, and fun. Furthermore:
- 👏 Clap for the story (50 claps) to help this article be featured
- 🔔 Follow Me: LinkedIn | Medium | Website
- 🌟 Need help in converting these prototypes to products? Contact Me!