Vector Search Techniques for AI in Pinecone
Last Updated on September 14, 2025 by Editorial Team
Author(s): Leapfrog Technology
Originally published on Towards AI.
Written by Aayush Karn & Aayush Shrestha
In the fast-evolving world of Artificial Intelligence (AI) and Generative AI, understanding the nuances of various search techniques is crucial. Our previous blogs introduced the concept of vector search within Retrieval-Augmented Generation (RAG) and explored the operational aspects of Pinecone. These discussions laid the groundwork for understanding how modern AI-driven search methods are transforming data retrieval.
RAG combines the strengths of retrieval-based and generative models to provide accurate and contextually relevant information. Pinecone, as a state-of-the-art vector database, plays a pivotal role in this process by efficiently handling and querying large-scale vector data. This synergy between RAG and Pinecone underscores the need for advanced vector search capabilities to enhance search relevance and efficiency.
In this blog, we’ll delve into the different search techniques used in Pinecone and why they matter.
Vector search leverages machine learning to transform unstructured data, such as text and images, into numerical vectors that encapsulate their semantic meaning and context. Unlike traditional keyword-based search methods that rely on exact word matches, vector search identifies data points with similar meanings using these embeddings. This significantly enhances search relevance and efficiency through the use of approximate nearest-neighbor algorithms, which quickly locate comparable data points.
This technique is particularly valuable when users have a conceptual understanding of what they’re looking for but may not know the exact terms. By focusing on meaning rather than exact matches, vector search greatly improves the ability to find relevant information across various unstructured data types, including text, images, and multimedia content.
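As a toy illustration of the idea (not part of any Pinecone pipeline), the sketch below compares made-up three-dimensional “embeddings” with cosine similarity; real embedding models produce hundreds or thousands of dimensions, but the principle is the same.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cosine similarity = dot product of the two L2-normalized vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical 3-d embeddings; a real model would emit e.g. 1536 dimensions
v_query   = np.array([0.9, 0.2, 0.1])  # "holiday policy"
v_similar = np.array([0.8, 0.3, 0.2])  # "vacation rules" (different words, same meaning)
v_distant = np.array([0.1, 0.2, 0.9])  # "quarterly revenue"

print(cosine_similarity(v_query, v_similar))  # high score despite no shared keywords
print(cosine_similarity(v_query, v_distant))  # low score: unrelated meaning
A keyword search would find no overlap between “holiday policy” and “vacation rules”, while vector search ranks them as close neighbors.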
Now, let’s explore the specific search techniques that Pinecone utilizes to make these advanced capabilities possible, and how they integrate with AI and generative AI to revolutionize data retrieval.
Types of vector search techniques
Keyword or lexical search
The foundation of keyword or lexical search is matching words or phrases exactly as they occur in a query with those found in the documents. Although this method is quick and easy to use, it has some shortcomings: it cannot handle synonyms, misspellings, or polysemy, the situation where a word has more than one meaning. Furthermore, it ignores the context of the words, which may produce results that are not relevant.
Semantic search
Semantic search, on the other hand, analyzes word meaning and associations using natural language processing (NLP) techniques. Words are represented as vectors in a high-dimensional space, and the semantic similarity between words is reflected by the distance between their vectors. This method can handle polysemy, synonyms, misspellings, and more complex word associations, including antonyms, hypernyms, and meronyms. It can therefore yield more precise and relevant results.
For instance, let us consider “chocolate milk.” A semantic search engine will distinguish between the terms “chocolate milk” and “milk chocolate.” Though the query’s keywords are the same, the sequence in which they are written influences the meaning. As humans, we recognize that milk chocolate is a type of chocolate, whereas chocolate milk is chocolate-flavored milk.
Furthermore, the same sport is called “soccer” in the United States and “football” in Nepal and other parts of the world. Semantic search can differentiate results based on the user’s geographic location.

Hybrid search
We can improve the search experience by integrating vector search with filtering and aggregation, as well as by adopting hybrid search and traditional scoring.
Hybrid search combines semantic and keyword searches in one query to produce more relevant results. Out-of-domain semantic search results may be less relevant; however, integrating them with keyword search results might increase relevance.
We’ll discuss Hybrid search in Pinecone in detail later in the blog.

Implementation of semantic search in Pinecone
In this example, semantic search is performed using only the Pinecone vector store. The text is vectorized with the OpenAI embedding feature, and the resulting vectors are upserted into Pinecone.
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
The PDF text is vectorized using the OpenAI embedding function.
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-west4-gcp-free")
index_name = 'langdemo'
After the vector embedding is done, a Pinecone connection is created with the index name langdemo.
if index_name not in pinecone.list_indexes():
    # we create a new index
    pinecone.create_index(
        name=index_name,
        metric='cosine',
        dimension=1536
    )
The index is created with a vector distance metric; in this example, cosine is used with dimension 1536. The upsert of the vectors is then performed.
from langchain.vectorstores import Pinecone

vectorstore = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)
The existing Pinecone index is read by passing the index name and the embedding function as parameters.
query = "Type of holiday in office?"
vectorstore.similarity_search(
    query,
    k=1
)

Finally, the Pinecone query identifies the most relevant answer for the query using the similarity_search function, without using a LangChain RAG chain.
Integrating Pinecone with LangChain for semantic search
In this demonstration, you can find the integration of LangChain and Pinecone with an LLM. Pinecone is used for storing the vector embeddings, and LangChain is used for converting the PDF documents into vector embeddings.
First, set up LangChain and Pinecone as shown below.
!pip install pinecone-client openai tiktoken langchain
!pip install pypdf
After that, set up the OpenAI module:
import os
import getpass
import openai
Then,
# OpenAI credentials
os.environ['OPENAI_API_KEY'] = 'Insert Your OPENAI KEY'
openai.api_key = os.environ['OPENAI_API_KEY']
# Pinecone credentials
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")
PINECONE_API_ENV = os.environ.get("PINECONE_API_ENV")  # e.g. "us-west4-gcp-free"
# LLM model
llm_model = "gpt-3.5-turbo"
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.document_loaders import TextLoader
from langchain.document_loaders import PyPDFLoader
import pinecone
from langchain.schema import Document
pdf = "PATH OF YOUR PDF DOCUMENT"
loader = PyPDFLoader(pdf)
documents = loader.load()
The PDF document is loaded from the given path using the PyPDF loader.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
The PDF text is split into chunks of 2,000 characters with no overlap, as shown in the code snippet.
embeddings = OpenAIEmbeddings()
OpenAI embeddings are used for vectorizing the text data.
# Initialize Pinecone
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_API_ENV)
index_name = 'langdemo'
if index_name not in pinecone.list_indexes():
    # we create a new index
    pinecone.create_index(
        name=index_name,
        metric='cosine',
        dimension=1536
    )
Connect to the Pinecone database and create the index using the cosine distance metric with a vector dimension of 1536.
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)
vectorstore = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)
Embed the text documents into Pinecone, then connect to the existing index as a vector store.
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name=llm_model, temperature=0),
    retriever=vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5}),
    chain_type="stuff",
    return_source_documents=True,
)
In this Q&A example, a ConversationalRetrievalChain is used with the OpenAI GPT-3.5 Turbo model at temperature zero. For retrieval, the search type is maximal marginal relevance (MMR), where k=5 looks for five different results.
chat_history = []
query = "sick leave for employees?"
result = qa({"question": query, "chat_history": chat_history})
result["answer"]
Output: Employees are eligible for sick leave, but the specific details and approval process may vary depending on the circumstances.
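Because the chain is conversational, follow-up questions can reuse the accumulated history. A minimal sketch of a second turn (the follow-up question here is hypothetical):
# append the previous turn so the chain can resolve follow-up references
chat_history.append((query, result["answer"]))

follow_up = "How many days are allowed per year?"  # hypothetical follow-up
result = qa({"question": follow_up, "chat_history": chat_history})
result["answer"]
result["source_documents"]  # the retrieved chunks, since return_source_documents=True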
In this example, we have demonstrated how the use of a vector database can enhance the performance of an LLM: it stores unstructured data and helps produce structured results. Implementing a vector database with an LLM and the LangChain framework simplifies and accelerates the development process.
Hybrid search in Pinecone
We briefly talked about hybrid search earlier in the blog; let us see how it’s implemented in Pinecone.
In Pinecone, a hybrid search is performed with sparse-dense vectors, which combine dense and sparse embeddings as a single vector. Sparse and dense vectors represent different types of information and enable distinct kinds of search.

Dense vectors
A dense vector is the most common type of vector in Pinecone, and it is what enables semantic search. Even in cases where there are no exact matches, semantic search yields the most similar results according to a particular distance metric. This is possible because dense vectors, which are numerical representations of semantic meaning, are produced by embedding models such as text-embedding-ada-002 or text-embedding-3-large (the OpenAIEmbeddings in the LangChain code shown above).
Sparse vectors
Sparse vectors have a high number of dimensions with a very low percentage of non-zero values. In a keyword search, each sparse vector represents a document: the dimensions represent dictionary words, and the values indicate the significance of each word in the document. Keyword search algorithms such as BM25 determine a text document’s relevance by taking into account the number of keyword matches, their frequency, and other criteria.
Pinecone’s method of hybrid search uses a single sparse-dense index. It makes it possible to search through any kind of media: text, music, photos, and so on. Finally, the alpha parameter allows for easy adjustment of the sparse-versus-dense weighting, and Pinecone’s hybrid index takes care of everything in between the two extremes. To get there, though, we must first turn our input data into dense and sparse vector representations.
Implementation of hybrid search in Pinecone with an e-commerce project
Load dataset
We will work with a subset of the Open Fashion Product Images dataset, consisting of ~44K fashion products with images and category labels describing the products.
The dataset can be loaded from the Huggingface Datasets hub as follows:
from datasets import load_dataset
# load the dataset from huggingface datasets hub
fashion = load_dataset(
    "ashraq/fashion-product-images-small",
    split="train"
)
fashion
Dataset({
    features: ['id', 'gender', 'masterCategory', 'subCategory', 'articleType', 'baseColour', 'season', 'year', 'usage', 'productDisplayName', 'image'],
    num_rows: 44072
})
We will first assign the images and metadata into separate variables and then convert the metadata into a pandas dataframe.
# assign the images and metadata to separate variables
images = fashion["image"]
metadata = fashion.remove_columns("image")

# convert metadata into a pandas dataframe
metadata = metadata.to_pandas()
metadata.head()

We need both sparse and dense vectors to perform a hybrid search. We will use all the metadata fields except for the id and year to create sparse vectors and the product images to create dense vectors.
Sparse vectors
To create sparse vectors, we’ll use BM25. We import the BM25 encoder from the pinecone-text library.
from pinecone_text.sparse import BM25Encoder
bm25 = BM25Encoder()
The tokenization will look something like this:
"Turtle Check Men Navy Blue Shirt".lower().split()
['turtle', 'check', 'men', 'navy', 'blue', 'shirt']
BM25 requires training on a representative portion of the dataset. We do this like so:
bm25.fit(metadata['productDisplayName'])
0%| | 0/44072 [00:00<?, ?it/s]
<pinecone_text.sparse.bm25_encoder.BM25Encoder at 0x7f78f7851220>
Let’s create a test sparse vector using a productDisplayName.
metadata['productDisplayName'][0]
'Turtle Check Men Navy Blue Shirt'
bm25.encode_queries(metadata['productDisplayName'][0])
{'indices': [23789636,
  1830646559,
  632192512,
  931643408,
  3905155331,
  3828986392],
 'values': [0.3276687848622804,
  0.19377339510596148,
  0.040475545164610806,
  0.1808640794607714,
  0.10791423980552016,
  0.1493039556008558]}
And for the stored docs, we only need the term-frequency (“TF”) part:
bm25.encode_documents(metadata['productDisplayName'][0])
{'indices': [23789636,
  1830646559,
  632192512,
  931643408,
  3905155331,
  3828986392],
 'values': [0.4449638258432887,
  0.4449638258432887,
  0.4449638258432887,
  0.4449638258432887,
  0.4449638258432887,
  0.4449638258432887]}
Dense vectors
We will use the CLIP embeddings to generate dense vectors for product images. CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a range of image and text pairs and is a multi-modal vision and language model. It can be used for image-text similarity and zero-shot image classification.
We can directly pass PIL images to CLIP as it can encode both images and texts. This allows us to leverage CLIP’s powerful multimodal capabilities to align visual and textual information seamlessly.
We can load CLIP like so:
from sentence_transformers import SentenceTransformer
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# load a CLIP model from huggingface
model = SentenceTransformer(
    'sentence-transformers/clip-ViT-B-32',
    device=device
)
model
SentenceTransformer(
(0): CLIPModel()
)
dense_vec = model.encode([metadata['productDisplayName'][0]])
dense_vec.shape
(1, 512)
This model gives us a 512-dimensional dense vector.
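Since the upsert loop below encodes product images rather than text, it may help to confirm that the same model accepts a PIL image directly (which is what this dataset's image column yields) and returns a vector of the same dimensionality:
# encode a single product image; CLIP maps images and text into the same 512-d space
img_vec = model.encode(images[0])
img_vec.shape  # (512,)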
Upsert documents
Now we can go ahead and generate sparse and dense vectors for the full dataset and upsert them along with the metadata to the new hybrid index. We can do that easily as follows:
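Before the loop, note a few pieces of setup that the snippet assumes but does not show: the tqdm import, a batch size, and the hybrid index object named index. A minimal sketch of that setup, with a hypothetical index name (sparse-dense indexes require the dotproduct metric, and the dimension must match the 512-d CLIP vectors; a batch size of 200 is consistent with the 221 batches shown in the progress output below):
from tqdm.auto import tqdm  # progress bar used by the loop

batch_size = 200  # 44,072 rows / 200 = 221 batches

# assumes pinecone.init(...) has already been called as shown earlier
index_name = "hybrid-fashion-search"  # hypothetical index name
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=512,        # must match the CLIP dense vectors
        metric="dotproduct"   # required for sparse-dense (hybrid) indexes
    )
index = pinecone.Index(index_name)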
for i in tqdm(range(0, len(fashion), batch_size)):
    # find end of batch
    i_end = min(i + batch_size, len(fashion))
    # extract metadata batch
    meta_batch = metadata.iloc[i:i_end]
    meta_dict = meta_batch.to_dict(orient="records")
    # concatenate all metadata fields except for id and year to form a single string
    meta_batch = [" ".join(x) for x in meta_batch.loc[:, ~meta_batch.columns.isin(['id', 'year'])].values.tolist()]
    # extract image batch
    img_batch = images[i:i_end]
    # create sparse BM25 vectors
    sparse_embeds = bm25.encode_documents([text for text in meta_batch])
    # create dense vectors
    dense_embeds = model.encode(img_batch).tolist()
    # create unique IDs
    ids = [str(x) for x in range(i, i_end)]
    upserts = []
    # loop through the data and create dictionaries for uploading documents to the Pinecone index
    for _id, sparse, dense, meta in zip(ids, sparse_embeds, dense_embeds, meta_dict):
        upserts.append({
            'id': _id,
            'sparse_values': sparse,
            'values': dense,
            'metadata': meta
        })
    # upload the documents to the new hybrid index
    index.upsert(upserts)

# show index description after uploading the documents
index.describe_index_stats()
0%| | 0/221 [00:00<?, ?it/s]
{'dimension': 512,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 44072}},
 'total_vector_count': 44072}
Querying
Let us query the index, providing both the sparse and dense vectors. Here we use an equal weighting between sparse and dense by setting the alpha parameter to 0.5. The alpha parameter sets the weight of each search algorithm: alpha = 0 is a pure sparse (keyword) search, alpha = 1 is a pure dense (vector) search, and alpha = 0.5 is a hybrid search with equal weight for the sparse and dense vectors.
weighted vector = alpha * dense + (1 - alpha) * sparse
query = "dark blue french connection jeans for men"

# create sparse and dense vectors
sparse = bm25.encode_queries(query)
dense = model.encode(query).tolist()

# search
result = index.query(
    top_k=14,
    vector=dense,
    sparse_vector=sparse,
    include_metadata=True
)
# use the returned product ids to get the images
imgs = [images[int(r["id"])] for r in result["matches"]]
imgs
Output

Comparing hybrid search results with sparse and dense vector search
In this section, we’ll use the fashion product dataset introduced earlier to compare the outcomes of sparse vector search, dense vector search, and hybrid search using the same search phrase used earlier: “dark blue french connection jeans for men”
By examining the results from the different search methods, we can better understand the strengths and limitations of each, and see how combining them in a hybrid search can yield superior results.
To do so, we scale the vectors; for this, we’ll use a function named hybrid_scale, shown below.
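The blog does not show the body of hybrid_scale; a plausible sketch, applying the weighting formula above by scaling the dense values by alpha and the sparse values by (1 - alpha), would look like this:
def hybrid_scale(dense, sparse, alpha: float):
    # scale sparse and dense vectors to create hybrid search vectors
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hsparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    hdense = [v * alpha for v in dense]
    return hdense, hsparse

# e.g. re-run the query above with a chosen alpha
hdense, hsparse = hybrid_scale(dense, sparse, alpha=0.5)
result = index.query(top_k=14, vector=hdense, sparse_vector=hsparse, include_metadata=True)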
Sparse vector search result
First, we will do a pure sparse vector search by setting the alpha value as 0 in the code shown in the Querying section above.

French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Women Blue Jeans
French Connection Women Blue Jeans
French Connection Men Navy Blue Jeans
French Connection Men Blue paint Stained Regular Fit Jeans
French Connection Men Black Jeans
French Connection Men Black Jeans
French Connection Men Black Jeans
French Connection Men Black Jeans
Observation: We can observe that the keyword search returned French Connection jeans but failed to rank the men’s French Connection jeans higher than a few of the women’s.
Dense vector search result
Now let’s do a pure semantic image search by setting the alpha value to 1 in the code shown in the Querying section above.

Locomotive Men Radley Blue Jeans
Locomotive Men Race Blue Jeans
Locomotive Men Eero Blue Jeans
Locomotive Men Cam Blue Jeans
Locomotive Men Ian Blue Jeans
French Connection Men Blue Jeans
Locomotive Men Cael Blue Jeans
Locomotive Men Lio Blue Jeans
French Connection Men Blue Jeans
Locomotive Men Rafe Blue Jeans
Locomotive Men Barney Grey Jeans
Spykar Men Actif Fit Low Waist Blue Jeans
Spykar Men Style Essentials Kns 0542 Blue Jeans
Wrangler Men Blue Skanders Jeans
Observation: The semantic image search correctly returned blue jeans for men, but mostly failed to match the exact brand we are looking for — French Connection.
Hybrid vector search result
Now let’s set the alpha value to 0.05 to try a hybrid search that is slightly more dense than a purely sparse search (where alpha = 0).

French Connection Men Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue Jeans
Locomotive Men Radley Blue Jeans
French Connection Men Navy Blue Jeans
Locomotive Men Race Blue Jeans
Locomotive Men Eero Blue Jeans
Locomotive Men Cam Blue Jeans
French Connection Men Blue Jeans
French Connection Men Blue paint Stained Regular Fit Jeans
Locomotive Men Ian Blue Jeans
Locomotive Men Cael Blue Jeans
French Connection Men Blue Jeans
Locomotive Men Lio Blue Jeans
Observation: By performing a mostly sparse search with some help from our image-based dense vectors, we get a large number of French Connection jeans that are for men and are almost all visually aligned with blue jeans.
The comparison demonstrates that hybrid search integrates the strengths of both sparse and dense vector search while mitigating their individual limitations. By balancing keyword precision with semantic relevance, hybrid search provides more accurate and contextually appropriate results, making it a powerful approach for complex search queries.
Conclusion
Vector search techniques in the Pinecone vector database are critical for AI applications such as Retrieval-Augmented Generation (RAG). This blog delved into various vector search methods, including keyword search, semantic search, and hybrid search, and highlighted their importance in AI. We provided a detailed guide for implementing semantic and hybrid search in Pinecone, and demonstrated how integrating a vector database with LangChain can significantly enhance the performance of large language models (LLMs).
Moreover, our comparison between hybrid search results and those from sparse and dense vector searches showed the superiority of hybrid search, showcasing its effectiveness for handling complex search queries. As we continue to explore new horizons in AI and database technologies, the adoption of these advanced search methodologies will undoubtedly transform the future of data retrieval and analysis.
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.