
How Semantic Search is Transforming the Way We Find Information

Last Updated on January 27, 2025 by Editorial Team

Author(s): Shivam Dattatray Shinde

Originally published on Towards AI.

Agenda

  1. How information retrieval worked in the past
  2. Introduction of semantic search to retrieval systems
  3. Dense Retrieval
  4. Reranking
  5. Retrieval-Augmented Generation (RAG)
  6. Conclusion

How information retrieval worked in the past

Before the advent of large language models, information retrieval on the internet relied on methods such as keyword matching and Boolean retrieval. However, these approaches had several limitations:

  1. Information retrieval depended on the specific words used in the query and how it was structured, rather than on an understanding of the user’s intent.
  2. It struggled to account for synonyms, polysemy (words with multiple meanings), and the nuances of grammar.
  3. Users needed to carefully and deliberately craft their queries to ensure that the retrieval results met their expectations.

Introduction of semantic search to retrieval systems

In the early stages of large language models (LLMs), users encountered a problem known as ‘hallucination’: LLMs often produced answers that were incorrect or outdated, and did so with high confidence. To address this issue, techniques like retrieval-augmented generation (RAG) were introduced.

LLMs can be integrated into search functionality in three main ways:

  1. Dense Retrieval
  2. Reranking
  3. Retrieval-Augmented Generation (RAG)

Dense Retrieval

Dense retrieval models operate based on embeddings. First, the user’s search query is transformed into an embedding. This embedding is then compared to the pre-calculated embeddings of the text database to measure similarity. The results with the highest similarity are returned. This is the fundamental process behind dense retrieval.
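
To make this concrete, here is a minimal dense-retrieval sketch in Python using the sentence-transformers library. The model name and the toy document collection are illustrative assumptions, not taken from the article.

```python
# A minimal dense-retrieval sketch with sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Pre-compute embeddings for the text database once, offline.
documents = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "The Great Wall of China is thousands of kilometers long.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# At query time, embed the query and compare it to every document embedding.
query = "Where is the Eiffel Tower?"
query_embedding = model.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
similarities = doc_embeddings @ query_embedding
best = int(np.argmax(similarities))
print(documents[best], float(similarities[best]))
```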

The following diagram provides a clearer understanding of this concept.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In the diagram above, only a single result is shown. However, in most cases, multiple results are returned. To obtain multiple results, we identify the nearest neighbors of the user’s search query embeddings within the vector database’s embedding space. This process is further illustrated in the following diagram.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In this approach, there’s a possibility that results with a low similarity score could still be returned. To address this, we set a similarity threshold. Only results with a similarity score above this threshold will be returned, which means there’s a chance that no results may be retrieved at all.
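
Continuing the sketch above, the following lines return the top-k nearest neighbors but keep only those that clear a similarity threshold; the threshold and k values are arbitrary illustrative choices.

```python
# Keep only the top-k neighbors whose similarity clears a threshold.
THRESHOLD = 0.5  # illustrative value; tune per application
TOP_K = 3

ranked = np.argsort(similarities)[::-1][:TOP_K]
results = [(documents[i], float(similarities[i]))
           for i in ranked if similarities[i] >= THRESHOLD]
# `results` may legitimately be empty if nothing clears the threshold.
```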

Caveats of dense retrieval

  1. Sometimes users want to search for a specific phrase, and relying solely on dense retrieval may not provide the best results. In such cases, hybrid models that combine keyword search with dense retrieval offer a better solution (a minimal hybrid sketch follows this list).
  2. Dense retrieval models often struggle to perform well outside the domains they were trained on. For example, if a model is trained on Wikipedia data and used for searching legal texts, the results are likely to be less accurate.
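
As a rough illustration of the hybrid idea, the sketch below blends BM25 keyword scores (via the rank_bm25 package) with the dense similarities computed earlier; the 50/50 weighting is an assumption, not a recommendation from the article.

```python
# A rough hybrid-search sketch: blend keyword (BM25) scores with the
# dense similarities from the earlier sketch.
from rank_bm25 import BM25Okapi

bm25 = BM25Okapi([doc.lower().split() for doc in documents])
keyword_scores = bm25.get_scores(query.lower().split())

# Normalize BM25 scores to [0, 1] so the two signals are comparable.
kw = keyword_scores / max(keyword_scores.max(), 1e-9)
hybrid = 0.5 * kw + 0.5 * similarities  # illustrative equal weighting
best = int(np.argmax(hybrid))
print(documents[best])
```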

Chunking long text

A simple way to index documents is at the sentence level: compare the embedding of the user’s query with the embedding of each sentence, then return the most similar sentence as the result. However, this approach falls short when the answer to a query spans multiple sentences. In such cases, a different technique for chunking the text is necessary. We must also account for a limitation of transformer architectures: their restricted context size prevents us from embedding very large chunks of text in one pass.

To address these challenges, the following methods can be used for text chunking before applying the embedding technique:

  1. One Vector Per Document
    This approach creates an embedding from only a representative portion of the document, such as the title, introduction, or abstract. While suitable for demos, it has limited utility because the rest of the document’s information is excluded and therefore unsearchable. A variant divides the document into smaller chunks, embeds each chunk, and aggregates those embeddings into a single, highly compressed vector for the document; this compression, however, loses a substantial amount of information.
  2. Multiple Vectors Per Document
    In this approach, the document is divided into multiple smaller chunks, each of which is embedded individually. Instead of a single embedding for the entire document, these chunk-level embeddings collectively represent it. Chunks can also overlap by a few sentences to improve representation. This method retains more detail and enables more effective search through finer-grained embeddings (a simple chunker is sketched after this list).
Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst
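
The sketch below shows a simple sentence-window chunker with overlap for the multiple-vectors approach; the chunk size and overlap are illustrative assumptions, and the trailing comments show how the one-vector variant could aggregate the same chunk embeddings.

```python
# A simple sentence-window chunker: fixed-size chunks with overlap.
def chunk_sentences(sentences, chunk_size=5, overlap=1):
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + chunk_size]))
        if start + chunk_size >= len(sentences):
            break
    return chunks

# Multiple vectors per document: one embedding per chunk.
# chunk_embeddings = model.encode(chunk_sentences(doc_sentences))
# One vector per document (the compressed variant): average the chunks.
# doc_vector = chunk_embeddings.mean(axis=0)
```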

Reranking

In this approach, the search results are reordered after they are initially retrieved. The user query can first be matched against the database using any method, such as keyword matching, dense retrieval, or a hybrid of the two; a second model then reorders the retrieved candidates by relevance, which improves the overall performance of the retrieval system.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In this approach, the model evaluates the relevance of the user’s search query against each indexed document and assigns rankings based on the resulting relevance scores.
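
A common way to implement this scoring step is with a cross-encoder, which reads the query and a candidate document together and outputs a relevance score. Here is a minimal sketch reusing the candidates from the earlier retrieval sketch; the model name is an illustrative choice.

```python
# Rerank first-stage candidates with a cross-encoder, which scores each
# (query, document) pair jointly rather than via independent embeddings.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

candidates = [documents[i] for i in ranked]  # from the first-stage retriever
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in
            sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
```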

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

Retrieval-Augmented Generation (RAG)

When large language models (LLMs) were first introduced, people expected them to answer any question they posed. However, LLMs often struggled with highly specific or niche questions, as well as questions about recent events. This limitation stemmed from their reliance on static training data, which left them unaware of the latest developments. Their tendency to answer such questions confidently but incorrectly is commonly referred to as ‘hallucination.’

This is where retrieval-augmented generation (RAG) comes into play.

To illustrate, consider a courtroom scenario. A judge typically makes decisions based on their knowledge of the law and general common sense. However, in specialized cases, such as health-related lawsuits, the judge may consult experts like doctors or surgeons for assistance.

RAG operates on a similar principle, enhancing LLMs by integrating external, domain-specific knowledge to improve their accuracy and reliability.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

The diagram above illustrates that the prompt is not fed directly into the LLM. Instead, it is first used to retrieve relevant information about the topic from an external source, such as the web or a document database. The retrieved information is indexed using any suitable method and then supplied to the LLM alongside the original question to generate the answer. By grounding its response in retrieved data, the LLM can answer factual or highly specialized questions more reliably. This approach, where information is retrieved prior to answer generation, is known as ‘Retrieval-Augmented Generation.’
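
Putting the pieces together, here is a bare-bones RAG loop under stated assumptions: retrieval reuses the dense-retrieval objects from the earlier sketches, and llm_generate is a hypothetical placeholder for whatever LLM API is actually used.

```python
# A bare-bones RAG loop: retrieve, build a grounded prompt, generate.
def retrieve(query, k=3):
    q = model.encode(query, normalize_embeddings=True)
    sims = doc_embeddings @ q
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)  # hypothetical LLM call, stand-in for a real API
```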

Conclusion

This article explained how search functionality worked in the past and how introducing semantic meaning into search improved its performance. It focused on the three most widely used LLM-based retrieval techniques: dense retrieval, reranking, and RAG.

Outro

Thank you so much for reading. If you liked this article, don’t forget to press that clap icon. Follow me on Medium and LinkedIn for more such articles.

Are you struggling to choose what to read next? Don’t worry, I have got you covered.

Classifying the Unstructured: A Guide to Text Classification with Representation and Generative… (pub.towardsai.net)

and one more…

From Words to Vectors: Exploring Text Embeddings (pub.towardsai.net)

References

[1] Jay Alammar and Maarten Grootendorst, Hands-On Large Language Models. https://learning.oreilly.com/library/view/hands-on-large-language/9781098150952/
[2] NVIDIA Blog, “What Is Retrieval-Augmented Generation?” https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
