
How Semantic Search is Transforming the Way We Find Information

Last Updated on January 27, 2025 by Editorial Team

Author(s): Shivam Dattatray Shinde

Originally published on Towards AI.

Agenda

  1. How information retrieval worked in the past
  2. Introduction of semantic search to retrieval systems
  3. Dense Retrieval
  4. Reranking
  5. Retrieval-Augmented Generation (RAG)
  6. Conclusion

How information retrieval worked in the past

Before the advent of large language models, information retrieval on the internet relied on methods such as keyword matching and Boolean retrieval. However, these approaches had several limitations:

  1. Information retrieval depended on the specific words used in the query and how it was structured, rather than on an understanding of the user’s intent.
  2. It struggled to account for synonyms, polysemy (words with multiple meanings), and the nuances of grammar.
  3. Users needed to carefully and deliberately craft their queries to ensure that the retrieval results met their expectations.

Introduction of semantic search to retrieval systems

In the early stages of large language models (LLMs), users encountered a problem known as ‘hallucination’: LLMs often produced answers that were incorrect or outdated, and did so with high confidence. To address this issue, techniques like retrieval-augmented generation (RAG) were introduced.

LLMs can be integrated into search functionality in three main ways:

  1. Dense Retrieval
  2. Reranking
  3. Retrieval-Augmented Generation (RAG)

Dense Retrieval

Dense retrieval models operate based on embeddings. First, the user’s search query is transformed into an embedding. This embedding is then compared to the pre-calculated embeddings of the text database to measure similarity. The results with the highest similarity are returned. This is the fundamental process behind dense retrieval.
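
To make this concrete, here is a minimal dense-retrieval sketch in Python using the sentence-transformers library. The model name and the toy document collection are illustrative assumptions, not taken from the article.

```python
# A minimal dense-retrieval sketch with sentence-transformers.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Pre-compute embeddings for the text database once, offline.
documents = [
    "The Eiffel Tower is located in Paris.",
    "Python is a popular programming language.",
    "The Great Wall of China is thousands of kilometers long.",
]
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# At query time, embed the query and compare it to every document embedding.
query = "Where is the Eiffel Tower?"
query_embedding = model.encode(query, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
similarities = doc_embeddings @ query_embedding
best = int(np.argmax(similarities))
print(documents[best], float(similarities[best]))
```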

The following diagram provides a clearer understanding of this concept.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In the diagram above, only a single result is shown. However, in most cases, multiple results are returned. To obtain multiple results, we identify the nearest neighbors of the user’s search query embeddings within the vector database’s embedding space. This process is further illustrated in the following diagram.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In this approach, there’s a possibility that results with a low similarity score could still be returned. To address this, we set a similarity threshold. Only results with a similarity score above this threshold will be returned, which means there’s a chance that no results may be retrieved at all.
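
Continuing the sketch above, the following lines return the top-k nearest neighbors but keep only those that clear a similarity threshold; the threshold and k values are arbitrary illustrative choices.

```python
# Keep only the top-k neighbors whose similarity clears a threshold.
THRESHOLD = 0.5  # illustrative value; tune per application
TOP_K = 3

ranked = np.argsort(similarities)[::-1][:TOP_K]
results = [(documents[i], float(similarities[i]))
           for i in ranked if similarities[i] >= THRESHOLD]
# `results` may legitimately be empty if nothing clears the threshold.
```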

Caveats of dense retrieval

  1. Sometimes users want to search for a specific phrase, and relying solely on dense retrieval may not provide the best results. In such cases, hybrid models that combine keyword search with dense retrieval offer a better solution (a minimal hybrid sketch follows this list).
  2. Dense retrieval models often struggle to perform well outside the domains they were trained on. For example, if a model is trained on Wikipedia data and used for searching legal texts, the results are likely to be less accurate.
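
As a rough illustration of the hybrid idea, the sketch below blends BM25 keyword scores (via the rank_bm25 package) with the dense similarities computed earlier; the 50/50 weighting is an assumption, not a recommendation from the article.

```python
# A rough hybrid-search sketch: blend keyword (BM25) scores with the
# dense similarities from the earlier sketch.
from rank_bm25 import BM25Okapi

bm25 = BM25Okapi([doc.lower().split() for doc in documents])
keyword_scores = bm25.get_scores(query.lower().split())

# Normalize BM25 scores to [0, 1] so the two signals are comparable.
kw = keyword_scores / max(keyword_scores.max(), 1e-9)
hybrid = 0.5 * kw + 0.5 * similarities  # illustrative equal weighting
best = int(np.argmax(hybrid))
print(documents[best])
```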

Chunking long text

A simple way to index documents is at the sentence level: compare the embedding of the user’s query with the embedding of each sentence, then return the most similar sentence as the result. However, this approach falls short when the answer to a query spans multiple sentences. In such cases, a different technique for chunking the text is necessary. We must also account for a limitation of transformer architectures: their restricted context size prevents us from embedding very large chunks of text in one pass.

To address these challenges, the following methods can be used for text chunking before applying the embedding technique:

  1. One Vector Per Document
    This approach creates an embedding from only a representative portion of the document, such as the title, introduction, or abstract. While suitable for demos, it has limited utility because the rest of the document’s information is excluded and therefore unsearchable. A variant divides the document into smaller chunks, embeds each chunk, and aggregates those embeddings into a single, highly compressed vector for the document; this compression, however, loses a substantial amount of information.
  2. Multiple Vectors Per Document
    In this approach, the document is divided into multiple smaller chunks, each of which is embedded individually. Instead of a single embedding for the entire document, these chunk-level embeddings collectively represent it. Chunks can also overlap by a few sentences to improve representation. This method retains more detail and enables more effective search through finer-grained embeddings (a simple chunker is sketched after this list).
Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst
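
The sketch below shows a simple sentence-window chunker with overlap for the multiple-vectors approach; the chunk size and overlap are illustrative assumptions, and the trailing comments show how the one-vector variant could aggregate the same chunk embeddings.

```python
# A simple sentence-window chunker: fixed-size chunks with overlap.
def chunk_sentences(sentences, chunk_size=5, overlap=1):
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + chunk_size]))
        if start + chunk_size >= len(sentences):
            break
    return chunks

# Multiple vectors per document: one embedding per chunk.
# chunk_embeddings = model.encode(chunk_sentences(doc_sentences))
# One vector per document (the compressed variant): average the chunks.
# doc_vector = chunk_embeddings.mean(axis=0)
```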

Reranking

In this approach, the search results are reordered after they are initially retrieved. The user query can first be matched against the database using any method, such as keyword matching, dense retrieval, or a hybrid of the two; a second model then reorders the retrieved candidates by relevance, which improves the overall performance of the retrieval system.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

In this approach, the model evaluates the relevance of the user’s search query against each indexed document and assigns rankings based on the resulting relevance scores.
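
A common way to implement this scoring step is with a cross-encoder, which reads the query and a candidate document together and outputs a relevance score. Here is a minimal sketch reusing the candidates from the earlier retrieval sketch; the model name is an illustrative choice.

```python
# Rerank first-stage candidates with a cross-encoder, which scores each
# (query, document) pair jointly rather than via independent embeddings.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

candidates = [documents[i] for i in ranked]  # from the first-stage retriever
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in
            sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
```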

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

Retrieval-Augmented Generation (RAG)

When large language models (LLMs) were first introduced, people expected them to answer any question they posed. However, LLMs often struggled with highly specific or niche questions, as well as questions about recent events. This limitation stemmed from their reliance on static training data, which left them unaware of the latest developments. Their tendency to answer such questions confidently but incorrectly is commonly referred to as ‘hallucination.’

This is where retrieval-augmented generation (RAG) comes into play.

To illustrate, consider a courtroom scenario. A judge typically makes decisions based on their knowledge of the law and general common sense. However, in specialized cases, such as health-related lawsuits, the judge may consult experts like doctors or surgeons for assistance.

RAG operates on a similar principle, enhancing LLMs by integrating external, domain-specific knowledge to improve their accuracy and reliability.

Source: Hands-On Large Language Models By Jay Alammar, Maarten Grootendorst

The diagram above illustrates that the prompt is not fed directly into the LLM. Instead, it is first used to retrieve relevant information about the topic from an external source, such as the web or a document database. The retrieved information is indexed using any suitable method and then supplied to the LLM alongside the original question to generate the answer. By grounding its response in retrieved data, the LLM can answer factual or highly specialized questions more reliably. This approach, where information is retrieved prior to answer generation, is known as ‘Retrieval-Augmented Generation.’
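
Putting the pieces together, here is a bare-bones RAG loop under stated assumptions: retrieval reuses the dense-retrieval objects from the earlier sketches, and llm_generate is a hypothetical placeholder for whatever LLM API is actually used.

```python
# A bare-bones RAG loop: retrieve, build a grounded prompt, generate.
def retrieve(query, k=3):
    q = model.encode(query, normalize_embeddings=True)
    sims = doc_embeddings @ q
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)  # hypothetical LLM call, stand-in for a real API
```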

Conclusion

This article explained how search functionality worked in the past and how introducing semantic meaning into search improved its performance. It focused on the three most widely used LLM-based retrieval techniques: dense retrieval, reranking, and RAG.

Outro

Thank you so much for reading. If you liked this article, don’t forget to press that clap icon. Follow me on Medium and LinkedIn for more such articles.

Are you struggling to choose what to read next? Don’t worry, I have got you covered.

Classifying the Unstructured: A Guide to Text Classification with Representation and Generative… (pub.towardsai.net)

and one more…

From Words to Vectors: Exploring Text Embeddings (pub.towardsai.net)

References

[1] Jay Alammar and Maarten Grootendorst, Hands-On Large Language Models. https://learning.oreilly.com/library/view/hands-on-large-language/9781098150952/
[2] NVIDIA Blog, “What Is Retrieval-Augmented Generation?” https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
