
Introduction to RAG: Basics to Mastery, Part 2: Hybrid RAG, Combining Semantic & Keyword Search for Better Retrieval

Author(s):

Originally published on Towards AI.

Part 2 of the mini-series introduction to RAG


Introduction

In Article 1, we built a basic local RAG pipeline using embeddings and a vector database.
While semantic search is powerful, it can sometimes miss results that contain exact keyword matches, especially for rare terms, acronyms, or code snippets.

Hybrid RAG solves this problem by combining:

  • Semantic search (meaning-based)
  • Keyword search (exact term matching)

By blending the two, you get the best of both worlds: high recall and high precision.

Why Hybrid RAG Matters

  • Semantic Search captures meaning across different phrasings. “AI regulation laws” ≈ “rules for artificial intelligence.”
  • Keyword Search (BM25) ensures literal matches survive. Acronyms, formulas, or code like np.dot don’t always have semantic equivalents.

💡 Think of it this way:

  • Semantic search is like a friend who understands what you mean, even if you phrase it differently.
  • Keyword search is like a literal detective, scanning for exact words or symbols.

Together, they cover each other’s blind spots.

Theory: How Hybrid RAG Works

  1. Send the query to both a vector database (semantic) and a keyword index (BM25).
  2. Score results separately: a similarity score for embeddings, a relevance score for BM25.
  3. Combine scores using a weighted formula.
  4. Return the best-ranked chunks as context for the LLM.

This creates a retrieval system that is smarter overall and more robust on edge cases.
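The weighted combination in step 3 can be sketched in a few lines. The chunk names, scores, and weights below are illustrative, not taken from a real index:

```python
# Illustrative per-chunk scores (made-up values, already scaled to [0, 1])
semantic_sim = {"chunk_a": 0.82, "chunk_b": 0.55, "chunk_c": 0.10}
bm25_score = {"chunk_a": 0.20, "chunk_b": 0.90, "chunk_c": 0.70}

# Weights are a tuning choice for your corpus, not fixed constants
w_sem, w_kw = 0.6, 0.4

# Blend the two scores per chunk with a weighted sum
hybrid = {
    doc: w_sem * semantic_sim[doc] + w_kw * bm25_score[doc]
    for doc in semantic_sim
}

# Rank chunks by the blended score, best first
ranked = sorted(hybrid, key=hybrid.get, reverse=True)
print(ranked)  # ['chunk_b', 'chunk_a', 'chunk_c']
```

Note how chunk_b wins overall despite a middling semantic score, because its strong keyword match survives the blend.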

Simple flowchart of a Hybrid RAG

Setup

We’ll extend the previous pipeline by adding BM25 keyword search with rank_bm25.

Install dependencies:

pip install rank-bm25

(We already have chromadb and sentence-transformers from last time.)

Step-by-Step Code

1. Load & Chunk Documents

(Same code from Article 1.)

# load_docs.py
from pathlib import Path

def load_text_files(folder_path):
    texts = []
    for file in Path(folder_path).glob("*.txt"):
        with open(file, "r", encoding="utf-8") as f:
            texts.append(f.read())
    return texts

docs = load_text_files("./data")
print(f"Loaded {len(docs)} documents.")

# chunking.py
def chunk_text(text, chunk_size=500, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        start += chunk_size - overlap
    return chunks

all_chunks = []
for doc in docs:
    all_chunks.extend(chunk_text(doc))
print(f"Total chunks: {len(all_chunks)}")
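To see what the overlap parameter does, run chunk_text on a tiny string. The function is restated here so the snippet is self-contained, with toy sizes in place of the real 500/50 defaults:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        start += chunk_size - overlap
    return chunks

# A 10-character "document", 4-char chunks, 1 char of overlap
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

Each chunk starts on the last character of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.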

2. Build BM25 Keyword Index

from rank_bm25 import BM25Okapi
import nltk

nltk.download('punkt')  # newer NLTK releases may also need nltk.download('punkt_tab')

# Tokenize chunks for BM25
tokenized_chunks = [nltk.word_tokenize(chunk.lower()) for chunk in all_chunks]

# Create BM25 index
bm25 = BM25Okapi(tokenized_chunks)
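If the NLTK model download is a problem in your environment, a small regex tokenizer is a workable stand-in for word_tokenize in this pipeline. This is a sketch, not punkt; it splits some tokens differently, but it deliberately keeps dotted identifiers like np.dot intact:

```python
import re

def simple_tokenize(text):
    # Lowercase, then keep runs of word characters and dots as tokens,
    # so code identifiers such as "np.dot" survive as single tokens
    return re.findall(r"[\w.]+", text.lower())

print(simple_tokenize("Compute np.dot(a, b) for NLP tasks!"))
# ['compute', 'np.dot', 'a', 'b', 'for', 'nlp', 'tasks']
```

Use it the same way as nltk.word_tokenize when building tokenized_chunks and when tokenizing queries, and keep the choice consistent between the two.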

3. Semantic Search Function (from Chroma)

def semantic_search(query, top_k=5):
    # embedder and collection come from the Article 1 setup
    query_embedding = embedder.encode([query], convert_to_numpy=True)[0]
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )
    return results["documents"][0], results["distances"][0]

4. Keyword Search Function (BM25)

def keyword_search(query, top_k=5):
    tokenized_query = nltk.word_tokenize(query.lower())
    scores = bm25.get_scores(tokenized_query)
    # Rank chunk indices by BM25 score, highest first
    ranked_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [all_chunks[i] for i in ranked_ids], [scores[i] for i in ranked_ids]

5. Combine Results (Hybrid Search)

def hybrid_search(query, top_k=5, weight_semantic=0.6, weight_keyword=0.4):
    sem_docs, sem_scores = semantic_search(query, top_k)
    key_docs, key_scores = keyword_search(query, top_k)

    combined = {}
    # Convert semantic distances to similarities (1 - distance)
    sem_scores = [1 - s for s in sem_scores]
    for doc, score in zip(sem_docs, sem_scores):
        combined[doc] = combined.get(doc, 0) + score * weight_semantic
    for doc, score in zip(key_docs, key_scores):
        combined[doc] = combined.get(doc, 0) + score * weight_keyword

    # Sort by combined weighted score, highest first
    ranked_docs = sorted(combined.items(), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked_docs[:top_k]]
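One caveat with the blend above: BM25 scores are unbounded, while the semantic similarities fall roughly in [0, 1], so the keyword side can dominate the weighted sum. A common refinement, not part of the code above, is to min-max normalize each score list before weighting:

```python
def min_max_normalize(scores):
    # Map a list of scores into [0, 1]; a constant list maps to all zeros
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

print(min_max_normalize([2.0, 8.0, 5.0]))  # [0.0, 1.0, 0.5]
```

Applied to key_scores (and optionally sem_scores) inside hybrid_search, this puts both retrievers on the same scale so weight_semantic and weight_keyword behave as intended.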

6. Query & Generate Answer

query = "Explain the role of solar panels in renewable energy."
top_chunks = hybrid_search(query, top_k=3)
context = "\n".join(top_chunks)

prompt = f"Answer the question using only the following context:\n{context}\n\nQuestion: {query}\nAnswer:"

import subprocess
# Call the local Mistral model through the Ollama CLI
ollama_cmd = ["ollama", "run", "mistral", prompt]
response = subprocess.run(ollama_cmd, capture_output=True, text=True)
print("LLM Response:\n", response.stdout)

Expected Output

The system now returns results that match both meaning and exact words.

  • Queries with acronyms (NLP, SQL) no longer fail.
  • Queries with code snippets (like np.dot) now surface exact matches.
  • Queries with rare terms get the balance of semantic context and keyword fidelity.

This is the essence of Hybrid RAG: no more missed answers due to unusual phrasing.

Practical Applications

Hybrid retrieval is especially useful in:

  • Developer Assistants → matching code examples (for loop, lambda) precisely.
  • Healthcare RAG → acronyms (e.g., COPD, HbA1c) must not be “interpreted away.”
  • Legal/Compliance → keyword precision is critical (law references, case IDs).
  • Ecommerce Search → enhance keyword search results with context-aware retrieval and ranking.

Next Steps

In Article 3, we’ll make our RAG agentic, giving it the ability to decide when and how to retrieve information and even call other tools when needed.

Please visit my GitHub repo for the full working code: https://github.com/Taha-azizi/RAG

All images were generated by the author using AI tools.


Published via Towards AI
