
Beyond Basic RAG: A Practical Guide to Advanced Indexing Techniques

Last Updated on October 7, 2025 by Editorial Team

Author(s): Saif Ali Kheraj

Originally published on Towards AI.

Figure: RAG diagram (source: https://en.wikipedia.org/wiki/Retrieval-augmented_generation#/media/File:RAG_diagram.svg)

Retrieval Augmented Generation (RAG) has become the go-to approach for building AI systems that can access and reason over large document collections. But here is the reality most developers face: basic RAG often falls short when dealing with complex queries or large documents.

You have probably experienced this frustration. You ask your RAG system “How do machine learning algorithms handle overfitting in production environments?” and get back a fragment about regularization techniques that is technically correct but misses the broader context about deployment considerations, monitoring, and real-world trade-offs.

The problem is not with RAG itself. It is with naive indexing approaches that treat all content equally and assume small chunks contain sufficient context. Let us explore four indexing strategies that can dramatically improve your RAG system’s performance.

The Naive RAG Baseline: Why It Fails

Most RAG implementations start with this simple approach (a minimal code sketch follows the list):

  1. Split documents into 200–500 word chunks
  2. Generate embeddings for each chunk
  3. Store chunks in a vector database
  4. For queries, find the most similar chunks
  5. Pass chunks to an LLM for response generation
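
To make the baseline concrete, here is a minimal sketch of that pipeline using LangChain and FAISS, the same stack used in later examples. The chunk size, prompt wording, and the documents variable are illustrative assumptions, not a prescribed setup.

# Minimal naive RAG sketch (assumes OPENAI_API_KEY is set and documents is a list of strings)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1–2. Split into ~1,500-character chunks (roughly 200–300 words) and embed them
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)
chunks = splitter.split_text("\n\n".join(documents))
vectorstore = FAISS.from_texts(chunks, embedding=OpenAIEmbeddings())

# 3–4. Store chunks in the vector index and retrieve the most similar ones for a query
query = "How do machine learning algorithms handle overfitting in production environments?"
top_chunks = vectorstore.similarity_search(query, k=4)

# 5. Pass the retrieved chunks to an LLM for response generation
llm = ChatOpenAI(model="gpt-4o-mini")
context = "\n\n".join(c.page_content for c in top_chunks)
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(answer.content)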

Why It Often Fails:

  • Context Fragmentation: Important information gets split across multiple chunks
  • Surface-Level Matching: Semantic search finds topically related content, not necessarily the best content
  • Limited Context Window: LLMs only see small fragments, missing the bigger picture

When Naive RAG Works:

  • Simple factual queries (“What is the capital of France?”)
  • Small documents that don’t need chunking
  • High volume, cost-sensitive applications
  • Quick prototyping and proof of concepts

1. Self-Querying Retrieval: Adding Intelligence to Search

Self-Querying Retrieval (SQR) enhances search by combining semantic similarity with metadata filtering, letting users ask natural questions like “Find malaria reports from Africa after 2022” and get precise results.

Step 1: Define Your Document Schema

You need documents with content and metadata fields:

{
  "content": "Report on malaria control in Kenya, 2023",
  "metadata": {"year": 2023, "region": "Africa", "topic": "malaria"}
}

Step 2: Create a Vector Store

Use FAISS, Pinecone, Weaviate, etc. Example with FAISS:

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# docs: the list of {"content": ..., "metadata": ...} dictionaries from Step 1
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    texts=[d["content"] for d in docs],
    embedding=embeddings,
    metadatas=[d["metadata"] for d in docs]
)

Step 3: Define the Metadata Schema for Querying

Tell the retriever what structured fields exist:

from langchain.chains.query_constructor.schema import AttributeInfo

metadata_field_info = [
    AttributeInfo(name="year", type="int", description="year of the report"),
    AttributeInfo(name="region", type="string", description="geographic region"),
    AttributeInfo(name="topic", type="string", description="health or climate topic"),
]

This helps the LLM parse filters like year > 2022 or region = Africa.

Step 4: Initialize the Self-Query Retriever

Wrap your vector store with an LLM-powered parser:

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")


retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    "A collection of health and climate reports",  # description of the document contents
    metadata_field_info,
)

Now queries can include semantic terms + filters.

Step 5: Ask Natural Language Questions

query = "Find malaria reports from Africa after 2022"
docs = retriever.get_relevant_documents(query)
for d in docs:
print(d.page_content, d.metadata)

Instead of just keyword matching, the retriever:

  1. Extracts semantic meaning → “malaria reports”.
  2. Extracts filters → { "region": "Africa", "year": { ">": 2022 } }.
  3. Combines both to return precise documents.

Advantages

  • Natural language queries work intuitively
  • Combines semantic search with precise filtering
  • Handles complex multi-criteria searches
  • Reduces irrelevant results significantly

Disadvantages:

  • Cost: 50–500x more expensive than naive RAG due to LLM query parsing
  • Requires rich metadata for effective filtering
  • Additional complexity in setup and maintenance
  • Dependent on LLM quality for query interpretation

Best Use Cases:

  • Research platforms with expert users
  • Document collections with rich metadata
  • Low-to-medium query volumes (< 10K/day)
  • Applications where query precision is critical

2. Parent Document Retrieval: Context Without Compromise

How to keep precision and context in vector search by storing chunks and full documents separately.

Imagine you are searching a 500‑page medical guideline for “beta‑blockers in heart failure with diabetes.”

  • If you only embed small chunks (≈400 words), you might find the right paragraph but miss the dosing table or contraindications a few pages earlier.
  • If you embed the entire guideline (all 500 pages), the embedding becomes fuzzy and retrieval quality drops.

Parent Document Retrieval (PDR) solves this by storing two things separately:

  1. Small chunks in a vector store for precise semantic search.
  2. Full parent documents in a doc store for complete context.

At query time, the retriever finds the most relevant chunks, then maps them back to their parent document, returning the full document so you get both accuracy and completeness.

Step 1: The Strategy

  1. Split into small chunks (e.g., 400 words) → accurate embedding search.
  2. Keep complete documents (e.g., 5,000+ words) → context preserved.
  3. Maintain a mapping between each chunk and its parent document.
  4. Search with chunks, return parents → precise search + rich answers.

Think of it like: search with a microscope, deliver the whole book.

Step 2: Implementation (LangChain-style)

# Step A: Split documents into small chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import ParentDocumentRetriever

child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# Step B: Initialize Parent Document Retriever
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,   # stores small chunks for semantic search
    docstore=docstore,         # stores full parent documents for context
    child_splitter=child_splitter,
    # parent_splitter=None → returns full parent docs as-is
)

# Index the corpus: chunks go into the vector store, full parents into the doc store
retriever.add_documents(documents)   # documents: list of LangChain Document objects

# Step C: Ask natural questions
results = retriever.invoke("How do neural networks prevent overfitting?")
# returns full parent documents that contain the most relevant chunks

Notes:

  • The vector store (e.g., FAISS, Pinecone) stores chunk embeddings.
  • The doc store (e.g., in-memory, Redis, Mongo, S3) stores full parent docs.
  • The retriever keeps an ID mapping from each chunk → parent doc (one way to set up both stores is sketched below).
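
For completeness, here is one way the two stores referenced above could be constructed: a Chroma collection for the chunk embeddings plus an in-memory doc store. The collection name and embedding model are illustrative choices, not requirements.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.storage import InMemoryStore

# Vector store for the small chunks (starts empty; retriever.add_documents fills it)
vectorstore = Chroma(collection_name="parent_doc_chunks", embedding_function=OpenAIEmbeddings())

# Doc store for the full parent documents, keyed by ID
docstore = InMemoryStore()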

Advantages

  • Maintains search precision with small chunks.
  • Returns complete context from full documents.
  • Excellent for complex analytical queries.
  • Preserves document structure and cross-references.

Disadvantages

  • 2–3x storage (you keep chunks and full parents).
  • 5–100x higher LLM costs if you pass entire parents to the model.
  • May include irrelevant sections inside a relevant parent.
  • Not ideal for synthesizing across many documents simultaneously.

Best Use Cases

  • Academic research: long papers where surrounding context matters.
  • Technical documentation: interlinked concepts, code, and footnotes.
  • Educational content: full explanations, not snippets.
  • Legal / medical documents: precision requires full context.

3. Multi-Vector Retrieval: Multiple Representations, Better Matches

How to handle diverse query styles by creating multiple embeddings per document while preserving the original source.

Imagine you are searching through a 500-page medical guideline and need to handle completely different query styles:

  • An executive asks: “Give me a high-level overview of treatment protocols”
  • A clinician asks: “What are the specific contraindications for beta-blockers in diabetic patients?”
  • A researcher asks: “Show me the clinical trial data and statistical significance”

Traditional chunking fails here because:

  • Small chunks miss high-level concepts executives need
  • Large chunks dilute the specific details clinicians require
  • Single embeddings can’t capture both broad themes AND granular facts

Multi-Vector Retrieval (MVR) solves this by creating multiple embeddings per document, each optimized for a different query type, while keeping the original document intact for complete context.

Step 1: The Strategy

  1. Create multiple representations of each document (summary, details, key concepts, examples)
  2. Embed each representation separately → diverse semantic matching
  3. Store all embeddings in the vector store with links to the same parent
  4. Keep original documents in the doc store → context preserved
  5. Search across all representations, return the source → flexible queries + complete answers

Think of it like multiple doors into the same room: executives use the summary door, clinicians use the technical door, but everyone gets the full document.

Step 2: Implementation (LangChain-style)

# Step A: Create multiple document representations
import uuid

from langchain.retrievers import MultiVectorRetriever
from langchain.storage import InMemoryStore

# Generate different views of the same document
# (create_summary, extract_technical, extract_concepts, extract_examples are your own helpers)
def create_representations(doc):
    return {
        "summary": create_summary(doc),          # for executives
        "technical": extract_technical(doc),     # for specialists
        "key_concepts": extract_concepts(doc),   # for researchers
        "examples": extract_examples(doc)        # for educators
    }

# Step B: Initialize Multi-Vector Retriever
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,    # stores multiple embeddings per doc
    docstore=InMemoryStore(),   # stores original full documents
    id_key="doc_id"             # metadata key that maps embeddings → parent docs
)

# Step C: Add documents with multiple representations
for doc in documents:
    doc_id = str(uuid.uuid4())
    representations = create_representations(doc)

    # Store the original document under its ID
    retriever.docstore.mset([(doc_id, doc)])

    # Create and store one embedding per representation,
    # tagging each with the parent doc_id so results map back to the source
    for rep_type, content in representations.items():
        retriever.vectorstore.add_texts(
            [content],
            metadatas=[{"doc_id": doc_id, "representation": rep_type}]
        )

# Step D: Query with different styles
executive_query = "high-level treatment overview"
clinical_query = "specific contraindications for diabetics"
research_query = "clinical trial statistical significance"

# All return the same full source document, but match via different representations
results = retriever.invoke(executive_query)

Advantages

  • Handles diverse query styles: executives, specialists, and researchers all find relevant matches
  • Preserves original context: complete documents maintain structure and cross-references
  • Flexible matching: broad concepts AND specific details both work
  • Same source, multiple entry points: reduces document duplication

Disadvantages

  • 3–5x storage overhead (multiple embeddings + original docs)
  • Complex setup: requires thoughtful representation generation
  • Potential embedding conflicts: different representations might compete
  • Higher processing costs: generating multiple representations per document

Best Use Cases

  • Mixed-audience knowledge bases: technical docs for both managers and engineers
  • Educational content: same material for different learning styles
  • Multi-stakeholder documentation: legal, technical, and business audiences
  • Research databases: papers with abstracts, methods, and conclusions as separate searchables

4. Advanced Chunking Strategies: Smarter Text Segmentation

How to respect document structure and semantic boundaries instead of blindly cutting text at arbitrary character limits.

You are building a help system for your company’s API documentation. Using basic chunking, you get this mess:

Chunk 1:

"How to authenticate: Send your API key in the header. Here's the co"

Chunk 2:

"de example: curl -H 'Authorization: Bearer your-key-here' https://api"

The problem? The code example got cut in half. Users searching for “authentication code example” might find Chunk 2 with orphaned code, but miss the explanation in Chunk 1.

Advanced chunking keeps related things together:

Better Chunk:

"How to authenticate: Send your API key in the header. Here's the code example:
curl -H 'Authorization: Bearer your-key-here' https://api.example.com/data"

Now users get the complete picture in one piece.

Step 1: The Strategy

  1. Keep related content together: code stays with its explanation
  2. Split at natural breaks: paragraphs, headings, topic changes
  3. Try big separators first: prefer splitting at empty lines over mid-sentence
  4. Adapt to content type: handle code differently than plain text
  5. Allow flexible sizes: some chunks can be longer if they need to be

Step 2: Implementation Strategies

a. Recursive Structure-Aware Splitting

This method breaks documents into smaller pieces by following a smart order. It tries to split at natural breaking points first (like paragraph breaks), then moves to smaller breaks (like sentences), and only breaks mid-sentence if absolutely necessary. A minimal example follows the list below.

How it works:

  • First, it looks for paragraph breaks (double line breaks)
  • If chunks are still too big, it splits at single line breaks
  • Then it tries splitting at sentence endings (periods)
  • As a last resort, it splits at spaces between words
  • It also keeps some overlap between chunks so information flows smoothly
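
A minimal version of this in LangChain, where the separator order (paragraph breaks, line breaks, sentence endings, spaces) mirrors the priority described above; the chunk size and overlap values are illustrative:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,                        # target size in characters
    chunk_overlap=150,                      # overlap so information flows between chunks
    separators=["\n\n", "\n", ". ", " "]    # paragraph breaks first, spaces as a last resort
)
chunks = splitter.split_text(long_document_text)  # long_document_text: your raw text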

b. Semantic Chunking

What it does: This method goes a step further and works with the meaning of the text. It splits documents based on where the topic changes, not just on size (a rough sketch follows the list below).

How it works:

  • The system reads and understands what each sentence means
  • It compares sentences to see how related they are
  • When it notices a major topic shift (like moving from “pricing” to “features”), it creates a split
  • Chunks stay together as long as they’re talking about the same subject
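
There is no single standard implementation of semantic chunking; the sketch below shows the core idea using sentence embeddings and a cosine-similarity threshold. The crude sentence split, the embedding model, and the 0.75 threshold are illustrative simplifications (LangChain's experimental SemanticChunker packages a similar idea).

import numpy as np
from langchain_openai import OpenAIEmbeddings

def semantic_chunks(text, threshold=0.75):
    # Very rough sentence split; a real implementation would use a proper sentence tokenizer
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    vectors = OpenAIEmbeddings().embed_documents(sentences)

    chunks, current = [], [sentences[0]]
    for prev, curr, sentence in zip(vectors, vectors[1:], sentences[1:]):
        similarity = np.dot(prev, curr) / (np.linalg.norm(prev) * np.linalg.norm(curr))
        if similarity < threshold:   # sharp drop in similarity → likely topic shift
            chunks.append(". ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(". ".join(current))
    return chunks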

c. Content-Aware Splitting

What it does: This method recognizes different types of content and splits each appropriately. It knows that code, Markdown, and HTML should be handled differently (see the example after the list).

How it works:

  • For Markdown documents: It respects headers, code blocks, and lists
  • For Code files: It keeps functions, classes, and imports together
  • For HTML pages: It splits at logical points like headers and div sections
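
For example, LangChain ships splitters that understand Markdown headers and programming-language syntax; the header names and chunk sizes below are illustrative choices.

from langchain.text_splitter import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
    Language,
)

# Markdown: split on headers so sections (and the code blocks inside them) stay together
md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
md_sections = md_splitter.split_text(markdown_text)  # markdown_text: your .md content

# Python code: split on function/class boundaries instead of arbitrary character positions
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=1000, chunk_overlap=0
)
code_chunks = code_splitter.split_text(python_source)  # python_source: your .py content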

d. Hybrid Approach

This combines multiple methods to get the best results, using the right tool for each situation (a sketch follows the steps below).

  • Step 1: Choose the right splitting method based on content type (code gets code splitter, markdown gets markdown splitter, etc.)
  • Step 2: If any chunks are still too big, use semantic splitting to break them down further based on topic changes
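
A rough sketch of such a dispatcher, reusing the splitters and the semantic_chunks helper from the sketches above; the content-type labels and the 2,000-character limit are illustrative assumptions.

def hybrid_split(text, content_type, max_chars=2000):
    # Step 1: pick the splitter that matches the content type
    if content_type == "markdown":
        pieces = [d.page_content for d in md_splitter.split_text(text)]
    elif content_type == "python":
        pieces = code_splitter.split_text(text)
    else:
        pieces = splitter.split_text(text)   # recursive splitter from section a

    # Step 2: if a piece is still too large, fall back to semantic splitting
    final_chunks = []
    for piece in pieces:
        if len(piece) > max_chars:
            final_chunks.extend(semantic_chunks(piece))
        else:
            final_chunks.append(piece)
    return final_chunks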

Advantages

  • Keeps related info together: Explanations stay with their examples
  • Natural reading experience: Each chunk makes sense on its own
  • Better search results: Users find complete, useful information
  • Works with different content types: Adapts to code, text, tables
  • Fewer broken references: Links and citations stay intact

Disadvantages

  • Inconsistent chunk sizes: Some chunks are much bigger than others
  • More processing time: Analyzing structure takes extra work
  • Harder to predict costs: Variable sizes make it difficult to estimate token usage
  • Requires tuning: Different content needs different settings
  • Can create huge chunks: Sometimes everything “belongs together” and chunks get too large

Best Use Cases

  • Documentation sites: API docs, user guides, tutorials
  • Educational content: Courses where concepts build on each other
  • Technical manuals: Step-by-step procedures with code or diagrams
  • Knowledge bases: Mixed content types (text, code, tables, images)
  • Research papers: Where arguments flow across multiple paragraphs

Summary

Figure by Author

Moving Forward

RAG is evolving rapidly, and these techniques represent current best practices rather than final solutions. The most successful implementations often combine multiple approaches, using advanced chunking with parent document retrieval, or self-querying with multi-vector representations.

Your specific domain, user needs, and resource constraints will ultimately determine the right balance of sophistication and practicality. Start simple, measure performance carefully, and add complexity only when the benefits clearly justify the costs.

The future of RAG lies not in choosing a single “best” approach, but in creating intelligent systems that can dynamically select the right retrieval strategy for each query. Until then, understanding these fundamental techniques will help you build RAG systems that actually meet your users’ needs.
