RAG Text Chunking Strategies: Optimize LLM Knowledge Access
Author(s): Abinaya Subramaniam
Originally published on Towards AI.
If retrieval is the search engine of your RAG system, chunking is the foundation the search engine stands on. Even the strongest LLM fails when the chunks are too long, too short, noisy, or cut at the wrong place. That is why practitioners often say:
“Chunking determines 70% of RAG quality.”
Good chunking helps the retriever find information that is complete, contextual, and relevant, while bad chunking creates fragmented, out-of-context passages that force the LLM to hallucinate.

If you’re just joining the series, check out my previous post: Introduction to RAG: Why Modern AI Needs Retrieval — it explains the basics of Retrieval-Augmented Generation.
What Is Chunking?
The first step in RAG is document collection and ingestion, where all source materials (documents, articles, or knowledge base entries) are gathered. Before retrieval, these documents undergo text chunking, which splits them into smaller, meaningful segments called chunks.
Each chunk is designed to be coherent and self-contained, allowing the retriever to efficiently locate, rank, and use the most relevant pieces of information when responding to a query.

Chunking is the process of dividing large text into smaller, meaningful segments before generating embeddings. These segments, called chunks, are what the retriever actually searches through when answering a query.
Imagine asking someone about a chapter in a textbook after you have ripped the chapter into random, uneven pieces. If the pieces don’t align with the logical structure of the content, the answer will be confused or incomplete. RAG systems behave the same way.
A well-chunked document captures ideas cleanly, maintains context, and allows the LLM to reason meaningfully. Poor chunking fractures meaning and causes retrieval noise. Everything else (vector stores, embeddings, rerankers) comes after this foundational step.
Why Chunking Matters More Than We Think
Chunking is not simply splitting text into pieces. It controls how your system retrieves information and how much context the LLM receives.
If chunks are too large, they may contain irrelevant or tangential information, which can confuse the model and dilute the focus on the query. The LLM may struggle to reason effectively, potentially producing answers that are vague, contradictory, or partially incorrect.
Conversely, if chunks are too small, they may lack sufficient context for the model to understand the full meaning, leaving it starved of information and prone to incomplete or fragmented responses.
Good chunking finds the balance: self-contained ideas that are neither too short nor too long, aligned with how humans naturally organize information.
Let’s see some chunking strategies now.
Fixed-Size Chunking
Fixed-size chunking is the simplest form. The text is split by a predefined number of characters or tokens (say, 500 tokens per chunk) regardless of sentence or paragraph boundaries.
It is predictable, fast to generate, and effective for very large, messy, or mixed datasets. But it has an obvious weakness: meaning often gets cut in half. For example, a sentence may begin in one chunk and end in another, reducing the embedding’s semantic strength.

A small overlap between chunks is typically used to preserve continuity:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # target size of each chunk, in characters
    chunk_overlap=50,  # characters shared between adjacent chunks
)
chunks = splitter.split_text(long_text)
Understanding Chunk Overlap
When dividing text into chunks, a small overlap between consecutive chunks is often added to preserve context and continuity. Overlap means that the last few sentences of one chunk are repeated at the start of the next chunk.

This ensures that important information spanning the boundary of two chunks isn’t lost. Without overlap, the retriever might return only part of an idea, causing the LLM to miss key context and produce incomplete or misleading answers. A typical overlap ranges from 10% to 20% of the chunk length, balancing redundancy with efficiency.
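The mechanics of overlap are easy to see in plain Python. This is an illustrative character-based sketch with a hypothetical helper name, not a library API:

```python
def chunk_with_overlap(text, chunk_size=100, overlap=20):
    """Fixed-size character chunks where each chunk repeats the final
    `overlap` characters of the previous chunk."""
    step = chunk_size - overlap  # each chunk starts chunk_size - overlap later
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghij" * 30  # a 300-character stand-in document
chunks = chunk_with_overlap(doc, chunk_size=100, overlap=20)
# chunk starts: 0, 80, 160, 240 -> the first 20 characters of each chunk
# duplicate the last 20 characters of the one before it
```

With a 20% overlap, roughly a fifth of the corpus is stored twice, which is the redundancy-versus-efficiency trade-off mentioned above.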
Fixed-size chunking is a practical choice for logs, emails, code repositories, and large corpora where structure is inconsistent.
Sentence-Based Chunking
Sentence-based chunking is a method where text is divided into chunks based on complete sentences rather than arbitrary lengths. This approach ensures that each chunk contains coherent ideas, preserving grammatical and semantic integrity.

It is particularly useful for maintaining clarity and context, as each chunk represents a meaningful unit of thought. By grouping sentences logically, the retriever can return more precise and understandable information to the LLM, reducing the risk of fragmented or confusing responses. Sentence-based chunking is often combined with small overlaps to further maintain continuity across chunks.
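As a rough sketch of the idea (using a naive regex sentence splitter and an illustrative function name; a production system would use a proper sentence tokenizer such as NLTK or spaCy):

```python
import re

def sentence_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters."""
    # Naive split on ., !, ? followed by whitespace; real tokenizers also
    # handle abbreviations, quotes, and decimal numbers.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)  # flush before the chunk overflows
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

text = ("RAG retrieves documents. Chunking splits them first. "
        "Each chunk should be a complete thought. Overlap preserves context.")
chunks = sentence_chunks(text, max_chars=80)
```

Every chunk ends at a sentence boundary, so no embedding ever covers half a thought.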
Paragraph-Based Chunking
Paragraph-based chunking divides text into chunks based on complete paragraphs rather than individual sentences or fixed token counts. This method preserves the natural structure and flow of the content, making it easier for the retriever to capture coherent ideas and context.
Each chunk typically represents a distinct topic or subtopic, which helps the LLM generate more accurate and meaningful responses. Paragraph-based chunking is particularly effective for long-form documents, research papers, or articles where maintaining the logical flow of information is important. Like sentence-based chunking, it can also incorporate small overlaps to ensure continuity across adjacent chunks.
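A minimal sketch, assuming paragraphs are separated by blank lines (the function name and size cap are illustrative):

```python
def paragraph_chunks(text, max_chars=500):
    """Pack whole paragraphs (blank-line separated) into chunks of
    at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # start a new chunk rather than split a paragraph
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(["para one " * 10, "para two " * 10, "para three " * 10])
chunks = paragraph_chunks(doc, max_chars=200)
```

Short consecutive paragraphs are merged up to the size cap, while a long paragraph always starts a fresh chunk, preserving the document's topical units.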
Semantic Chunking
Semantic chunking looks for meaning instead of length. Instead of splitting text arbitrarily, it identifies natural breaks (topic changes, context shifts, or section boundaries) using embeddings or similarity scores.
This produces coherent chunks with stronger semantic clarity. Because the chunk boundaries follow meaning, retrieval quality improves significantly, especially in structured content like knowledge bases, documentation, or articles. The trade-off is computation: semantic chunking is heavier and produces inconsistent chunk lengths.
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings

# SemanticChunker expects a LangChain embeddings object,
# not a raw SentenceTransformer model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile",  # split where similarity drops
    breakpoint_threshold_amount=90,
)
chunks = chunker.split_text(long_text)
For high quality documents where topic flow matters, semantic chunking is often the most accurate choice.
Recursive Splitting
Recursive splitting sits between fixed size and semantic approaches. It respects structure first, and only breaks apart text when necessary.
A typical strategy is to try splitting by headings, and if a section is still too long, then split by paragraphs, then sentences, and only finally by characters. This creates chunks that are both meaningful and size-controlled.
from langchain.text_splitter import RecursiveCharacterTextSplitter

recursive_splitter = RecursiveCharacterTextSplitter(
    # try headings first, then paragraphs, then sentences, then characters
    separators=["\n## ", "\n### ", "\n", ". ", ""],
    chunk_size=600,
    chunk_overlap=80,
)
chunks = recursive_splitter.split_text(long_doc)
This method excels in structured content such as developer documentation, technical manuals, reports, and scholarly material where the hierarchy matters.
Sliding Window Chunking
Some content spreads meaning across multiple sentences, like legal contracts, scientific papers, or long explanations. For such documents, a sliding window approach ensures continuity.
Instead of making distinct chunks, the method creates overlapping windows: for example, a 400-token window sliding 200 tokens at a time. Each chunk shares context with the next, preventing meaning from being lost at boundaries.
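That 400-token window with a 200-token stride can be sketched as follows (whitespace-split tokens stand in for a real tokenizer; the function name is illustrative):

```python
def sliding_window_chunks(tokens, window=400, stride=200):
    """Overlapping token windows; consecutive windows share
    window - stride tokens."""
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks

tokens = [str(i) for i in range(1000)]  # stand-in for tokenized text
windows = sliding_window_chunks(tokens, window=400, stride=200)
```

With stride equal to half the window, every token (except at the edges) appears in two chunks, roughly doubling storage and embedding cost, which is the trade-off noted below.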
This method maintains context beautifully but increases the number of chunks, which affects cost and performance.
Sliding windows are especially valuable in legal RAG, finance, medical research, and compliance systems.
Hierarchical Chunking
Hierarchical chunking builds a multi-level structure: small chunks for fine-grained retrieval, medium ones for balanced reasoning, and large ones to maintain global context.
At retrieval time, the system may first fetch a small chunk for precision but then pair it with a related larger chunk to restore full context. This reduces hallucination and improves reasoning depth.
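A toy sketch of the parent/child idea (character-based splits and illustrative names; production frameworks implement this far more carefully):

```python
def hierarchical_chunks(text, parent_size=1200, child_size=300):
    """Split into large parent chunks, then split each parent into small
    children. A retriever matches on children for precision, then hands
    the whole parent to the LLM to restore context."""
    parents = [text[i:i + parent_size] for i in range(0, len(text), parent_size)]
    children = []
    for pid, parent in enumerate(parents):
        for j in range(0, len(parent), child_size):
            children.append({"parent_id": pid, "text": parent[j:j + child_size]})
    return parents, children

parents, children = hierarchical_chunks("a" * 3000)
```

At query time you would embed and search the children, then look up `parents[child["parent_id"]]` to give the LLM the surrounding context.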
This technique powers enterprise-level RAG systems and multi-granularity frameworks like those in LlamaIndex.
Real World Chunking Mistakes
Most RAG projects fail due to subtle chunking issues. Oversized chunks overload the model with irrelevant detail. Tiny fragments lose meaning. Chunks that cut sentences or mix unrelated sections produce weak embeddings. Missing overlap creates discontinuity. Lack of metadata confuses the retriever. Using a single universal chunking method for all document types also leads to poor results.
Chunking should never be one size fits all. Policies behave differently from textbooks, call transcripts behave differently from research papers. Your strategy must evolve with the document type and the retrieval task.

Final Thoughts
Chunking is not just a preprocessing step; it is the backbone of your RAG pipeline. A good chunk is a meaningful, self-contained unit of knowledge. A bad one is an orphaned fragment that leads the LLM astray.
If retrieval is the engine, chunking is the fuel. High-quality chunking produces clean, contextual, reliable RAG systems. Poor chunking creates noise and hallucination, no matter how good the LLM is.