Why chunking matters
📖 5 min read · Updated 2026-04-18
Chunking is where the most avoidable quality loss in RAG happens. Most teams use LangChain's default 1000-character splitter and never revisit the decision, yet that default is wrong for almost every real corpus. Getting chunking right is one of the highest-leverage changes you can make to retrieval quality.
Why chunking exists at all
Three reasons you can't just embed whole documents:
- Embedding context windows. Most embedding models cap at 512-8192 tokens. Documents often exceed this.
- Retrieval granularity. If the answer to a question lives in one paragraph, retrieving the whole 50-page document buries it.
- Prompt context limits. Even with long-context LLMs, passing 10 whole documents of retrieved context is expensive and dilutes attention.
What you're optimizing for
A good chunk has three properties:
- Self-contained meaning. A reader can understand it without surrounding context.
- Single topic. The chunk is about one thing, not a mix.
- Retrievable. It matches the kind of query a user would make.
Fixed-size chunking often fails on all three. A 1000-character window can split mid-sentence, blend two topics, and lose enough context that the chunk is meaningless standalone.
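A toy example makes the failure concrete. The splitter below is a minimal sketch of pure fixed-size chunking, and the policy text is hypothetical:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive fixed-size chunking: slice every `size` characters,
    with no regard for word, sentence, or topic boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

policy = (
    "Refunds are issued within 30 days of purchase. "
    "Shipping costs are non-refundable."
)
chunks = fixed_size_chunks(policy, 40)
# The first chunk ends mid-word ("...of pur"), stranding the refund
# window from the sentence that gives it meaning.
```

Embedded on its own, that first fragment no longer reads as a complete statement about refunds, which is exactly the self-containment failure described above.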
The chunking trade-off curve
Too small
- Fragments without context
- High recall, low precision (too many false-positive matches)
- LLM can't reason about the isolated snippet
- More chunks = more embeddings = more cost
Too large
- Single chunk covers multiple topics, embedding becomes a muddled average
- Lower recall (the specific answer is buried in surrounding text)
- LLM context budget wasted on irrelevant content
- Harder to cite precisely
Goldilocks
Usually 200-800 tokens for prose, with some overlap, and respecting natural boundaries (paragraphs, sections). Exact sweet spot depends heavily on content type.
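One way to land in that range while respecting natural boundaries is greedy paragraph packing: merge whole paragraphs until a token budget is hit, so no chunk splits mid-paragraph. This sketch approximates tokens with whitespace-separated words; a real pipeline would count with the embedding model's tokenizer:

```python
def chunk_paragraphs(text: str, max_tokens: int = 400) -> list[str]:
    """Greedy paragraph packing: accumulate whole paragraphs until
    adding the next one would exceed the budget, then start a new
    chunk. Tokens are approximated by whitespace-separated words."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Paragraphs longer than the budget pass through whole here; a production version would fall back to a finer splitter for those.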
The overlap question
Overlap (including some tokens from the end of the previous chunk in the start of the next) prevents information loss at boundaries. Common settings: 10-20% overlap.
Overlap helps when:
- Important information spans arbitrary boundaries
- Context before/after a fact matters for answering
- You're using fixed-size chunking without structure awareness
Overlap hurts when:
- You have good natural boundaries (paragraphs, headings) and should use those instead
- You care about precision (overlapping chunks can match the same query and both show up in top-k, reducing diversity)
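Mechanically, overlap just means each window starts before the previous one ends. A minimal sketch over a pre-tokenized sequence, using 15% (inside the common 10-20% range):

```python
def overlapping_windows(tokens: list[str], size: int,
                        overlap: float = 0.15) -> list[list[str]]:
    """Fixed-size windows where each window repeats the trailing
    `overlap` fraction of the previous one, so facts that straddle
    a boundary appear whole in at least one chunk."""
    step = max(1, int(size * (1 - overlap)))
    chunks, i = [], 0
    while i < len(tokens):
        chunks.append(tokens[i:i + size])
        if i + size >= len(tokens):
            break
        i += step
    return chunks
```

Note the cost side of the trade-off: at 15% overlap you store and embed roughly 18% more tokens than the corpus contains.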
The chunking-for-query principle
Chunks should match the unit in which answers live, not the unit in which documents are stored. If users ask "what's our refund policy?" the right chunk is the refund policy paragraph, not arbitrary 1000-character slices of a policy document.
This means chunking strategy depends on the queries you expect, not just the documents you have. For many corpora, semantic or structure-aware chunking beats fixed-size because it produces chunks that look more like complete thoughts.
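For markdown-style sources, structure-aware chunking can be as simple as splitting on headings, so a query like "refund policy" maps to one whole section. A sketch (the document format and section markup are assumptions; adapt the pattern to your corpus):

```python
import re

def split_by_headings(markdown: str) -> dict[str, str]:
    """Structure-aware chunking: each markdown heading starts a new
    chunk keyed by its heading text, so a section is retrieved as
    one complete thought rather than arbitrary slices."""
    sections: dict[str, str] = {}
    current = "preamble"
    for line in markdown.splitlines():
        m = re.match(r"#{1,6}\s+(.*)", line)
        if m:
            current = m.group(1).strip()
            sections[current] = ""
        else:
            sections[current] = sections.get(current, "") + line + "\n"
    return {h: body.strip() for h, body in sections.items() if body.strip()}
```

Storing the heading alongside the body also gives the retriever query-like text to match against, since headings tend to be phrased the way users ask.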
Why the defaults are wrong
LangChain's default RecursiveCharacterTextSplitter with 1000 chars and 200 overlap is:
- Too large for short-query retrieval
- Too small for reasoning-intensive content
- Blind to structure (doesn't care about headings, sections, or paragraphs)
- Character-based (ignores that tokens vary in size across languages)
It's a reasonable zero-config starting point. It's a terrible final answer.
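What the recursive strategy actually does can be sketched in a few lines. This is a simplified reimplementation for illustration, not LangChain's actual code: try the coarsest separator first, and only fall back to finer ones when a piece still exceeds the size limit.

```python
def recursive_split(text: str, size: int,
                    separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting: split on the first
    separator, greedily repack pieces up to `size`, and recurse with
    finer separators on any piece that is still too large."""
    if len(text) <= size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":  # last resort: hard character slicing
        return [text[i:i + size] for i in range(0, len(text), size)]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = buf + sep + piece if buf else piece
        if len(candidate) <= size:
            buf = candidate
        else:
            if buf:
                chunks.append(buf)
            if len(piece) > size:
                chunks.extend(recursive_split(piece, size, tuple(rest)))
                buf = ""
            else:
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks
```

The sketch makes the structure-blindness visible: the separator ladder knows about blank lines, newlines, and spaces, but nothing above the paragraph, so a heading and the section under it are just more characters.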
Chunking strategies by content type
- Documentation / knowledge bases: structure-aware, by heading sections
- Long prose / books: semantic or recursive, 400-800 tokens
- FAQs / Q&A: one question+answer per chunk
- Code: function/class/file boundaries (see chunking code)
- Chat logs / conversational: by conversation turn or topic shift
- Legal / contracts: by clause with surrounding context
- Scientific papers: by section, with special handling for abstracts and references
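The FAQ case is the simplest to implement and shows the pattern for the rest: find the natural unit and split exactly there. A sketch, assuming a plain-text `Q:`/`A:` format (adapt the split pattern to your FAQ's actual markup):

```python
import re

def chunk_faq(text: str) -> list[str]:
    """One question+answer pair per chunk: split wherever a line
    beginning with 'Q:' starts a new entry."""
    entries = re.split(r"\n(?=Q:)", text.strip())
    return [e.strip() for e in entries if e.strip()]
```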
The experiment mindset
Chunking is one of the easiest things to A/B test in RAG. Index the same corpus with different chunking strategies, run the same eval set against both, compare retrieval metrics. Most teams never do this. The teams that do typically find their chunking baseline was leaving 20-40% of retrieval quality on the table.
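The shape of such an experiment fits in a few lines. Here keyword overlap stands in for embedding similarity so the sketch is self-contained; the eval set and both chunkings are made-up illustrations:

```python
def recall_at_k(eval_set, chunks, k=3):
    """Fraction of (query, expected_answer) pairs for which some
    top-k chunk contains the answer. Ranking is naive keyword
    overlap, a stand-in for real embedding similarity."""
    hits = 0
    for query, answer in eval_set:
        q_words = set(query.lower().split())
        ranked = sorted(chunks,
                        key=lambda c: -len(q_words & set(c.lower().split())))
        if any(answer.lower() in c.lower() for c in ranked[:k]):
            hits += 1
    return hits / len(eval_set)

eval_set = [("refunds within how many days", "30 days")]
by_paragraph = ["Refunds are issued within 30 days.",
                "We ship worldwide in 2 business days."]
by_fixed_width = ["Refunds are issued wi", "thin 30 days. We ship",
                  " worldwide in 2 business days."]
# Same corpus, same eval set, two chunkings - only the boundaries differ.
```

Run the same eval set against each index and compare the numbers; with a real retriever the harness is identical, only the scoring function changes.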
Next: Fixed-size chunking.