Chunking.

The single biggest quality lever in RAG: how to split documents well.

Chunking code

Code is not prose. Function, class, and file boundaries matter more than line counts. Here's how to chunk code for retrieval without destroying its structure.
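
As a rough illustration of splitting on function and class boundaries rather than line counts, the sketch below uses Python's standard `ast` module to emit one chunk per top-level definition. The function name `chunk_python_source` is hypothetical, and the approach assumes the input is valid Python source; other languages would need their own parser.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Return one chunk per top-level function or class in `source`."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            start = node.lineno - 1
            # Keep decorators attached to the definition they decorate.
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list) - 1
            chunks.append("\n".join(lines[start:node.end_lineno]))
    return chunks


example = "import os\n\ndef a():\n    return 1\n\nclass B:\n    def m(self):\n        return 2\n"
code_chunks = chunk_python_source(example)
```

This sketch drops module-level statements such as imports. A fuller version would probably keep them as their own chunk or attach them to each definition as context.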

Fixed-size chunking

Fixed-size chunking is the simplest strategy. It's the default for a reason and a trap for a reason. Here's when it works and how to tune it.
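
A minimal sketch of fixed-size chunking with overlap, measured in characters for simplicity. The name `fixed_size_chunks` and the default values for `size` and `overlap` are assumptions for illustration; a real pipeline would more likely count tokens.

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split `text` into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once a window reaches the end, so no chunk is pure overlap.
        if start + size >= len(text):
            break
        start += step
    return chunks


fixed = fixed_size_chunks("abcdefghij", size=4, overlap=1)
```

The two tuning knobs are `size` and `overlap`. Note that the windows cut through words and sentences without regard for meaning, which is where the trap lies.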

Recursive chunking

Recursive chunking tries natural separators in order of preference. It's a pragmatic middle ground between fixed-size and semantic, and it's often the best default.
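
The idea of trying separators in order of preference can be sketched as follows. The separator list (paragraph break, line break, space), the character-based `max_len`, and the name `recursive_chunks` are assumptions for illustration, not a reference implementation of any particular library's splitter.

```python
def recursive_chunks(text, max_len=200, separators=("\n\n", "\n", " ")):
    """Split on the first separator; recurse with finer ones for oversized pieces."""
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    chunks = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long: retry with the next separator.
            chunks.extend(recursive_chunks(piece, max_len, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


rec = recursive_chunks("aaa bbb\n\nccc ddd eee", max_len=7)
```

Here the paragraph break is honored first, and only the second paragraph, which is too long on its own, is split further on spaces.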

Semantic chunking

Semantic chunking splits on meaning, not size. It uses embeddings to find natural topic boundaries. Here's how it works and when it's worth the extra cost.
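
One common way to find topic boundaries is to embed consecutive sentences and start a new chunk wherever their similarity drops. The sketch below assumes that approach. `toy_embed` is a bag-of-words stand-in so the example runs without a model; in practice you would pass a real embedding function, and the `threshold` value is an arbitrary assumption that needs tuning.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, embed=toy_embed, threshold=0.2):
    """Group consecutive sentences; break where adjacent similarity < threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])  # similarity dropped: new topic
        else:
            groups[-1].append(sentence)
    return [" ".join(g) for g in groups]


sem = semantic_chunks([
    "Cats chase mice.",
    "Cats sleep all day.",
    "Stocks fell sharply.",
    "Stocks rallied later.",
])
```

The extra cost comes from embedding every sentence before any chunk exists, which fixed-size and recursive splitting avoid entirely.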

Structure-aware chunking

Structure-aware chunking uses document hierarchy (headings, sections, lists) as the primary boundary signal. For any corpus with real structure, this usually beats the size-based strategies.
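
To make the idea concrete, here is a minimal sketch that treats Markdown-style `#` headings as chunk boundaries and records each chunk's heading path. The input format, the name `heading_chunks`, and the output shape are all assumptions for illustration. It ignores lists and does not skip `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def heading_chunks(markdown):
    """Return one chunk per section, each tagged with its heading path."""
    chunks = []
    path = []  # stack of (level, title) for the current section
    body = []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own path
            level = len(match.group(1))
            # Pop siblings and deeper headings before pushing the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro.\n## Install\nRun pip.\n## Usage\nCall it.\n"
sections = heading_chunks(doc)
```

Keeping the heading path alongside the text lets a retriever see that "Run pip." belongs to Guide > Install, context that a size-based splitter would discard.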

Why chunking matters

Chunking is the most under-thought part of most RAG systems. Here's why it matters, why the defaults are usually wrong, and what to optimize for.