Code is not prose. Function, class, and file boundaries matter more than line counts. Here's how to chunk code for retrieval without destroying its structure.
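
As a rough illustration of boundary-based splitting, the sketch below uses Python's standard `ast` module to emit one chunk per top-level function or class. The function name `chunk_python_source` is hypothetical, and the choice to drop other top-level code (imports, constants) is an assumption made to keep the example short.

```python
import ast


def chunk_python_source(source: str) -> list[str]:
    """Return one chunk per top-level function or class definition.

    Illustrative only: other top-level statements are skipped here,
    which a real pipeline would probably want to keep somewhere.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+).
            start = node.lineno
            if node.decorator_list:
                # Keep decorators attached to the definition they modify.
                start = min(d.lineno for d in node.decorator_list)
            chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks
```

Splitting on syntax nodes rather than line counts means a chunk never ends halfway through a function body, whatever its length.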
Fixed-size chunking is the simplest strategy. It's the default for a reason and a trap for a reason. Here's when it works and how to tune it.
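
A minimal sketch of the idea, assuming sizes are measured in characters rather than tokens. The name `chunk_fixed` and the default values are illustrative, not recommendations.

```python
def chunk_fixed(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not made up purely of
    # text already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]
```

The two knobs to tune are `size` and `overlap`. The overlap repeats a little text so that a sentence cut at a window edge still appears whole in at least one chunk, provided it is shorter than the overlap.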
Recursive chunking tries natural separators in order of preference. It's a pragmatic middle ground between fixed-size and semantic chunking, and it's often the best default.
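
The separator cascade can be sketched as follows. This is a simplified illustration, not any particular library's implementation. The function name `chunk_recursive`, the separator list, and the character-based `max_len` are all assumptions.

```python
def chunk_recursive(
    text: str,
    max_len: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first, recursing only into oversized pieces."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # Nothing natural left to split on: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        return chunk_recursive(text, max_len, finer)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # This piece alone is too big: retry it with finer separators.
            chunks.extend(chunk_recursive(piece, max_len, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

Paragraph breaks are tried before line breaks, sentences, and words, so a chunk is only cut at a finer boundary when the coarser one cannot fit within `max_len`.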
Semantic chunking splits on meaning, not size. It uses embeddings to find natural topic boundaries. Here's how it works and when it's worth the extra cost.
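
One common way to find such boundaries is to compare the embeddings of adjacent sentences and start a new chunk where similarity drops. The sketch below assumes that approach. `toy_embed` is a deliberately crude stand-in (word counts) so the example runs without a model, and the `threshold` value is arbitrary; a real system would pass in an actual embedding function.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(sentence: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk_semantic(
    sentences: list[str],
    embed: Callable[[str], Counter] = toy_embed,
    threshold: float = 0.2,
) -> list[str]:
    """Start a new chunk wherever adjacent sentences fall below `threshold`."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            groups.append([sentence])  # similarity dropped: topic boundary
        else:
            groups[-1].append(sentence)
    return [" ".join(group) for group in groups]
```

The extra cost comes from the `embed` call: every sentence is embedded once at indexing time, before any chunk exists.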
Structure-aware chunking uses document hierarchy (headings, sections, lists) as the primary boundary signal. For a corpus with real structure, it is usually the strongest option.
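
For Markdown input, heading-based splitting might look like the sketch below. It assumes ATX-style headings (`#` through `######`) and does not guard against `#` lines inside fenced code blocks. The name `chunk_by_headings` and the output shape (a heading path plus body text) are illustrative choices.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Return one chunk per section, tagged with its chain of parent headings."""
    chunks: list[dict] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected under the current heading path, if any.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level, title = len(match.group(1)), match.group(2).strip()
            # Pop siblings and deeper headings before entering the new section.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, title))
        else:
            body.append(line)
    flush()
    return chunks
```

Keeping the heading path with each chunk lets a retriever tell apart two sections with the same local title, such as "Setup" under different parents.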
Chunking is the most under-thought part of most RAG systems. Here's why it matters, why the defaults are usually wrong, and what to optimize for.