Chunking

Cutting long documents into smaller pieces so the AI can work with them.

Explained simply.

A 300-page manual won't fit in a prompt, and even if it did, asking the model to reason over all of it at once gives bad answers. Chunking is the practice of splitting documents into smaller, self-contained pieces (usually 200-800 words each) before you store them. Each chunk gets its own embedding. When a question comes in, you retrieve the best-matching chunks, not whole documents.
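A minimal sketch of the splitting step, assuming plain text with blank-line paragraph breaks; the function name and the 300-word default are illustrative, not from any particular library:

```python
def chunk_by_words(text, max_words=300):
    """Split text into chunks of at most max_words words,
    breaking on paragraph boundaries where possible."""
    chunks, current = [], []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # Flush the current chunk if adding this paragraph would exceed the limit.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each string this returns would then be passed to an embedding model and stored. Real splitters (in libraries like LangChain) add refinements such as overlap between chunks, but the shape is the same.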

An example.

You have a 50-page employee handbook. You chunk it into 150 pieces, one per section. When someone asks 'how many sick days do I get?', the RAG system retrieves the 3 chunks that mention sick leave, pastes them into the prompt, and the AI answers using just those 3 paragraphs, not all 50 pages.
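The retrieve-then-answer shape can be sketched like this. A real system would score chunks by embedding similarity against a vector database; this toy version uses keyword overlap so it runs standalone, and the handbook snippets and function names are made up for illustration:

```python
def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks sharing the most words with the question."""
    q_words = set(question.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )[:top_k]

# Hypothetical chunks from the handbook example above.
handbook_chunks = [
    "Employees accrue ten sick days per calendar year.",
    "Parking permits are issued by the facilities office.",
    "Unused sick days do not roll over to the next year.",
]

context = retrieve("how many sick days do I get", handbook_chunks, top_k=2)
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + "\nQ: how many sick days do I get?"
)
```

The model then sees only the two sick-leave chunks in `prompt`, never the parking section, which is the whole point of retrieving chunks instead of documents.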

Why it matters.

Bad chunking is the single biggest reason RAG systems give bad answers. Chunks that are too small lose the context needed to answer; chunks that are too big bury the relevant passage in irrelevant noise. Getting this right matters more than which model or vector database you pick.