Retrieval-Augmented Generation is how you turn an LLM into 'Claude, but with knowledge of YOUR data.' Instead of retraining the model (slow, expensive, stale), you give it a library to look things up in, and the model answers from, and cites, what it finds. Nearly every 'chat with your docs' product is RAG under the hood. This 70-page section covers the full pipeline: foundations, embeddings, vector stores, chunking strategies, retrieval approaches, document handling, evaluation, production patterns, and real case studies. RAG looks simple in demos. Getting it production-grade is harder than you'd think, and this section is the map.
What RAG is, why it beats fine-tuning for most use cases, the architecture map, and when to skip it entirely.
PDFs, HTML, tables, figures, OCR, metadata: the unglamorous 80% of every real RAG system.
Fixed-size, semantic, recursive, structure-aware. The single most under-thought part of most RAG stacks.
Picking models, closed vs open, dimensions, MRL, fine-tuning. Where your retrieval ceiling is actually set.
HNSW, IVF, PQ, hybrid indexes, metadata filtering, cost. The infrastructure layer.
Dense, sparse, hybrid, reranking, HyDE, query rewriting, multi-query fusion. The real craft.
Agentic RAG, GraphRAG, CRAG, Self-RAG, multi-hop. Where modern RAG is actually going.
Retrieval metrics, generation metrics, RAGAS, building eval sets. If you skip this section, your system will silently rot.
Latency, caching, observability, cost, security. Turning a notebook into a service.
Customer support, internal KB, code search, legal, multi-tenant. The patterns I keep reaching for.
If you're new to RAG, start at Foundations and go section by section. If you've already shipped a v1 and it underwhelms in production, skip to Reranking and Evaluation. Those are the two places where most naive RAG systems leave the most value on the floor.
The thread running through all of it: RAG isn't one thing. It's a pipeline with a dozen independent decisions, and the quality of your system is the product of all of them, not the max.
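To make the "product, not the max" point concrete, here is a back-of-the-envelope sketch. The stage names and scores are made up for illustration, not measurements, and treating stages as independent factors you can multiply is itself a simplifying assumption.

```python
from math import prod

# Hypothetical per-stage quality scores in [0, 1].
# Illustrative numbers only; nothing here is measured.
stage_quality = {
    "parsing": 0.95,
    "chunking": 0.90,
    "embedding": 0.90,
    "retrieval": 0.85,
    "reranking": 0.95,
    "generation": 0.95,
}


def pipeline_quality(stages: dict[str, float]) -> float:
    """Multiply stage scores: a simple model where each stage
    can only preserve or lose what the previous stage handed it."""
    return prod(stages.values())


overall = pipeline_quality(stage_quality)               # about 0.59
best = max(stage_quality.values())                      # 0.95
weakest = min(stage_quality, key=stage_quality.get)     # "retrieval"

print(f"best single stage: {best:.2f}")
print(f"whole pipeline:    {overall:.2f}")
print(f"weakest stage:     {weakest}")
```

Six stages that each look fine on their own (0.85 to 0.95) multiply out to roughly 0.59 end to end. Under this toy model, polishing your best stage barely moves the total, while lifting the weakest one does.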