RAG (Retrieval-Augmented Generation) is the discipline of grounding a language model's answers in your own data. Almost every production AI system I've worked on eventually converges to some version of RAG. It's the difference between a model that bluffs and a model that cites. Between a demo that looks good and a system that holds up under real traffic.
This section is 56 pages on how to take a RAG system from "throwing docs into a vector database and hoping" all the way to production-grade retrieval. Rags to riches, from the naive v1 most teams ship and regret, to the layered, evaluated, observable systems that actually work at scale.
What RAG is, why it beats fine-tuning for most use cases, the architecture map, and when to skip it entirely.
PDFs, HTML, tables, figures, OCR, metadata, the unglamorous 80% of every real RAG system.
Fixed-size, semantic, recursive, structure-aware. The single most under-thought part of most RAG stacks.
Picking models, closed vs open, dimensions, MRL, fine-tuning. Where your retrieval ceiling is actually set.
HNSW, IVF, PQ, hybrid indexes, metadata filtering, cost. The infrastructure layer.
Dense, sparse, hybrid, reranking, HyDE, query rewriting, multi-query fusion. The real craft.
Agentic RAG, GraphRAG, CRAG, Self-RAG, multi-hop. Where modern RAG is actually going.
Retrieval metrics, generation metrics, RAGAS, building eval sets. If you skip this section your system will silently rot.
Latency, caching, observability, cost, security. Turning a notebook into a service.
Customer support, internal KB, code search, legal, multi-tenant. The patterns I keep reaching for.
If you're new to RAG, start at Foundations and go section by section. If you've already shipped a v1 and it underwhelms in production, skip to Reranking and Evaluation. Those are the two places where most naive RAG systems leave the most value on the floor.
The thread running through all of it: RAG isn't one thing. It's a pipeline with a dozen independent decisions, and the quality of your system is the product of all of them, not the max.