Corrective RAG (CRAG)

Corrective RAG (CRAG) adds a quality-check step between retrieval and generation. If retrieved documents are judged insufficient or irrelevant, the system triggers a fallback strategy, typically a web search, query rewriting, or a different retrieval approach. It's a practical pattern for handling retrieval failures gracefully.

The core flow

1. Retrieve from internal corpus
2. Evaluate retrieved documents:
   - Correct (high confidence, relevant)
   - Ambiguous (some relevant, some not)
   - Incorrect (nothing relevant found)
3. Based on evaluation:
   - Correct: generate answer from retrieved docs
   - Ambiguous: refine retrieval, add web search, or both
   - Incorrect: fall back to web search or "I don't know"
4. Generate final answer

The evaluator

The key component is the retrieval evaluator, a model (small classifier, LLM with a prompt, or score-based heuristic) that judges whether retrieval gave enough to answer.

LLM-based evaluator

SYSTEM: Given a user query and retrieved documents, judge whether the
documents contain enough information to answer the query. Respond with:
- SUFFICIENT: can answer directly
- PARTIAL: some info present, need more
- INSUFFICIENT: no relevant info

USER:
Query: [query]
Documents: [doc1, doc2, doc3]
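A minimal sketch of the glue around that prompt. The actual chat-completion call is elided; `build_evaluator_prompt` and `parse_judgment` are hypothetical helpers, and the label order in `parse_judgment` matters because "SUFFICIENT" is a substring of "INSUFFICIENT":

```python
SYSTEM_PROMPT = (
    "Given a user query and retrieved documents, judge whether the "
    "documents contain enough information to answer the query. Respond "
    "with SUFFICIENT, PARTIAL, or INSUFFICIENT."
)

def build_evaluator_prompt(query: str, documents: list[str]) -> str:
    """Assemble the user message for the evaluator call."""
    docs = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(documents))
    return f"Query: {query}\nDocuments:\n{docs}"

def parse_judgment(raw: str) -> str:
    """Map the model's free-text reply onto one of the three labels."""
    upper = raw.upper()
    # Check INSUFFICIENT first: "SUFFICIENT" is a substring of it.
    for label in ("INSUFFICIENT", "PARTIAL", "SUFFICIENT"):
        if label in upper:
            return label
    # Unparseable reply: trigger fallback rather than a silent pass.
    return "INSUFFICIENT"
```

Defaulting to INSUFFICIENT on an unparseable reply is deliberate: a confused evaluator should cause a fallback, not a confident answer.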

Score-based

If all retrieved documents have cosine similarity below 0.7 (example threshold), flag as low-confidence. Cheap, no extra LLM call, but less reliable than LLM evaluation.
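The score-based check is a few lines. A sketch, using the example 0.7 threshold from the text (tune it on your own corpus):

```python
def low_confidence(scores: list[float], threshold: float = 0.7) -> bool:
    """Flag retrieval as low-confidence when every document's cosine
    similarity falls below the threshold. Empty retrieval is trivially
    low-confidence."""
    return not scores or max(scores) < threshold
```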

Hybrid

Score-based first pass (fast). For borderline cases, LLM evaluator.
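One way to wire the two together. The 0.55/0.8 band is an assumption for illustration, not a recommended setting; `llm_evaluate` stands in for the LLM-based evaluator above:

```python
def hybrid_evaluate(query, docs, scores, llm_evaluate,
                    low=0.55, high=0.8):
    """Two-stage evaluation: score thresholds decide the clear cases
    for free; only borderline sets pay for the LLM call."""
    best = max(scores, default=0.0)
    if best >= high:
        return "SUFFICIENT"      # clearly good retrieval, skip the LLM
    if best < low:
        return "INSUFFICIENT"    # clearly bad retrieval, skip the LLM
    return llm_evaluate(query, docs)  # borderline: ask the LLM
```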

Fallback strategies

Web search

When internal corpus lacks info, search the web (via Serper, Tavily, Brave Search, etc.). Append web results to context. Useful for current events, general knowledge, questions outside your corpus.

Query rewriting and retry

Rewrite the query with different terminology, retry retrieval. Simple, no external dependencies.
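A sketch of the rewrite-and-retry loop. In practice `rewrite` would be an LLM call ("rephrase this query with different terminology"); here all components are injected so the control flow is clear:

```python
def retry_with_rewrite(query, retrieve, rewrite, evaluate, max_rounds=2):
    """Retry retrieval with rewritten queries until the evaluator is
    satisfied or the round budget runs out. Returns (docs, judgment)."""
    original = query
    docs = retrieve(query)
    judgment = evaluate(original, docs)
    for _ in range(max_rounds):
        if judgment != "INSUFFICIENT":
            break
        query = rewrite(query)
        docs = retrieve(query)
        # Judge against the original query: the rewrite is a retrieval
        # aid, not a change in what the user asked.
        judgment = evaluate(original, docs)
    return docs, judgment
```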

Knowledge base expansion

Escalate to broader knowledge sources: general docs, encyclopedic sources, parent organization's docs.

"I don't know" response

If nothing is found, acknowledge the gap instead of hallucinating. This is the underrated answer most systems skip.

Why this matters

The alternative to CRAG is: the model receives low-quality context and hallucinates a plausible but wrong answer. CRAG treats "retrieval was bad" as a first-class state, not a silent failure.

Latency

CRAG adds one evaluation step per query. If the evaluator is a small fast model, this adds 100-300ms. For queries that trigger fallback, total latency grows (web search adds seconds).

Common design choices

Granularity of evaluation

Evaluate per-document (which docs are relevant?) or per-set (is the overall set sufficient?). Per-set is simpler, per-document is more precise.
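The per-document variant filters first, then derives a set-level verdict from what survives. A sketch; `judge_doc` is an assumed per-document relevance check (small classifier or LLM call):

```python
def per_document_filter(query, docs, judge_doc):
    """Keep only documents the judge marks relevant, then map the
    survival rate onto a set-level label."""
    kept = [d for d in docs if judge_doc(query, d)]
    if not docs:
        return kept, "INSUFFICIENT"      # nothing retrieved at all
    if len(kept) == len(docs):
        return kept, "SUFFICIENT"        # everything was relevant
    return kept, "PARTIAL" if kept else "INSUFFICIENT"
```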

Fallback threshold

How strict is the evaluator? Strict = more fallbacks (better quality, higher cost). Lenient = fewer fallbacks (faster, cheaper, more hallucinations).

Parallel vs sequential fallback

Sequential: try retrieval, if bad, fall back. Parallel: always run retrieval + web search, use the better result. Parallel is faster for queries that need fallback but wastes compute on queries that don't.
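The parallel variant can be sketched with a thread pool; retrieval and web search start together, and the web result is only consumed if the corpus result falls short. All callables are injected stand-ins for your real components:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_crag(query, retrieve, web_search, evaluate, generate):
    """Run corpus retrieval and web search concurrently. Use corpus
    docs alone when they judge SUFFICIENT; otherwise merge in the
    already-in-flight web results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        corpus_future = pool.submit(retrieve, query)
        web_future = pool.submit(web_search, query)
        corpus_docs = corpus_future.result()
        if evaluate(query, corpus_docs) == "SUFFICIENT":
            web_future.cancel()  # best effort; it may already be running
            return generate(query, corpus_docs)
        return generate(query, corpus_docs + web_future.result())
```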

Implementation sketch

def corrective_rag(query):
    retrieved = retrieve(query, top_k=10)
    judgment = evaluate(query, retrieved)

    if judgment == "SUFFICIENT":
        return generate(query, retrieved)

    if judgment == "PARTIAL":
        # Keep what we have and supplement it with web results.
        web_results = web_search(query)
        return generate(query, retrieved + web_results)

    # INSUFFICIENT (or any unexpected label): rewrite and retry once,
    # so a surprise judgment can never fall through and return None.
    rewritten = rewrite_query(query)
    retry_retrieved = retrieve(rewritten, top_k=20)
    if evaluate(query, retry_retrieved) != "INSUFFICIENT":
        return generate(query, retry_retrieved)

    # Last resort: answer from web results alone.
    web_results = web_search(query)
    return generate(query, web_results)

When CRAG helps

- Your corpus doesn't cover everything users ask, and you can't predict the gaps
- Queries drift outside the indexed domain: current events, adjacent topics, general knowledge
- A plausible-but-wrong answer is costly, so silent retrieval failure is unacceptable

When it's overkill

- The corpus is comprehensive and queries are predictable, so retrieval rarely fails
- The latency budget can't absorb an extra evaluation call on every query
- A plain similarity threshold with an "I don't know" response already catches most failures

The simplest CRAG

You don't need a full CRAG implementation to benefit from the idea. Start with:

- A similarity-score threshold on retrieved documents (the score-based evaluator above)
- An "I don't know" response when nothing clears the threshold

This simple version already catches a large share of the hallucination failures that production RAG systems silently pass through.
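A sketch of that minimal starting point: a threshold plus an honest refusal. `retrieve` is assumed to return (doc, score) pairs and `generate` is your usual answer step; both are injected stand-ins, and 0.7 is the example threshold from the text:

```python
IDK = "I don't know -- nothing relevant found in the knowledge base."

def minimal_crag(query, retrieve, generate, threshold=0.7):
    """Drop low-scoring documents; refuse rather than answer from
    nothing."""
    scored = retrieve(query)
    kept = [doc for doc, score in scored if score >= threshold]
    if not kept:
        return IDK  # acknowledge the gap instead of hallucinating
    return generate(query, kept)
```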

What to do with this