Adaptive RAG

Not every query needs the same retrieval strategy. Adaptive RAG classifies incoming queries and routes each one to the most suitable approach: simple retrieval, hybrid, multi-query, agentic, or no retrieval at all. It's how production systems balance cost, latency, and quality.

The core idea

A classifier sits between the user and the retrieval pipeline. It looks at the query and decides what kind of question it is and how much retrieval, if any, it needs.

Based on that classification, it picks a retrieval strategy.

Query types and their strategies

Trivial / greeting

"Hi", "thanks", "can you help me?", no retrieval, respond directly.

Common knowledge

"What's the capital of France?", no retrieval, LLM answers from pretraining.

Simple factual

"What's the refund window?", single-hop dense retrieval or hybrid.

Multi-entity

"Compare products A and B", multi-query with one retrieval per entity.

Multi-hop reasoning

"Who manages the team that shipped X?", agentic RAG with iterative retrieval.

Corpus-wide synthesis

"What are the main themes in our 2023 customer feedback?", GraphRAG or summarization over retrieved sets.

Structured query

"How many customers signed up last month?", text-to-SQL, not vector retrieval.

Out-of-scope

A query unrelated to your domain -> decline or deflect.
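The taxonomy above can be captured as a plain lookup table. A minimal sketch; the type and strategy names are illustrative, not from any particular framework:

```python
from enum import Enum

class QueryType(Enum):
    TRIVIAL = "trivial"                  # greetings, thanks
    COMMON_KNOWLEDGE = "common_knowledge"
    SIMPLE_FACTUAL = "simple_factual"
    MULTI_ENTITY = "multi_entity"
    MULTI_HOP = "multi_hop"
    SYNTHESIS = "synthesis"
    STRUCTURED = "structured"
    OUT_OF_SCOPE = "out_of_scope"

# Each query type maps to the retrieval strategy described above.
STRATEGY = {
    QueryType.TRIVIAL: "no_retrieval",
    QueryType.COMMON_KNOWLEDGE: "no_retrieval",
    QueryType.SIMPLE_FACTUAL: "hybrid_retrieval",
    QueryType.MULTI_ENTITY: "multi_query",
    QueryType.MULTI_HOP: "agentic_rag",
    QueryType.SYNTHESIS: "graph_rag",
    QueryType.STRUCTURED: "text_to_sql",
    QueryType.OUT_OF_SCOPE: "decline",
}
```

Keeping the mapping in data rather than branching logic makes it easy to add or retire routes as the system evolves.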

Classifier options

LLM-based

Prompt a small, fast model to classify the query. Simple and flexible, but it costs one model call per query. Most common in production.

SYSTEM: Classify the following query into one of:
- SIMPLE: single-fact question answerable from one document
- MULTI_HOP: requires combining info from multiple sources
- SYNTHESIS: requires summarizing across many documents
- STRUCTURED: requires querying structured data
- NO_RETRIEVAL: can be answered without documents

USER: [query]
OUTPUT: [classification] + reasoning
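A sketch of wrapping that prompt in a classification function. `call_llm` is a placeholder for whatever model client you use (prompt string in, completion string out), not a real API, and the parsing is deliberately defensive:

```python
# Valid labels, matching the prompt above.
LABELS = {"SIMPLE", "MULTI_HOP", "SYNTHESIS", "STRUCTURED", "NO_RETRIEVAL"}

PROMPT_TEMPLATE = """Classify the following query into one of:
- SIMPLE: single-fact question answerable from one document
- MULTI_HOP: requires combining info from multiple sources
- SYNTHESIS: requires summarizing across many documents
- STRUCTURED: requires querying structured data
- NO_RETRIEVAL: can be answered without documents

Query: {query}
Answer with the label only."""

def classify(query, call_llm):
    raw = call_llm(PROMPT_TEMPLATE.format(query=query))
    parts = raw.strip().split()
    # Tolerate trailing reasoning or punctuation around the label.
    label = parts[0].strip(".,:").upper() if parts else ""
    # Fall back to the cheapest retrieval route if the output doesn't parse.
    return label if label in LABELS else "SIMPLE"
```

The fallback matters: a malformed classification should degrade to ordinary retrieval, never crash the pipeline.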

Fine-tuned classifier

Train a small model on labeled examples. Faster and cheaper per query. Requires labeled data.

Heuristic

Rule-based: short queries are often simple; queries with "compare", "vs", or "difference" are multi-entity; queries with "summary", "overview", or "themes" are synthesis. Fast but limited.
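A minimal sketch of those rules; the keyword lists and the greeting pattern are illustrative and would need tuning on real traffic:

```python
import re

def heuristic_classify(query: str) -> str:
    q = query.lower().strip()
    # Greetings and pleasantries need no retrieval at all.
    if re.fullmatch(r"(hi|hello|hey|thanks|thank you)[.!]*", q):
        return "NO_RETRIEVAL"
    # Comparison words suggest one retrieval per entity.
    if re.search(r"\b(compare|vs|versus|difference)\b", q):
        return "MULTI_ENTITY"
    # Summarization words suggest corpus-wide synthesis.
    if re.search(r"\b(summary|summarize|overview|themes?)\b", q):
        return "SYNTHESIS"
    # Default: treat as a single-hop factual query.
    return "SIMPLE"
```

Heuristics like these are a reasonable v1 router and a useful fallback when the LLM classifier is unavailable.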

Embedding-based

Embed the query and run a nearest-neighbor search against a labeled query corpus; the query inherits the label of its closest neighbor. Medium speed, medium quality.
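A sketch of the mechanics with plain cosine similarity and a linear scan; in practice you'd use your embedding model's vectors and an ANN index:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nn_classify(query_vec, labeled_corpus):
    """labeled_corpus: list of (embedding, label) pairs from past queries."""
    best_label, best_sim = None, -1.0
    for vec, label in labeled_corpus:
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label
```

A k-nearest-neighbors vote over the top few matches is a common refinement over taking the single closest neighbor.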

The routing tree

                       query
                         |
                     classify
                         |
 +-------+-------+-------+-------+-------+-------+
 |       |       |       |       |       |       |
greet  simple  m-hop   m-ent   synth    SQL  no-ret
 |       |       |       |       |       |       |
respond hybrid agentic multi-  Graph-  text-  LLM-
directly +rerank RAG    query   RAG   to-SQL only

(m-hop = multi-hop, m-ent = multi-entity, synth = synthesis,
 no-ret = no retrieval needed)

Cost savings

In a typical customer-facing RAG system, 20-40% of queries don't need retrieval at all. Routing them to the no-retrieval path saves the embedding call, the vector search, any reranking, and the retrieved-context tokens in the LLM prompt.

For high-volume systems this is meaningful.
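A back-of-envelope calculation using the 20-40% figure above; every unit cost here is a hypothetical placeholder to swap for your own numbers:

```python
# Hypothetical volumes and unit costs -- placeholders, not benchmarks.
queries_per_day = 1_000_000
no_retrieval_rate = 0.30            # middle of the 20-40% range
cost_per_retrieval = 0.0005         # $ per query: embedding + search + rerank
context_tokens_saved = 2_000        # retrieved chunks no longer in the prompt
cost_per_1k_tokens = 0.001          # $ per 1k input tokens

routed_away = queries_per_day * no_retrieval_rate
saved_retrieval = routed_away * cost_per_retrieval
saved_tokens = routed_away * context_tokens_saved / 1000 * cost_per_1k_tokens
print(f"${saved_retrieval + saved_tokens:,.0f} per day")  # -> $750 per day
```

The token-cost term usually dominates: skipping retrieval also shrinks every prompt.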

Implementation

Route before retrieval

Classifier runs first. Only calls retrieval if needed. Cleanest and cheapest.
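A sketch of the route-first dispatch; the classifier and handlers here are stubs standing in for real retrieval-and-generation pipelines:

```python
# Route-before-retrieval: classify first, then dispatch to a handler.
# Unknown routes fall back to the default path rather than failing.
def answer(query, classify, handlers, default="SIMPLE"):
    route = classify(query)
    handler = handlers.get(route) or handlers[default]
    return route, handler(query)

# Illustrative stub handlers; each would wrap a full pipeline in practice.
handlers = {
    "NO_RETRIEVAL": lambda q: "answered directly, no retrieval cost",
    "SIMPLE":       lambda q: "answered from hybrid retrieval",
}
```

Because the classifier runs before anything else, a NO_RETRIEVAL verdict means the vector store is never touched.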

Route during retrieval

Do quick retrieval; if confidence is low or results are thin, escalate to more complex strategies. Adapts dynamically but pays the cheap-retrieval cost upfront.
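A sketch of that escalation logic; the score threshold and minimum-hit count are assumptions to tune, and both retrievers are stubs:

```python
# Route-during-retrieval: try the cheap strategy first and escalate when
# results look thin. Both retrievers return (doc, score) pairs.
def retrieve_with_escalation(query, cheap_retrieve, expensive_retrieve,
                             min_score=0.5, min_hits=2):
    results = cheap_retrieve(query)
    good = [(doc, score) for doc, score in results if score >= min_score]
    if len(good) >= min_hits:
        return good                       # cheap path was enough
    return expensive_retrieve(query)      # confidence too low: escalate
```

The upfront cost of the cheap pass is the price of adapting at runtime instead of guessing upfront.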

Corrective routing

Always run simple retrieval. If generator indicates insufficient context, trigger multi-hop or agentic retrieval. See Corrective RAG.

Measurement

Track the classifier's routing accuracy against a labeled query set, plus per-route answer quality, latency, and cost. A route earns its keep only if it improves at least one of those over the default path.

The evolution path

Teams typically start with one strategy (simple retrieval), then add adaptive routing as they see failure modes:

  1. v1: vanilla hybrid retrieval
  2. v2: add no-retrieval path for trivial queries (easy win)
  3. v3: add multi-query for ambiguous queries
  4. v4: add agentic path for multi-hop
  5. v5: add structured-query path for data-heavy queries

Each step: build the route, measure the quality and cost impact, keep what helps.

Next: Corrective RAG (CRAG).