Not every query needs the same retrieval strategy. Adaptive RAG classifies incoming queries and routes each to the most suitable approach: simple retrieval, hybrid, multi-query, agentic, or no retrieval at all. It's how production systems balance cost, latency, and quality.
A classifier sits between the user and the retrieval pipeline. It looks at the query, decides what kind of question it is, and picks a retrieval strategy based on that classification. Some examples:
"Hi", "thanks", "can you help me?" → no retrieval; respond directly.
"What's the capital of France?" → no retrieval; the LLM answers from pretraining.
"What's the refund window?" → single-hop dense retrieval or hybrid.
"Compare products A and B" → multi-query, with one retrieval per entity.
"Who manages the team that shipped X?" → agentic RAG with iterative retrieval.
"What are the main themes in our 2023 customer feedback?" → GraphRAG or summarization over retrieved sets.
"How many customers signed up last month?" → text-to-SQL, not vector retrieval.
A query unrelated to your domain → decline or deflect.
LLM classifier: prompt a small, fast model to classify the query. Simple and flexible, at the cost of one extra call per query. The most common approach in production.
SYSTEM: Classify the following query into one of:
- SIMPLE: single-fact question answerable from one document
- MULTI_HOP: requires combining info from multiple sources
- SYNTHESIS: requires summarizing across many documents
- STRUCTURED: requires querying structured data
- NO_RETRIEVAL: can be answered without documents
USER: [query]
OUTPUT: [classification] + reasoning
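As a sketch, the prompt above can be wrapped in a thin classification function. `call_llm(system, user)` is a hypothetical stand-in for whatever chat-completion client you use; the parsing and the fallback label are illustrative choices, not a fixed recipe.

```python
# Sketch of an LLM-based classifier wrapping the prompt above.
# `call_llm(system, user)` is a hypothetical stand-in for your
# chat-completion client; swap in a real API call.

LABELS = {"SIMPLE", "MULTI_HOP", "SYNTHESIS", "STRUCTURED", "NO_RETRIEVAL"}

SYSTEM_PROMPT = (
    "Classify the following query into one of:\n"
    "- SIMPLE: single-fact question answerable from one document\n"
    "- MULTI_HOP: requires combining info from multiple sources\n"
    "- SYNTHESIS: requires summarizing across many documents\n"
    "- STRUCTURED: requires querying structured data\n"
    "- NO_RETRIEVAL: can be answered without documents\n"
    "Answer with the label first."
)

def classify(query, call_llm):
    """Classify a query; fall back to SIMPLE if the model answers off-script."""
    reply = call_llm(SYSTEM_PROMPT, query).strip().upper()
    # Take the first token so "MULTI_HOP: needs two sources" still parses.
    label = reply.split()[0].rstrip(":.,") if reply else ""
    return label if label in LABELS else "SIMPLE"
```

Falling back to SIMPLE on an unparseable reply is a deliberately safe default: plain retrieval rarely hurts, while skipping retrieval on a query that needed it does.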
Distilled classifier: train a small dedicated model on labeled examples. Faster and cheaper per query, but requires labeled data.
Rule-based: short queries are often simple; queries with "compare", "vs", or "difference" are multi-entity; queries with "summary", "overview", or "themes" are synthesis. Fast but limited.
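These heuristics are easy to express directly. A minimal sketch, where the keyword lists and the length threshold are illustrative and should be tuned on your own query logs:

```python
import re

# Rule-based routing following the heuristics above. The keyword lists
# and the length threshold are illustrative; tune them on real query logs.
MULTI_ENTITY = re.compile(r"\b(compare|vs\.?|versus|difference)\b", re.IGNORECASE)
SYNTHESIS = re.compile(r"\b(summary|summarize|overview|themes?)\b", re.IGNORECASE)

def rule_classify(query):
    """Return a label when a rule fires, else None (defer to a model)."""
    if MULTI_ENTITY.search(query):
        return "MULTI_ENTITY"
    if SYNTHESIS.search(query):
        return "SYNTHESIS"
    if len(query.split()) <= 4:  # short queries are often simple
        return "SIMPLE"
    return None
```

Returning None lets you chain this in front of an LLM classifier: rules handle the obvious cases for free, and only ambiguous queries pay for a model call.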
Embedding similarity: embed the query and run nearest-neighbor search against a labeled query corpus. Medium speed, medium quality.
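A minimal sketch of that approach. Embedding is assumed to happen elsewhere; `labeled` holds (vector, label) pairs for queries you have hand-labeled, and the majority-vote k is an illustrative choice:

```python
import math

# Nearest-neighbor classification against a labeled query corpus.
# Embedding is assumed to happen elsewhere; `labeled` holds
# (vector, label) pairs for hand-labeled queries.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query_vec, labeled, k=3):
    """Majority vote over the k labeled queries nearest to query_vec."""
    ranked = sorted(labeled, key=lambda vl: cosine(query_vec, vl[0]), reverse=True)
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

At corpus sizes beyond a few thousand labeled queries you would swap the linear scan for the same vector index you already run for document retrieval.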
                    query
                      |
                  classify
                      |
   +------+-------+-------+-------+------+------+
   |      |       |       |       |      |      |
 greet  simple  multi-  multi-   syn    SQL  no-ret
   |      |      hop    entity    |      |      |
direct  hybrid  agentic multi-  Graph  text-  LLM-
reply  +rerank    RAG   query    RAG  to-SQL  only
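The tree above reduces to a dispatch table. A sketch with stub handlers standing in for the real pipelines (the handler and label names are hypothetical):

```python
# Dispatch table mirroring the routing tree above. The handlers are
# stubs with hypothetical names; each would call the real pipeline.

def respond_directly(q): return f"direct:{q}"
def hybrid_retrieve(q):  return f"hybrid:{q}"
def agentic_rag(q):      return f"agentic:{q}"
def multi_query(q):      return f"multi_query:{q}"
def graph_rag(q):        return f"graph:{q}"
def text_to_sql(q):      return f"sql:{q}"
def llm_only(q):         return f"llm_only:{q}"

ROUTES = {
    "GREETING":     respond_directly,
    "SIMPLE":       hybrid_retrieve,
    "MULTI_HOP":    agentic_rag,
    "MULTI_ENTITY": multi_query,
    "SYNTHESIS":    graph_rag,
    "STRUCTURED":   text_to_sql,
    "NO_RETRIEVAL": llm_only,
}

def route(label, query):
    # Unknown labels fall back to plain hybrid retrieval, the safest default.
    return ROUTES.get(label, hybrid_retrieve)(query)
```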
In a typical customer-facing RAG system, 20-40% of queries don't need retrieval at all. Routing them to no-retrieval skips the embedding call, the vector search, any reranking, and the retrieved-context tokens in the prompt. For high-volume systems those savings are meaningful.
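A back-of-envelope calculation shows the scale; every number here is an assumption for illustration, not a benchmark:

```python
# Back-of-envelope savings from routing. Every number is an illustrative
# assumption, not a benchmark: adjust for your own traffic and unit costs.
queries_per_month = 1_000_000
no_retrieval_rate = 0.30          # within the 20-40% range above
cost_per_retrieval = 0.002        # dollars: embedding + search + rerank

monthly_savings = queries_per_month * no_retrieval_rate * cost_per_retrieval
print(f"${monthly_savings:,.0f}/month saved")  # $600/month at these assumptions
```

The latency win is often worth more than the dollars: the no-retrieval path drops an entire network round trip from the hot path.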
Classify-first: the classifier runs first, and retrieval is invoked only when needed. Cleanest and cheapest.
Retrieve-then-escalate: do a quick, cheap retrieval; if confidence is low or results are thin, escalate to a more complex strategy. Adapts dynamically, but pays the cheap-retrieval cost up front.
Generate-then-correct: always run simple retrieval; if the generator indicates the context is insufficient, trigger multi-hop or agentic retrieval. See Corrective RAG.
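The quick-retrieval-with-escalation pattern can be sketched as follows; `cheap_retrieve` and `agentic_retrieve` are stand-ins for real pipelines, and the confidence threshold and minimum hit count are arbitrary starting points:

```python
# Sketch of quick retrieval with escalation. `cheap_retrieve` and
# `agentic_retrieve` are stand-ins for real pipelines; the confidence
# threshold and minimum hit count are arbitrary starting points.

def retrieve_with_escalation(query, cheap_retrieve, agentic_retrieve,
                             min_score=0.5, min_hits=2):
    hits = cheap_retrieve(query)               # list of (doc, score) pairs
    strong = [h for h in hits if h[1] >= min_score]
    if len(strong) >= min_hits:
        return strong                          # cheap path was good enough
    # Low confidence or thin results: escalate to the expensive strategy.
    return agentic_retrieve(query)
```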
Teams typically start with a single strategy (usually simple retrieval), then add adaptive routing as failure modes surface. At each step: build the route, measure the quality and cost impact, and keep what helps.
Next: Corrective RAG (CRAG).