Multi-hop questions require answers that combine information from multiple independent documents. "Who manages the team that ships the feature the CEO mentioned last quarter?" needs three retrievals, not one. Vanilla RAG handles single-hop. Multi-hop needs orchestration.
Single-hop: "What's our refund policy?" → retrieve the refund policy doc → answer.
Multi-hop: "What's the refund policy for the product Alice launched last quarter?"
Needs two steps: first find which product Alice launched last quarter, then find that product's refund policy. One-shot retrieval on the original query returns either Alice-related docs or refund-related docs, not the specific intersection.
Two approaches work. Query decomposition: break the query into sub-questions, retrieve for each, combine the results. Iterative retrieval: start with partial information, retrieve more based on what you found, repeat.
Decomposition prompt:

SYSTEM: Break the following question into simpler sub-questions that can each be answered with a single document lookup. List each sub-question on a new line.
USER: [multi-hop question]
A:
1. [sub-question 1]
2. [sub-question 2]
3. [sub-question 3]
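A minimal sketch of the decomposition step. It assumes a hypothetical `llm(system, user) -> str` callable standing in for whatever model client you use; the parsing just strips numbering from the reply.

```python
# Sketch of query decomposition, assuming a hypothetical llm(system, user) -> str callable.
DECOMPOSE_SYSTEM = (
    "Break the following question into simpler sub-questions that can each "
    "be answered with a single document lookup. "
    "List each sub-question on a new line."
)

def decompose(question, llm):
    """Ask the LLM for sub-questions and parse them out of its reply."""
    reply = llm(DECOMPOSE_SYSTEM, question)
    subs = []
    for line in reply.splitlines():
        # Drop blank lines and leading numbering like "1." or "2)".
        text = line.strip().lstrip("0123456789.) ").strip()
        if text:
            subs.append(text)
    return subs
```

Each sub-question then goes through your normal single-hop retrieval path, and the combined hits feed one synthesis call.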
Iterative prompt, after round 1 results are in:

SYSTEM: Based on the retrieved documents, do you have enough information to answer the user's question? If yes, answer. If no, what additional information do you need? Output either a final answer or a follow-up search query.
USER: [original question]
Retrieved: [docs from round 1]
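The loop around that prompt can be sketched as follows, assuming hypothetical `retrieve(query) -> list[str]` and `llm(system, user) -> str` callables. The check prompt here is constrained so the reply starts with either `ANSWER:` or `SEARCH:`, which keeps the loop's control flow trivial to parse.

```python
# Iterative retrieval loop. retrieve() and llm() are hypothetical callables.
CHECK_SYSTEM = (
    "Based on the retrieved documents, do you have enough information to "
    "answer the user's question? If yes, reply 'ANSWER: <final answer>'. "
    "If no, reply 'SEARCH: <follow-up search query>'."
)

def iterative_answer(question, retrieve, llm, max_rounds=3):
    docs, query = [], question
    for _ in range(max_rounds):
        docs.extend(retrieve(query))
        reply = llm(CHECK_SYSTEM, f"{question}\nRetrieved: {docs}")
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        query = reply[len("SEARCH:"):].strip()  # next round uses the follow-up query
    return None  # did not converge within the iteration cap
```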
A middle-ground pattern: ask the LLM to plan retrievals before executing them.
Plans are easier to debug than free-form iterative loops and cheaper than full agent-based orchestration.
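A sketch of the plan-then-execute pattern, again assuming hypothetical `llm(system, user) -> str` and `retrieve(query) -> list[str]` callables. The key property is that the plan exists as data before any retrieval spend happens, so it can be logged and inspected.

```python
# Plan-then-execute: the LLM writes the whole retrieval plan up front, then we run it.
PLAN_SYSTEM = (
    "Write a numbered retrieval plan for answering the question: "
    "one search query per line, in the order they should run."
)

def plan_then_execute(question, llm, retrieve):
    plan = []
    for line in llm(PLAN_SYSTEM, question).splitlines():
        step = line.strip().lstrip("0123456789.) ").strip()
        if step:
            plan.append(step)
    # The plan itself is inspectable before spending on retrieval.
    return plan, {step: retrieve(step) for step in plan}
```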
Failure mode: bad decomposition. The LLM produces sub-questions that don't actually decompose the problem, so the retrievals don't answer what's needed.
Mitigation: include few-shot examples of good decompositions in the prompt, and validate decompositions for obvious problems (e.g., more than five sub-questions is suspicious).
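The validation step can be a handful of cheap heuristics run before any retrieval spend; a sketch (the specific checks are illustrative, not exhaustive):

```python
def validate_decomposition(question, subs, max_subs=5):
    """Cheap sanity checks on a decomposition. Heuristics only, not guarantees."""
    if not subs:
        return False, "no sub-questions produced"
    if len(subs) > max_subs:
        return False, f"suspiciously many sub-questions ({len(subs)})"
    norm = lambda s: s.strip().rstrip("?").lower()
    if any(norm(s) == norm(question) for s in subs):
        return False, "a sub-question just restates the original"
    return True, "ok"
```

On failure, either re-prompt with the reason included or fall back to single-shot retrieval.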
Failure mode: context loss between rounds. Round 2 retrieval doesn't use what round 1 learned, so the follow-up query is too general.
Mitigation: explicitly include round 1 findings in the round 2 query construction. "Given that Alice launched Product X, retrieve the refund policy for Product X."
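Mechanically, that's just string construction over the facts resolved so far. A sketch, where `findings` is a hypothetical dict mapping each resolved entity to its value:

```python
def grounded_followup(findings, gap):
    """Fold round-1 findings into the round-2 query so retrieval is specific."""
    context = "; ".join(f"{k} is {v}" for k, v in findings.items())
    return f"Given that {context}, {gap}"
```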
Failure mode: non-termination. The agent keeps retrieving without converging, often because the information truly isn't in the corpus.
Mitigation: cap iterations (3-5), set a wall-clock timeout, and enforce a cost budget per query.
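Timeout and cost cap can share one stop-condition check that the loop calls each round. A sketch; the default limits are illustrative, not recommendations:

```python
import time

def within_budget(started_at, cost_so_far, max_seconds=30.0, max_cost_usd=0.10):
    """True while the query is still inside both its time and cost budget."""
    return (time.monotonic() - started_at) < max_seconds and cost_so_far < max_cost_usd
```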
Academic benchmarks exist specifically for multi-hop QA (e.g., HotpotQA, 2WikiMultiHopQA, MuSiQue). These are useful for comparing strategies but don't necessarily reflect your production query distribution. Build a domain-specific eval set if multi-hop is important.
Multi-hop RAG is expensive: each query pays for a decomposition or planning call, multiple retrieval rounds, and a synthesis call over a larger context. Total: 3-10x single-shot RAG cost. Only use it when the question actually needs it.
The adaptive pattern (classify first, route to multi-hop only when needed) keeps cost reasonable across a mixed query distribution.
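The routing itself is a one-branch function; the classifier can be a cheap LLM call or even a heuristic. A sketch, where `classify` is a hypothetical callable returning `'single'` or `'multi'`:

```python
def route(question, classify, single_hop, multi_hop):
    """Pay the multi-hop cost only when the classifier asks for it."""
    return multi_hop(question) if classify(question) == "multi" else single_hop(question)
```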
Next: Why evaluation is critical.