Multi-hop RAG

Multi-hop questions require answers that combine information from multiple independent documents. "Who manages the team that ships the feature the CEO mentioned last quarter?" needs three retrievals, not one. Vanilla RAG handles single-hop. Multi-hop needs orchestration.

What multi-hop looks like

Single-hop: "What's our refund policy?" → retrieve the refund policy doc → answer.

Multi-hop: "What's the refund policy for the product Alice launched last quarter?"

Needs:

  1. What product did Alice launch last quarter?
  2. What's the refund policy for that product?

One-shot retrieval on the original query retrieves either Alice-related docs or refund-related docs, not the specific intersection.

The two approaches

Decomposition

Break the query into sub-questions, retrieve for each, combine.

  1. LLM decomposes the query
  2. Retrieve for each sub-question
  3. Gather all retrieved docs
  4. Generate final answer from combined context
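The four steps above can be sketched end to end. Here `call_llm` and `retrieve` are hypothetical stand-ins for your model and vector store, stubbed with canned responses so the flow is visible:

```python
# Hypothetical stubs: call_llm stands in for a chat-model call,
# retrieve for a vector-store search. Both return canned data here.
def call_llm(prompt: str) -> str:
    return ("1. What product did Alice launch last quarter?\n"
            "2. What is the refund policy for that product?")

def retrieve(query: str, k: int = 3) -> list[str]:
    return [f"doc about: {query}"]

def decompose(question: str) -> list[str]:
    raw = call_llm(f"Break this into sub-questions:\n{question}")
    # Drop the "1. " / "2. " numbering the prompt asks for.
    return [line.split(". ", 1)[1] for line in raw.splitlines() if ". " in line]

def answer_multihop(question: str) -> str:
    sub_questions = decompose(question)
    # Pool every sub-question's hits into one combined context.
    context = [doc for sq in sub_questions for doc in retrieve(sq)]
    return call_llm(f"Question: {question}\nContext: {context}\nAnswer:")

subs = decompose("What's the refund policy for the product "
                 "Alice launched last quarter?")
```

Swapping the stubs for real calls leaves the pipeline shape unchanged: one decomposition call, N retrievals, one generation call.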

Iterative (agentic)

Start with partial info, retrieve more based on what you found, repeat.

  1. Initial retrieval on the query
  2. Model decides: do I have enough? If not, what else do I need?
  3. Retrieve again for the follow-up need
  4. Repeat until sufficient
  5. Generate
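A minimal sketch of the loop, with a hop cap to bound cost. `retrieve` and `decide` are hypothetical stubs for the retriever and the LLM sufficiency check:

```python
# Hypothetical stubs: retrieve hits the vector store; decide stands in
# for an LLM judging whether the gathered context is sufficient.
def retrieve(query: str) -> list[str]:
    return [f"doc for: {query}"]

def decide(question: str, docs: list[str]) -> tuple[bool, str]:
    # Canned logic: "enough" once two documents are held.
    # A real implementation asks the model to make this call.
    if len(docs) >= 2:
        return True, "final answer based on docs"
    return False, "follow-up query based on gaps"

def iterative_rag(question: str, max_hops: int = 4) -> str:
    docs = retrieve(question)              # round 1 on the raw query
    for _ in range(max_hops - 1):
        enough, payload = decide(question, docs)
        if enough:
            return payload                 # payload is the final answer
        docs += retrieve(payload)          # payload is a follow-up query
    return "best-effort answer from partial context"

result = iterative_rag("refund policy for Alice's latest product?")
```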

Decomposition prompt

SYSTEM: Break the following question into simpler sub-questions that
can each be answered with a single document lookup. List each
sub-question on a new line.

USER: [multi-hop question]

ASSISTANT:
1. [sub-question 1]
2. [sub-question 2]
3. [sub-question 3]

Iterative prompt pattern

[Retrieval round 1 results shown]

SYSTEM: Based on the retrieved documents, do you have enough
information to answer the user's question? If yes, answer. If no,
what additional information do you need? Output either a final
answer or a follow-up search query.

USER: [original question]

Retrieved: [docs from round 1]
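The prompt's either/or output is easiest to handle if you also force a tagged first token such as `ANSWER:` or `SEARCH:`. That tag convention is an illustrative assumption, not part of the pattern above:

```python
def parse_decision(output: str) -> tuple[str, str]:
    """Split an LLM reply tagged 'ANSWER:' or 'SEARCH:' into (kind, text).
    The tag convention is an illustrative assumption."""
    head, _, body = output.partition(":")
    tag = head.strip().upper()
    if tag in ("ANSWER", "SEARCH"):
        return tag, body.strip()
    # Untagged output: treat as an answer so the loop can't spin forever.
    return "ANSWER", output.strip()

kind, text = parse_decision("SEARCH: refund policy for Product X")
```

Defaulting untagged output to an answer is a deliberate fail-safe: a malformed reply ends the loop instead of triggering another retrieval round.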

When decomposition beats iterative

The full set of sub-questions is evident from the original query, so they can all be issued upfront. Cost and latency are fixed and predictable, and independent sub-questions can be retrieved in parallel.

When iterative beats decomposition

You can't write the next query until you've seen earlier results (you can't ask about Product X until retrieval reveals that Alice launched it), or the number of hops isn't known in advance. The loop adapts as it goes; a fixed decomposition can't.

Chain-of-thought retrieval

A middle-ground pattern: ask the LLM to plan retrievals before executing them.

  1. LLM generates a retrieval plan: "First I'll look up X, then based on that I'll look up Y, finally I'll combine..."
  2. Execute the plan step by step
  3. At each step, the LLM can adjust based on actual retrieval results

Plans are easier to debug than free-form iterative loops and cheaper than full agent-based orchestration.
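A sketch of plan-then-execute, assuming the plan lets later steps reference earlier findings through `{answer_N}` placeholders. That slot syntax, and both stubs, are made up for illustration:

```python
def retrieve(query: str) -> list[str]:
    # Stub for a vector-store search.
    return [f"doc for: {query}"]

def plan_retrievals(question: str) -> list[str]:
    # Stub for an LLM emitting an ordered plan up front. Later steps
    # reference earlier findings via {answer_N} slots (made-up syntax).
    return ["what product did Alice launch last quarter",
            "refund policy for {answer_1}"]

def run_plan(question: str) -> list[str]:
    docs: list[str] = []
    answers: dict[int, str] = {}
    for i, step in enumerate(plan_retrievals(question), start=1):
        # Fill earlier findings into this step's query before executing.
        query = step.format_map({f"answer_{j}": a for j, a in answers.items()})
        hits = retrieve(query)
        docs += hits
        answers[i] = hits[0]   # stub "finding"; really an LLM extraction
    return docs

docs = run_plan("refund policy for the product Alice launched last quarter?")
```

Because the plan is an explicit list, you can log it, inspect it, and unit-test the step substitution separately from retrieval quality.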

The common failure modes

Bad decomposition

LLM produces sub-questions that don't actually decompose the problem. Retrievals don't answer what's needed.

Mitigation: few-shot examples of good decompositions in the prompt. Validate decompositions for obvious problems (e.g., more than 5 sub-questions is suspicious).
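A few of those validations as code. The checks and thresholds are illustrative, not tuned values:

```python
def validate_decomposition(question: str, subs: list[str]) -> list[str]:
    """Cheap sanity checks before spending retrieval calls.
    Thresholds are illustrative, not tuned values."""
    def norm(s: str) -> str:
        return s.strip().rstrip("?").lower()

    problems = []
    if not subs:
        problems.append("empty decomposition")
    if len(subs) > 5:
        problems.append("more than 5 sub-questions: likely over-split")
    if any(norm(s) == norm(question) for s in subs):
        problems.append("a sub-question just restates the original")
    return problems

issues = validate_decomposition(
    "What's the refund policy for the product Alice launched last quarter?",
    ["What's the refund policy for the product Alice launched last quarter?"])
```

On a validation failure you can re-prompt for a fresh decomposition or fall back to single-shot retrieval rather than burning retrieval calls on bad sub-questions.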

Lost context between hops

Round 2 retrieval doesn't use what round 1 learned. The follow-up query is too general.

Mitigation: explicitly include round 1 findings in the round 2 query construction. "Given that Alice launched Product X, retrieve the refund policy for Product X."
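The mitigation is mechanical enough to pin down in a helper (the function name and phrasing template are illustrative):

```python
def build_followup_query(finding: str, remaining_need: str) -> str:
    # Fold the round-1 finding into the round-2 query so retrieval
    # targets the specific intersection, not the general topic.
    return f"Given that {finding}, {remaining_need}"

q = build_followup_query("Alice launched Product X last quarter",
                         "retrieve the refund policy for Product X")
```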

Runaway iteration

Agent keeps retrieving without converging. Often because the information truly isn't in the corpus.

Mitigation: max iterations (3-5), timeout, cost budget per query.
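All three limits can live in one guard object the loop checks each hop. The default limits below are illustrative, not recommendations:

```python
import time

class QueryBudget:
    """Hard stops for an agentic loop. Default limits are illustrative."""
    def __init__(self, max_hops: int = 4, max_seconds: float = 20.0,
                 max_tokens: int = 8000):
        self.max_hops = max_hops
        self.max_seconds = max_seconds
        self.max_tokens = max_tokens
        self.hops = 0
        self.tokens = 0
        self.start = time.monotonic()

    def charge(self, tokens: int) -> None:
        # Call once per retrieval round with that round's token spend.
        self.hops += 1
        self.tokens += tokens

    def exhausted(self) -> bool:
        return (self.hops >= self.max_hops
                or self.tokens >= self.max_tokens
                or time.monotonic() - self.start >= self.max_seconds)

budget = QueryBudget(max_hops=2)
budget.charge(500)
budget.charge(700)
```

The loop calls `budget.exhausted()` before each new retrieval and falls back to a best-effort answer when any limit trips.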

Benchmarks for multi-hop

Academic benchmarks specifically for multi-hop RAG:

  - HotpotQA: two-hop questions over Wikipedia with annotated supporting facts
  - 2WikiMultiHopQA: multi-hop questions with explicit reasoning paths
  - MuSiQue: compositional questions built from 2-4 single-hop steps

These are useful for comparing strategies but don't necessarily reflect your production query distribution. Build a domain-specific eval set if multi-hop is important.

Cost reality

Multi-hop RAG is expensive:

  - Multiple retrieval calls instead of one
  - Extra LLM calls for decomposition or sufficiency decisions
  - A larger combined context for the final generation
  - Higher latency, since hops usually run sequentially

Total: 3-10x single-shot RAG cost. Only use it when the question actually needs it.

The adaptive pattern (classify first, route to multi-hop only when needed) keeps cost reasonable across a mixed query distribution.
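A toy version of that router. A production classifier would be a cheap LLM call or a trained model rather than keyword cues; the cue list here is purely illustrative:

```python
def classify_hops(question: str) -> str:
    # Toy keyword router; production systems use a cheap LLM call
    # or a trained classifier instead. Cue list is illustrative.
    multi_hop_cues = ("the product", "the team", "the feature",
                      "who manages", "launched", "mentioned")
    q = question.lower()
    return "multi" if any(cue in q for cue in multi_hop_cues) else "single"

def answer(question: str) -> str:
    # Route to the expensive pipeline only when the query needs it.
    if classify_hops(question) == "multi":
        return "multi-hop pipeline"   # decomposition or iterative RAG
    return "single-shot RAG"

route = answer("What's the refund policy for the product Alice launched last quarter?")
```

Most traffic takes the cheap path, so the blended cost stays close to single-shot RAG even with a few expensive multi-hop queries in the mix.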

Next: Why evaluation is critical.