Multi-query + fusion

A single query embedding represents one angle on what the user wants. Multi-query retrieval generates several variations, retrieves for each, and fuses the results. It's a robust way to improve recall on tricky queries at the cost of more compute.

The pattern

  1. Take the user's original query
  2. Use an LLM to generate N variations (paraphrases, sub-questions, step-back questions, HyDE passages)
  3. Run retrieval for each variation independently
  4. Fuse the result lists (RRF or similar)
  5. Take the top-K merged results
  6. Optionally rerank
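
The steps above can be sketched as a small orchestration function. This is only an illustration: `generate`, `retrieve`, and `fuse` are hypothetical callables you would supply yourself (an LLM call, a search client, and a fusion function such as RRF), and keeping the original query alongside its variations is an assumption, not a requirement of the pattern. The optional rerank step is left out.

```python
from typing import Callable, Sequence

# All three callables are hypothetical stand-ins, not a real library API.
GenerateFn = Callable[[str, int], list[str]]         # (query, n) -> variations
RetrieveFn = Callable[[str, int], list[str]]         # (query, top_k) -> ranked doc ids
FuseFn = Callable[[Sequence[list[str]]], list[str]]  # ranked lists -> one ranked list


def multi_query_retrieve(
    query: str,
    generate: GenerateFn,
    retrieve: RetrieveFn,
    fuse: FuseFn,
    n_variations: int = 3,
    per_query_k: int = 50,
    final_k: int = 10,
) -> list[str]:
    # Steps 1-2: keep the original query and add LLM-generated variations.
    queries = [query] + generate(query, n_variations)
    # Step 3: retrieve independently for each query.
    result_lists = [retrieve(q, per_query_k) for q in queries]
    # Steps 4-5: fuse the ranked lists and keep the top-K.
    return fuse(result_lists)[:final_k]
```

Passing the three stages in as callables keeps the sketch independent of any particular LLM, vector store, or fusion method.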

Kinds of variations

Paraphrases

Same meaning, different words.

Sub-questions

Decompose into parts.

Step-back

More general framing.

Step-forward

More specific framing.

HyDE-style answers

Hypothetical answer passages. See HyDE.

Fusion

Fusion works the same way as in hybrid retrieval; RRF (reciprocal rank fusion) is the default.

For each document d:
  rrf_score(d) = sum over all queries q: 1 / (k + rank_q(d))

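Here rank_q(d) is the 1-based rank of d in the results for query q, and k is a smoothing constant (60 is a common choice). A document that is missing from a query's results contributes nothing for that query.

Sort documents by rrf_score. Take top-K.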

Documents that appear in multiple query variations' results get boosted. Documents that only appear in one get retained at lower ranks.
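
A minimal sketch of the scoring rule, assuming each result list is a ranked list of document ids with no duplicates. The function name `rrf_fuse`, the default `k=60`, and the alphabetical tie-break are choices made for this example; the document ids are made up.

```python
from collections import defaultdict


def rrf_fuse(result_lists, k=60, top_k=None):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc id for deterministic output.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_k] if top_k is not None else fused


lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "e", "a"],
]
print(rrf_fuse(lists, top_k=3))  # ['b', 'a', 'e']
```

Here "b" and "a" appear in all three lists and rise to the top, while "e" appears only once but at rank 2, so it edges out "c" and "d", which each appear once at rank 3.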

The RAG-Fusion technique

A specific multi-query pattern popularized around 2023:

  1. Generate 4-5 paraphrases of the original query
  2. Retrieve top-k for each paraphrase
  3. RRF fusion

This gives a robust improvement over single-query retrieval on queries with vocabulary mismatch, where the user's wording differs from the wording in the relevant documents.

Parallel vs sequential retrieval

All query variations can run in parallel. With async retrieval, total latency is (LLM variation generation) + (longest single retrieval), not the sum.

With 4 variations and ~100ms per retrieval, parallel retrieval adds roughly 100ms of total latency, not 400ms.
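
The latency argument can be checked with a small asyncio sketch. `fake_retrieve` is a hypothetical stand-in that only sleeps for about 100ms; a real retriever would be an async call to your search backend.

```python
import asyncio
import time


async def fake_retrieve(query: str, delay_s: float = 0.1) -> list[str]:
    # Hypothetical retriever: sleeps to simulate a ~100 ms search call.
    await asyncio.sleep(delay_s)
    return [f"doc-for-{query}"]


async def retrieve_all(queries: list[str]) -> list[list[str]]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_retrieve(q) for q in queries))


def timed_run(queries: list[str]) -> tuple[list[list[str]], float]:
    start = time.perf_counter()
    results = asyncio.run(retrieve_all(queries))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run(["q0", "q1", "q2", "q3"])
    print(f"{len(results)} result lists in {elapsed:.2f}s")
```

With four simulated retrievals, the elapsed time should land close to one retrieval's latency rather than four times that.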

Cost tradeoffs

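Multi-query costs one extra LLM call to generate the variations, plus one retrieval per variation instead of a single retrieval.

In return: typically 5-15% recall improvement, higher for short or ambiguous queries.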
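
Whether the recall gain shows up on your own data is something to measure. A minimal recall@k sketch follows; the function name and the toy document ids are made up for illustration and are not results from any real system.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


# Toy example: four relevant documents, two hypothetical rankings.
relevant = {"d1", "d2", "d3", "d4"}
single = ["d1", "x1", "d2", "x2", "x3"]
multi = ["d1", "d2", "x1", "d3", "x2"]
print(recall_at_k(single, relevant, k=5))  # 0.5
print(recall_at_k(multi, relevant, k=5))   # 0.75
```

Averaging this metric over a labeled query set, once with and once without multi-query, shows whether the extra cost is buying anything.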

When multi-query is overkill

The pragmatic recipe

For a production RAG system that wants best-in-class retrieval:

  1. Generate 3 paraphrases of the original query (using a fast model)
  2. For each of the 4 queries (original + 3 paraphrases), run hybrid retrieval for top-50
  3. RRF-fuse across all 4 result lists
  4. Rerank top-50 merged → top-10
  5. Pass to generator

This adds about 150-300ms of latency and roughly 4x retrieval cost. For queries that benefited from it, quality is noticeably better. For queries that didn't, you've paid the cost without improvement.

Routing: when to use multi-query

Not every query benefits. A lightweight classifier or prompt can decide, per query, whether to run it.

Skip multi-query when it doesn't help. Use it when it does. Measurement tells you which is which.
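
As one very rough illustration of routing, the sketch below expands only short queries, on the assumption that short queries are the ones most likely to benefit. The function name and the token threshold are hypothetical; a real router would be a classifier or prompt tuned against your own measurements.

```python
def should_use_multi_query(query: str, max_short_tokens: int = 4) -> bool:
    """Heuristic router: expand only short, non-empty queries.

    The threshold is an assumption for illustration, not a recommended value.
    """
    tokens = query.split()
    return 0 < len(tokens) <= max_short_tokens


print(should_use_multi_query("reset password"))  # True
print(should_use_multi_query(
    "how do I rotate the API key for a service account in staging"
))  # False
```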

Next: Agentic RAG.