Multi-query + fusion
4 min read · Updated 2026-04-18
A single query embedding represents one angle on what the user wants. Multi-query retrieval generates several variations, retrieves for each, and fuses the results. It's a robust way to improve recall on tricky queries at the cost of more compute.
The pattern
- Take the user's original query
- Use an LLM to generate N variations (paraphrases, sub-questions, step-back questions, HyDE passages)
- Run retrieval for each variation independently
- Fuse the result lists (RRF or similar)
- Take the top-K merged results
- Optionally rerank
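The steps above can be sketched as one function. `generate_variations` and `retrieve` are placeholders for your own LLM prompt and search backend (both assumptions, not a specific library's API):

```python
from collections import defaultdict

def multi_query_retrieve(query, generate_variations, retrieve,
                         n=4, rrf_k=60, top_k=10):
    """Sketch of the pattern: expand the query, retrieve per variation,
    RRF-fuse, take top-K. Reranking (the optional last step) is omitted."""
    queries = [query] + generate_variations(query, n)
    scores = defaultdict(float)
    for q in queries:                                   # retrieve per variation
        for rank, doc_id in enumerate(retrieve(q), start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)      # RRF fusion
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:top_k]
```

Documents surfaced by several variations accumulate score across the lists, which is what pushes them to the top of the merged ranking.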
Kinds of variations
Paraphrases
Same meaning, different words.
- "How do I reset my password?"
- "What's the process for changing login credentials?"
- "Steps to recover account access when locked out"
Sub-questions
Decompose into parts.
- "Why is our API slow?" → "What's the current API latency?" + "What are common causes of API latency?" + "How do we measure API performance?"
Step-back
More general framing.
- "Why did OAuth token expire?" → "How does OAuth token expiration work?"
Step-forward
More specific framing.
- "How do I integrate your API?" → "How do I integrate your API in Node.js?" + "How do I integrate your API in Python?"
HyDE-style answers
Hypothetical answer passages. See HyDE.
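All of these variation types can come from one LLM call. A minimal prompt sketch (the template wording is an assumption; adapt it to your model and the variation mix you want):

```python
# Hypothetical prompt template for generating query variations.
# Swap in your own LLM client to actually run it.
VARIATION_PROMPT = """\
Generate {n} alternative search queries for the user query below.
Mix paraphrases, sub-questions, and one more general (step-back) framing.
Return one query per line, with no numbering.

User query: {query}
"""

def build_variation_prompt(query: str, n: int = 4) -> str:
    return VARIATION_PROMPT.format(n=n, query=query)
```

Asking for one query per line keeps parsing trivial: split the completion on newlines and drop empties.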
Fusion
Same as in hybrid retrieval fusion; RRF is the default.
For each document d:
rrf_score(d) = sum over all queries q: 1 / (k + rank_q(d))
Sort documents by rrf_score. Take top-K.
Documents that appear in multiple query variations' results get boosted. Documents that only appear in one get retained at lower ranks.
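The formula translates directly to code. A minimal sketch, taking one ranked list of document IDs per query variation (k=60 is the conventional RRF constant):

```python
from collections import defaultdict

def rrf_fuse(result_lists, k=60):
    """rrf_score(d) = sum over all queries q of 1 / (k + rank_q(d)),
    where rank_q(d) is d's 1-based rank in query q's result list."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A document absent from a list simply contributes no term for that query, so single-list documents survive with small scores rather than being dropped.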
The RAG-Fusion technique
A specific multi-query pattern popularized around 2023:
- Generate 4-5 paraphrases of the original query
- Retrieve top-k for each paraphrase
- RRF fusion
It gives a robust improvement over single-query retrieval on queries with vocabulary mismatch.
Parallel vs sequential retrieval
All query variations can run in parallel. With async retrieval, total latency is (LLM variation generation) + (longest single retrieval), not the sum.
With 4 variations at ~100ms per retrieval, parallel execution adds roughly 100ms of total latency rather than 400ms.
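The fan-out is a natural fit for `asyncio.gather`. A sketch, with `search_backend` standing in for a real async search client (here it just simulates ~100ms of latency):

```python
import asyncio

async def search_backend(query: str) -> list[str]:
    # Placeholder for one real retrieval call
    await asyncio.sleep(0.1)
    return [f"doc-for:{query}"]

async def retrieve_all(queries: list[str]) -> list[list[str]]:
    # gather() runs every retrieval concurrently, so total latency is
    # roughly the slowest single call, not the sum of all calls
    return await asyncio.gather(*(search_backend(q) for q in queries))
```

With a synchronous client, a thread pool (`concurrent.futures.ThreadPoolExecutor`) gets the same effect for I/O-bound retrieval calls.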
Cost tradeoffs
- N variations = N retrieval calls. At high QPS this adds up.
- LLM call to generate variations: 100-400ms, modest cost with cheap models.
- Reranking cost increases too (more candidates to rerank).
In return: typically 5-15% recall improvement, higher for short or ambiguous queries.
When multi-query is overkill
- Queries that are already specific and verbose
- When hybrid retrieval already covers the recall gap
- Cost-sensitive applications where extra LLM calls aren't justified
- Latency-sensitive applications where the extra 100ms matters
The pragmatic recipe
For a production RAG system that wants best-in-class retrieval:
- Generate 3 paraphrases of the original query (using a fast model)
- For each of the 4 queries (original + 3 paraphrases), run hybrid retrieval for top-50
- RRF-fuse across all 4 result lists
- Rerank top-50 merged → top-10
- Pass to generator
This adds about 150-300ms of latency and roughly 4x the retrieval cost. For queries that benefit, quality is noticeably better; for queries that don't, you've paid the cost without improvement.
Routing: when to use multi-query
Not every query benefits. A lightweight classifier or prompt can decide:
- Short queries (< 5 words): use multi-query
- Long detailed queries: single-query is sufficient
- Vague queries with ambiguous intent: use multi-query
- Queries with specific terminology and clear intent: single-query
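The heuristics above can be sketched as a simple router. The length thresholds and marker words are illustrative assumptions; tune them against your own traffic:

```python
def should_multi_query(query: str) -> bool:
    """Heuristic router: expand short or vague queries, leave long
    specific ones alone. Thresholds are illustrative, not tuned."""
    words = query.split()
    if len(words) < 5:
        return True          # short queries: underspecified, expand
    vague_markers = {"how", "why", "what", "best", "issue", "problem"}
    if len(words) < 10 and vague_markers & {w.lower() for w in words}:
        return True          # shortish and vague-sounding: expand
    return False             # long, detailed queries: single-query
```

A small classifier trained on labeled queries, or a cheap LLM prompt, can replace the hand-written rules once you have evaluation data.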
Skip multi-query when it doesn't help. Use it when it does. Measurement tells you which is which.
Next: Agentic RAG.