Hybrid retrieval runs dense (vector) and sparse (BM25) searches and combines their results. It's the production default for any serious RAG system. This page covers the specific mechanics of how fusion works and where the tuning knobs are.
                 query
                /     \
           embed       tokenize
             |             |
      vector search    BM25 search
             |             |
          top-50        top-50
              \           /
               fusion (RRF)
                    |
              top-50 merged
                    |
                reranker
                    |
                 top-10
                    |
               generation
The dense and sparse retrievers run in parallel, each returning its own ranked list.
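A minimal sketch of the parallel fan-out, with stub retrievers standing in for real dense and sparse backends. The function names and canned results are illustrative, not any specific library's API:

```python
from concurrent.futures import ThreadPoolExecutor

def dense_search(query, top_k=50):
    # Stand-in for embed(query) + vector index lookup.
    return ["d3", "d1", "d7"]

def sparse_search(query, top_k=50):
    # Stand-in for tokenize(query) + BM25 lookup.
    return ["d1", "d9", "d3"]

def retrieve(query):
    # Fan out to both retrievers concurrently; each returns
    # its own ranked list, ready for fusion.
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query)
        sparse_future = pool.submit(sparse_search, query)
        return dense_future.result(), sparse_future.result()
```

In production the two calls usually hit different services (a vector index and a text index), so running them concurrently means the hybrid step costs roughly the latency of the slower one, not the sum.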
Reciprocal rank fusion (RRF) is the standard combination method. Each document's final score is the sum of its reciprocal ranks across retrievers:

RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))

where k is a small constant (default 60) and rank_i(d) is d's 1-based rank in retriever i's list. Documents that appear in both lists accumulate score from both; documents in only one list score from that one alone.
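The formula is a few lines of Python. The function name is mine; k=60 matches the default above:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: sum 1/(k + rank) across retrievers.

    ranked_lists: one list of doc IDs per retriever, best first
    (ranks are 1-based). Returns (doc_id, score) pairs, best first.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
fused = rrf_fuse([dense, sparse])
# d1 (ranks 2 and 1 -> 1/62 + 1/61) edges out d3 (ranks 1 and 3).
```

Note that d1 wins despite never being ranked first by either retriever: appearing near the top of both lists beats appearing at the very top of one.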
Why RRF works well: it operates on ranks rather than raw scores, so it never has to reconcile cosine similarities and BM25 scores onto a common scale, and its single constant k rarely needs tuning.

The main alternative is weighted score fusion, which normalizes each retriever's scores and blends them:

final = α × norm(dense_score) + (1 − α) × norm(sparse_score)

This requires choosing α, which can be tuned on an eval set. Typical α: 0.5 to 0.7 (a slight dense bias).
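A sketch of weighted score fusion using min-max normalization, one common choice for norm. The helper names are illustrative:

```python
def minmax_norm(scores):
    # Map raw scores into [0, 1]; guard against a constant score list.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def weighted_fuse(dense_scores, sparse_scores, alpha=0.6):
    """Blend normalized scores: alpha * dense + (1 - alpha) * sparse."""
    dense_n = minmax_norm(dense_scores)
    sparse_n = minmax_norm(sparse_scores)
    # A doc missing from one retriever contributes 0 from that side.
    docs = set(dense_scores) | set(sparse_scores)
    fused = {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

The normalization step is the fragile part: min-max is sensitive to outlier scores in a single result set, which is one reason rank-based fusion like RRF is more robust in practice.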
Other variants on score combination exist, but they are rarely materially better than RRF in practice.
A third option is to train a learning-to-rank model on labeled query-document pairs. This is the most complex approach and potentially the highest quality; it's worth it at serious scale with labeled data.
Support in popular vector databases and search engines varies:

- Native hybrid search with an alpha parameter (0 = pure sparse, 1 = pure dense). Simplest ergonomics.
- Sparse vector support plus dense in one query. RRF or custom fusion.
- Sparse-dense hybrid with a rerank-on-top pattern.
- Excellent BM25 plus vector support. Rank fusion via RRF available.
- Strong hybrid with custom ranking expressions.
For RRF, retrieve deeper than your final desired count: fuse, say, the top-50 from each retriever even if you only want 10 results. The deeper retrieval gives RRF more overlap signal to work with. Returns diminish beyond about top-100 per retriever for most applications.
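To see why depth matters, a synthetic example: fuse two hypothetical top-50 lists that only partially overlap, then cut to 10. A document both retrievers found, even at mediocre ranks, outranks a document only one retriever found at rank 1:

```python
def rrf_fuse(ranked_lists, k=60):
    # Reciprocal rank fusion: sum 1/(k + rank) across retrievers.
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Synthetic top-50 lists: d25..d49 appear in both, the rest in one.
dense = [f"d{i}" for i in range(50)]          # d0 .. d49
sparse = [f"d{i}" for i in range(25, 75)]     # d25 .. d74

final = rrf_fuse([dense, sparse])[:10]
# d25 (dense rank 26 + sparse rank 1) beats d0 (dense rank 1 only):
# 1/86 + 1/61 > 1/61. Had we fused only top-10 lists, the overlap
# region would never have entered fusion at all.
```

The point of the demonstration: the overlap region that drives RRF's agreement signal often sits below the final cutoff, so fusing shallow lists throws that signal away.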
Purely semantic queries ("what's the meaning of X") get little help from BM25 when the query shares few terms with the relevant documents.
BM25 doesn't help when documents are primarily tables, numbers, or images.
OCR'd text with errors breaks BM25's exact term matching, since corrupted tokens never match the query; dense embeddings are more robust to this kind of noise.
Hybrid is a reliable default, not a universal solution.
When a query is failing:
If dense has it and hybrid doesn't, fusion is burying the right answer under weaker sparse results. Tune the fusion weight.
If sparse has it and hybrid doesn't, the same in reverse.
If neither has it, it's a chunking or embedding problem upstream.
Measure each retriever separately and the hybrid result. Hit rate@10 and MRR for each. Over time you'll build intuition for which query types need which retriever.
Some queries do best on dense only. Some do best on sparse only. Hybrid averages them, which is usually, but not always, a win. Diagnostic data helps you decide when to override hybrid with a routing strategy.
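The per-retriever measurement above takes only a few lines. A sketch assuming ranked results and a single gold document per query (multiple relevant docs per query needs a small extension):

```python
def hit_rate_at_k(results, gold, k=10):
    # results: {query: ranked doc IDs}; gold: {query: relevant doc ID}.
    hits = sum(1 for q, docs in results.items() if gold[q] in docs[:k])
    return hits / len(results)

def mrr(results, gold):
    # Mean reciprocal rank of the gold doc; 0 when it's absent entirely.
    total = 0.0
    for q, docs in results.items():
        if gold[q] in docs:
            total += 1.0 / (docs.index(gold[q]) + 1)
    return total / len(results)
```

Run both metrics over the same query log three times — dense-only results, sparse-only results, and the fused hybrid — and compare per query type. Large gaps between the three are what justify a routing strategy.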
Next: Reranking.