Hybrid retrieval runs dense (vector) and sparse (BM25) searches and combines their results. It's the production default for any serious RAG system. This page covers the specific mechanics of how fusion works and where the tuning knobs are.
                 query
                /     \
           embed       tokenize
             |             |
      vector search    BM25 search
             |             |
          top-50        top-50
              \           /
               fusion (RRF)
                    |
              top-50 merged
                    |
                reranker
                    |
                 top-10
                    |
               generation
The dense and sparse retrievers run in parallel, each returning its own ranked list.
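A minimal sketch of the parallel fan-out, with stub retrievers standing in for real dense and sparse backends. The function names and canned results are illustrative, not any specific library's API:

```python
from concurrent.futures import ThreadPoolExecutor

def dense_search(query, top_k=50):
    # Stand-in for embed(query) + vector index lookup.
    return ["d3", "d1", "d7"]

def sparse_search(query, top_k=50):
    # Stand-in for tokenize(query) + BM25 lookup.
    return ["d1", "d9", "d3"]

def retrieve(query):
    # Fan out to both retrievers concurrently; each returns
    # its own ranked list, ready for fusion.
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query)
        sparse_future = pool.submit(sparse_search, query)
        return dense_future.result(), sparse_future.result()
```

In production the two calls usually hit different services (a vector index and a text index), so running them concurrently means the hybrid step costs roughly the latency of the slower one, not the sum.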
Reciprocal rank fusion (RRF) is the standard combination method. Each document's final score is the sum of its reciprocal ranks across retrievers:

RRF(d) = sum over retrievers i of 1 / (k + rank_i(d))

where k is a small constant (default 60) and rank_i(d) is d's 1-based rank in retriever i's list. Documents that appear in both lists accumulate score from both; documents in only one list score from that one alone.
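The formula is a few lines of Python. The function name is mine; k=60 matches the default above:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion: sum 1/(k + rank) across retrievers.

    ranked_lists: one list of doc IDs per retriever, best first
    (ranks are 1-based). Returns (doc_id, score) pairs, best first.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
fused = rrf_fuse([dense, sparse])
# d1 (ranks 2 and 1 -> 1/62 + 1/61) edges out d3 (ranks 1 and 3).
```

Note that d1 wins despite never being ranked first by either retriever: appearing near the top of both lists beats appearing at the very top of one.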
Why RRF works well: it operates on ranks rather than raw scores, so it never has to reconcile cosine similarities and BM25 scores onto a common scale, and its single constant k rarely needs tuning.

The main alternative is weighted score fusion, which normalizes each retriever's scores and blends them:

final = α × norm(dense_score) + (1 − α) × norm(sparse_score)

This requires choosing α, which can be tuned on an eval set. Typical α: 0.5 to 0.7 (a slight dense bias).
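A sketch of weighted score fusion using min-max normalization, one common choice for norm. The helper names are illustrative:

```python
def minmax_norm(scores):
    # Map raw scores into [0, 1]; guard against a constant score list.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def weighted_fuse(dense_scores, sparse_scores, alpha=0.6):
    """Blend normalized scores: alpha * dense + (1 - alpha) * sparse."""
    dense_n = minmax_norm(dense_scores)
    sparse_n = minmax_norm(sparse_scores)
    # A doc missing from one retriever contributes 0 from that side.
    docs = set(dense_scores) | set(sparse_scores)
    fused = {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

The normalization step is the fragile part: min-max is sensitive to outlier scores in a single result set, which is one reason rank-based fusion like RRF is more robust in practice.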
Other variants on score combination exist, but they are rarely materially better than RRF in practice.
A third option is to train a learning-to-rank model on labeled query-document pairs. This is the most complex approach and potentially the highest quality; it's worth it at serious scale with labeled data.
Support in popular vector databases and search engines varies:

- Native hybrid search with an alpha parameter (0 = pure sparse, 1 = pure dense). Simplest ergonomics.
- Sparse vector support plus dense in one query. RRF or custom fusion.
- Sparse-dense hybrid with a rerank-on-top pattern.
- Excellent BM25 plus vector support. Rank fusion via RRF available.
- Strong hybrid with custom ranking expressions.
For RRF, retrieve deeper than your final desired count: fuse, say, the top-50 from each retriever even if you only want 10 results. The deeper retrieval gives RRF more overlap signal to work with. Returns diminish beyond about top-100 per retriever for most applications.
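To see why depth matters, a synthetic example: fuse two hypothetical top-50 lists that only partially overlap, then cut to 10. A document both retrievers found, even at mediocre ranks, outranks a document only one retriever found at rank 1:

```python
def rrf_fuse(ranked_lists, k=60):
    # Reciprocal rank fusion: sum 1/(k + rank) across retrievers.
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Synthetic top-50 lists: d25..d49 appear in both, the rest in one.
dense = [f"d{i}" for i in range(50)]          # d0 .. d49
sparse = [f"d{i}" for i in range(25, 75)]     # d25 .. d74

final = rrf_fuse([dense, sparse])[:10]
# d25 (dense rank 26 + sparse rank 1) beats d0 (dense rank 1 only):
# 1/86 + 1/61 > 1/61. Had we fused only top-10 lists, the overlap
# region would never have entered fusion at all.
```

The point of the demonstration: the overlap region that drives RRF's agreement signal often sits below the final cutoff, so fusing shallow lists throws that signal away.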
Purely semantic queries ("what's the meaning of X") get little help from BM25 when the query shares few terms with the relevant documents.
BM25 doesn't help when documents are primarily tables, numbers, or images.
OCR'd text with errors breaks BM25's exact term matching, since corrupted tokens never match the query; dense embeddings are more robust to this kind of noise.
Hybrid is a reliable default, not a universal solution.
When a query is failing:
If dense has it and hybrid doesn't, fusion is burying the right answer under weaker sparse results. Tune the fusion weight.
If sparse has it and hybrid doesn't, the same in reverse.
If neither has it, it's a chunking or embedding problem upstream.
Measure each retriever separately and the hybrid result. Hit rate@10 and MRR for each. Over time you'll build intuition for which query types need which retriever.
Some queries do best on dense only. Some do best on sparse only. Hybrid averages them, which is usually, but not always, a win. Diagnostic data helps you decide when to override hybrid with a routing strategy.
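The per-retriever measurement above takes only a few lines. A sketch assuming ranked results and a single gold document per query (multiple relevant docs per query needs a small extension):

```python
def hit_rate_at_k(results, gold, k=10):
    # results: {query: ranked doc IDs}; gold: {query: relevant doc ID}.
    hits = sum(1 for q, docs in results.items() if gold[q] in docs[:k])
    return hits / len(results)

def mrr(results, gold):
    # Mean reciprocal rank of the gold doc; 0 when it's absent entirely.
    total = 0.0
    for q, docs in results.items():
        if gold[q] in docs:
            total += 1.0 / (docs.index(gold[q]) + 1)
    return total / len(results)
```

Run both metrics over the same query log three times — dense-only results, sparse-only results, and the fused hybrid — and compare per query type. Large gaps between the three are what justify a routing strategy.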
Next: Reranking.