Hybrid retrieval

Hybrid retrieval runs dense (vector) and sparse (BM25) searches and combines their results. It's the production default for any serious RAG system. This page covers the specific mechanics of how fusion works and where the tuning knobs are.

The architecture

              query
              /    \
             /      \
        embed       tokenize
          |            |
     vector search  BM25 search
          |            |
       top-50       top-50
          \           /
           \         /
          fusion (RRF)
               |
          top-50 merged
               |
           reranker
               |
           top-10
               |
         generation

The two retrievers

The dense retriever embeds the query and searches a vector index; the sparse retriever tokenizes the query and runs BM25 over an inverted index. They run in parallel, each returning its own ranked list.
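A minimal sketch of the parallel fan-out, with stub functions standing in for real dense and BM25 backends (the function names and doc-ID format here are illustrative, not from any specific library):

```python
from concurrent.futures import ThreadPoolExecutor

def dense_search(query, top_k=50):
    # Stub: a real implementation would embed the query and run ANN search.
    return [f"dense_doc_{i}" for i in range(top_k)]

def sparse_search(query, top_k=50):
    # Stub: a real implementation would tokenize the query and run BM25.
    return [f"sparse_doc_{i}" for i in range(top_k)]

def retrieve_both(query, top_k=50):
    """Run dense and sparse retrieval concurrently; each returns its own ranked list."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query, top_k)
        sparse_future = pool.submit(sparse_search, query, top_k)
        return dense_future.result(), sparse_future.result()
```

Since the two searches are independent I/O-bound calls, running them concurrently means the retrieval step costs roughly max(dense latency, sparse latency) rather than the sum.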

Fusion with Reciprocal Rank Fusion (RRF)

The standard combination method. Each document's final score is the sum of reciprocal ranks across retrievers:

final_rrf(d) = sum(1 / (k + rank_i(d))) for each retriever i

Where k is a small constant (default 60). Documents in both lists get scored from both; documents in only one get scored from that one only.
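The formula above translates directly into a few lines of Python; this sketch assumes each retriever hands back an ordered list of document IDs, best first:

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    ranked_lists: one list of doc IDs per retriever, ordered best-first.
    k: smoothing constant (60 is the common default).
    Returns doc IDs sorted by descending RRF score.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
print(rrf_fuse([dense, sparse]))  # → ['d1', 'd3', 'd9', 'd7']
```

Note how d1 and d3, which appear in both lists, rise above d7 and d9, which appear in only one: this is the "scored from both" behavior described above.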

Why RRF works well: it operates on ranks rather than raw scores, so it needs no normalization and is unaffected by the incompatible score scales of cosine similarity and BM25. It also rewards documents that appear in both lists without letting either retriever's score outliers dominate the result.

Alternative fusion methods

Normalized weighted sum

final = α × norm(dense_score) + (1-α) × norm(sparse_score)

Requires choosing α, which can be tuned on an eval set; typical values are 0.5 to 0.7 (a slight dense bias). Unlike RRF, the raw scores must be normalized first, since cosine similarities and BM25 scores live on different scales.
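A sketch of weighted-sum fusion with min-max normalization, assuming each retriever returns a dict mapping doc IDs to raw scores (the function names are illustrative):

```python
def minmax(scores):
    """Min-max normalize a dict of doc_id -> raw score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores are equal
    return {doc: (s - lo) / span for doc, s in scores.items()}

def weighted_fuse(dense_scores, sparse_scores, alpha=0.6):
    """Combine normalized scores; alpha weights the dense side.

    Documents missing from one retriever score 0 on that side.
    Returns doc IDs sorted by descending fused score.
    """
    d, s = minmax(dense_scores), minmax(sparse_scores)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    return sorted(fused, key=fused.get, reverse=True)
```

One design caveat: min-max normalization is sensitive to the score range of the particular result set, so the same document can get different normalized scores across queries. RRF sidesteps this entirely by ignoring scores.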

CombSUM, CombMAX, CombMNZ

Variants on score combination. Rarely materially better than RRF in practice.

Learned fusion (LTR)

Train a learning-to-rank model on labeled query-document pairs. Most complex, potentially highest quality. Worth it at serious scale with labeled data.

Native hybrid in vector DBs

Weaviate

Native hybrid search with an alpha parameter (0 = pure sparse, 1 = pure dense). Simplest ergonomics.

Qdrant

Sparse vector support plus dense in one query. RRF or custom fusion.

Pinecone

Sparse-dense hybrid with rerank-on-top pattern.

Elasticsearch / OpenSearch

Excellent BM25 plus vector support. Rank fusion via RRF available.

Vespa

Strong hybrid with custom ranking expressions.

The right top-k per retriever

For RRF, retrieve deeper than your final desired count: for example, top-50 from each retriever when the final output is top-10, as in the architecture diagram above. The deeper retrieval gives RRF more signal to fuse; returns diminish beyond top-100 per retriever for most applications.

Where hybrid falls short

Queries with no good sparse signal

Purely semantic queries ("what's the meaning of X") don't get much help from BM25 if the query terms are different from any document terms.

Corpora with non-text content

BM25 doesn't help when documents are primarily tables, numbers, or images.

Extremely noisy text

OCR'd text with errors defeats BM25's exact term matching (a typo'd term matches nothing); dense embeddings are more robust to this kind of noise.

Hybrid is a reliable default, not a universal solution.

The diagnostic workflow

When a query is failing:

  1. Run dense-only: did it return the right answer?
  2. Run sparse-only: did it return the right answer?
  3. Run hybrid: did fusion hurt or help?
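The three-step check above can be wrapped in a small harness; this sketch assumes each mode is a callable returning ranked doc IDs, and that you know which document should have been retrieved (all names here are illustrative):

```python
def diagnose(query, answer_id, dense_search, sparse_search, hybrid_search, k=10):
    """Report which retrieval mode surfaces the known-good document in its top-k.

    dense_search / sparse_search / hybrid_search: callables taking a query
    and returning a ranked list of doc IDs.
    answer_id: the document that should be retrieved for this query.
    """
    return {
        "dense":  answer_id in dense_search(query)[:k],
        "sparse": answer_id in sparse_search(query)[:k],
        "hybrid": answer_id in hybrid_search(query)[:k],
    }
```

A result like `{"dense": True, "sparse": False, "hybrid": False}` points straight at the fusion-weight problem described below.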

If dense has it and hybrid doesn't, fusion is burying the right answer under weaker sparse results. Tune the fusion weight.

If sparse has it and hybrid doesn't, the same in reverse.

If neither has it, it's a chunking or embedding problem upstream.

Metrics

Measure each retriever separately and the hybrid result. Hit rate@10 and MRR for each. Over time you'll build intuition for which query types need which retriever.
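Both metrics are a few lines each; this sketch assumes one relevant document per query, with `results` holding each query's ranked doc IDs and `answers` the matching relevant IDs:

```python
def hit_rate_at_k(results, answers, k=10):
    """Fraction of queries whose relevant doc appears in the top-k results."""
    hits = sum(ans in ranked[:k] for ranked, ans in zip(results, answers))
    return hits / len(answers)

def mrr(results, answers):
    """Mean reciprocal rank of the relevant doc (contributes 0 when missing)."""
    total = 0.0
    for ranked, ans in zip(results, answers):
        if ans in ranked:
            total += 1.0 / (ranked.index(ans) + 1)
    return total / len(answers)
```

Run both functions three times per eval set (dense-only, sparse-only, hybrid results) to get the per-retriever breakdown described above.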

Some queries do best on dense only. Some do best on sparse only. Hybrid averages them, which is usually, but not always, a win. Diagnostic data helps you decide when to override hybrid with a routing strategy.

Next: Reranking.