Hybrid search

Updated 2026-04-18

Dense embeddings are great at synonyms and paraphrases but weak at exact matches, rare entities, and specific codes. Keyword search (BM25) is the opposite. Hybrid search runs both and combines the results. For production RAG, hybrid retrieval consistently outperforms dense-only. Most teams ship dense-only first and then regret it.

Why hybrid matters

Consider a user query: "error E-47 in the invoice processor"

Dense retrieval sees "error ... invoice processor" and retrieves semantically similar documents about invoice errors in general.
BM25 treats "E-47" as a specific term and retrieves documents that actually mention E-47.
Hybrid gets both: documents about invoice errors, with a boost for those that explicitly mention E-47.

The hybrid result is almost always more useful.

The two components

Dense (vector) retrieval

Embed query, find nearest neighbors. Covered throughout this section.

Sparse (BM25) retrieval

Classical information retrieval. Score documents by term frequency, inverse document frequency, and document length. Fast, exact, no training required. See BM25 and sparse retrieval.
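
To make the three ingredients concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenization, the parameter values (k1 = 1.5, b = 0.75), the smoothed IDF variant, and the sample documents are all assumptions for illustration; a real search engine adds an inverted index and a proper analyzer.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps values non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates term frequency; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Illustrative corpus, already lowercased and split on whitespace.
docs = [
    "invoice processor failed with error e-47".split(),
    "general guide to invoice errors and retries".split(),
    "how to reset a password".split(),
]
scores = bm25_scores("error e-47".split(), docs)
```

Only the first document contains the exact tokens "error" and "e-47", so it is the only one with a non-zero score. The second document mentions "errors", which does not match without stemming.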

Combining results: fusion strategies

Reciprocal Rank Fusion (RRF)

The default and usually best-performing fusion method. For each document, compute its score as:

RRF_score(d) = Σ_i 1 / (k + rank_i(d))

where rank_i(d) is the document's 1-based rank in the i-th retriever's result list and k is a smoothing constant (typically 60). The sum runs over all retrievers; a retriever that did not return d contributes nothing.

RRF sidesteps the scale mismatch between retrievers (BM25 scores are unbounded and often fall around 0-30, while cosine similarity lies between -1 and 1) without any tuning: it ignores raw scores and combines only ranks.
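
The formula translates almost directly into code. The sketch below assumes each retriever returns a best-first list of document IDs with no duplicates; the function name rrf_fuse and the sample IDs are illustrative, not from any particular library.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse best-first lists of doc IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; a doc missing from a list adds nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Illustrative result lists from two retrievers.
dense = ["doc_general", "doc_e47", "doc_billing"]
bm25 = ["doc_e47", "doc_changelog"]
fused = rrf_fuse([dense, bm25])
```

Here doc_e47 rises to the top because it appears in both lists (1/62 + 1/61), even though dense retrieval alone ranked it second.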

Score normalization + weighted sum

Min-max or z-score normalize each retriever's scores, then weighted sum:

final_score = α * dense_score + (1 - α) * sparse_score

Requires tuning α. Typical sweet spot: α = 0.5 to 0.7 (slightly favor dense).
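
A minimal sketch of the min-max variant, assuming each retriever returns a {doc_id: score} map. Two choices here are assumptions rather than a standard: a document missing from one retriever gets 0.0 on that side, and a list with identical scores normalizes to 1.0 throughout. The sample scores are made up.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: identical scores all count as "top".
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_fuse(dense, sparse, alpha=0.6):
    """final_score = alpha * dense_score + (1 - alpha) * sparse_score."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        # Assumption: a doc absent from one retriever scores 0.0 there.
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in set(dense_n) | set(sparse_n)
    }
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Illustrative raw scores: cosine similarity vs. BM25.
dense = {"doc_general": 0.82, "doc_e47": 0.78, "doc_billing": 0.60}
sparse = {"doc_e47": 14.2, "doc_changelog": 9.1}
ranked = weighted_fuse(dense, sparse, alpha=0.6)
```

Note that min-max pushes the lowest-scored document in each list to exactly 0, which is one reason this approach is more sensitive to retrieval depth than rank-based fusion.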

Learned fusion

Train a small model to combine scores from multiple retrievers. Best quality, most complexity.

Late interaction (ColBERT)

A different approach: embed every token in queries and documents, match at the token level. Stronger for precise retrieval but more expensive to store and serve.
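
The token-level matching can be illustrated with the MaxSim scoring rule used by late-interaction models: each query token takes its best similarity over all document tokens, and those maxima are summed. The sketch below uses hand-written 2-dimensional vectors as stand-ins for real token embeddings, which is purely an assumption for illustration, and it assumes every document has at least one token vector.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings: one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a match for both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.1]]              # matches only the first query token well

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

The storage cost mentioned above is visible even here: each document keeps one vector per token rather than a single vector.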

Implementation patterns

Single database that supports both

Weaviate, Qdrant, OpenSearch, Elasticsearch, Vespa, and Pinecone (with sparse vectors) all support hybrid natively. Cleanest architecture.

Two databases, fused at query time

Run BM25 in Elasticsearch/OpenSearch, dense in a vector DB. At query time, call both, fuse results with RRF. More infrastructure but maximum flexibility.
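
A sketch of the query-time flow, assuming two independent backends. The functions search_bm25 and search_dense are hypothetical stand-ins that return canned results; in a real system they would wrap your search engine and vector DB clients. The two calls run in parallel threads, and the results are fused with RRF.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def search_bm25(query, top_k):
    # Hypothetical stand-in for an Elasticsearch/OpenSearch query.
    return ["doc_e47", "doc_changelog", "doc_general"][:top_k]

def search_dense(query, top_k):
    # Hypothetical stand-in for a vector DB nearest-neighbor query.
    return ["doc_general", "doc_e47", "doc_billing"][:top_k]

def hybrid_search(query, top_k=3, rrf_k=60):
    """Call both retrievers concurrently, then fuse their rankings with RRF."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Same top_k for both retrievers so neither dominates the union.
        sparse_future = pool.submit(search_bm25, query, top_k)
        dense_future = pool.submit(search_dense, query, top_k)
        rankings = [sparse_future.result(), dense_future.result()]
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Sort by fused score, breaking ties by ID for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]

results = hybrid_search("error E-47 in the invoice processor")
```

Running the two searches concurrently keeps latency close to the slower backend rather than the sum of both.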

Single database, fused manually

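Store vectors and text in one general-purpose database and combine its vector search and full-text search yourself (pgvector + Postgres full-text search, for example). Note that Postgres's built-in ts_rank ranking is not BM25, so the sparse side is an approximation. This works, but is less optimized than dedicated hybrid solutions.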

Common pitfalls

Mismatched top-k

If BM25 returns top-20 and dense returns top-100, the union is skewed toward dense results. Match top-k across retrievers, or normalize the retrieval depth.

Text preprocessing differences

BM25 tokenization (stopwords, stemming, lowercasing) must be identical at index time and query time, and you need to know what it does to your queries. A query "reset my password" with stopword removal becomes "reset password" and matches differently than with stopwords preserved.
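
One way to keep preprocessing consistent is to route both documents and queries through a single analyzer function. The sketch below is an assumption-laden illustration: the stopword list is a tiny made-up sample, there is no stemming, and the regular expression is chosen to keep hyphenated codes such as "E-47" as one token.

```python
import re

# Tiny illustrative stopword list; real analyzers ship much longer ones.
STOPWORDS = {"a", "an", "the", "my", "to", "in", "of"}

# Runs of letters/digits, optionally joined by hyphens (keeps "e-47" whole).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def analyze(text, remove_stopwords=True):
    """Lowercase, tokenize, and optionally drop stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

with_removal = analyze("Reset my password")
without_removal = analyze("Reset my password", remove_stopwords=False)
code_query = analyze("error E-47 in the invoice processor")
```

Using the same analyze function for indexing and querying means a change to the stopword list or token pattern cannot silently apply to only one side.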

No dedup between retrievers

A chunk that appears in both result lists should be merged into one entry, not duplicated. RRF does this naturally because it sums contributions per document ID; a weighted sum needs explicit dedup logic that joins scores on a shared chunk ID.

Hybrid without reranking

Even hybrid retrieval benefits from a cross-encoder rerank on top. See reranking.

The quality gain

On standard benchmarks, hybrid RRF typically outperforms dense-only by 5-20% on retrieval metrics. On domain-specific corpora with lots of proper nouns, technical terms, or rare identifiers, the gain can be 30%+.

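Cost: roughly 2x retrieval work, since you run two searches. Latency is about 2x if the searches run sequentially; run in parallel, it is closer to the slower of the two plus the fusion step. Extra storage is usually modest, since sparse indexes are typically small next to dense vectors.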

My default recommendation

Any production RAG system should use hybrid retrieval. Dense-only is a prototype-stage choice. The engineering cost is modest, the quality gain is real, and the failure modes are asymmetric: hybrid handles the edge cases (rare terms, IDs, codes) that pure dense can't.

What to do with this

Use hybrid with RRF as your default. Dense-only is a prototype choice.
Match top-k across retrievers before fusion; unbalanced top-k skews results.
Always add a reranker on top of the fused results.