BM25 and sparse retrieval

BM25 is a classical keyword-based retrieval algorithm, older than most of your AI infrastructure. It's also essential in modern RAG. The reason: dense embeddings have blind spots that BM25 fills in cleanly. A production RAG system that isn't using BM25 somewhere is usually leaving retrieval quality on the floor, often on the order of 10-20% on keyword-heavy queries.

How BM25 works

BM25 scores how well a query matches a document based on:

- Term frequency (TF): how often each query term appears in the document, with diminishing returns controlled by the k1 parameter.
- Inverse document frequency (IDF): rare terms count for more than common ones.
- Length normalization: longer documents are penalized so they can't win on raw term counts alone, controlled by the b parameter.

The score rewards documents that have the query's rare terms at reasonable frequency, normalized by length.
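
The scoring can be made concrete with a minimal in-memory implementation (a sketch of Okapi BM25 with common defaults k1=1.5, b=0.75; the corpus and function name are illustrative, not from any library):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Rare terms get high IDF; common terms get low IDF
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # TF with diminishing returns, normalized by doc length
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [
    "sparse retrieval with bm25".split(),
    "dense embeddings for semantic search".split(),
    "bm25 is a keyword ranking function".split(),
]
print(bm25_scores("bm25 ranking".split(), docs))
```

Note that the second document scores exactly zero: no shared tokens means no score, which is precisely the blind spot dense retrieval covers.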

When BM25 wins over dense

- Exact matches on rare tokens: IDs, SKUs, error codes, function names, version strings.
- Out-of-vocabulary and domain terms the embedding model never saw during training.
- Precise keyword queries where the user typed exactly the words they want back.

When dense wins over BM25

- Paraphrases and synonyms: "car" and "automobile" share no tokens but mean the same thing.
- Semantic or conceptual queries with little lexical overlap with the answer text.
- Cross-lingual retrieval, if the embedding model is multilingual.

BM25 in production

Elasticsearch / OpenSearch

Gold standard for BM25. Full-text search with tokenization, stemming, analyzers, boosting. If you're serious about search, this is a natural fit.

Postgres full-text

Reasonable keyword search via tsvector and ts_rank_cd, though note this is not true BM25: ts_rank_cd uses cover-density ranking, with no IDF weighting or tunable length normalization. Good when you already have Postgres and don't need Elasticsearch-level search features.

Native in vector DBs

Weaviate, Qdrant (via sparse vectors), Vespa, Pinecone (sparse vectors), Milvus all support BM25 or similar sparse search natively. Removes the need for a separate search system.

Library: rank-bm25 (Python)

In-memory BM25 for small corpora or prototyping. Not production-scale.

Tokenization matters

BM25 quality depends heavily on tokenization:

- Lowercasing and punctuation handling decide whether "BM25," matches "bm25".
- Stemming or lemmatization ("retrieval" vs "retrieve") trades recall against precision.
- Stopword removal shrinks the index but can hurt phrase-like queries.
- Hyphenated terms, code identifiers, and compound words need deliberate treatment.

Your BM25 quality is capped by tokenization choices. Default English tokenizers work for most cases. Specialized domains (law, medicine, chemistry) often need custom tokenizers.
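
As a small illustration, even a minimal normalization step changes what BM25 can match (a stdlib-only sketch; a real analyzer would add stemming, e.g. via Snowball):

```python
import re

def tokenize(text):
    """Lowercase, replace punctuation (except hyphens) with spaces, split."""
    text = text.lower()
    text = re.sub(r"[^\w\s-]", " ", text)  # keep word chars, spaces, hyphens
    return text.split()

# Without this, "BM25," and "bm25" would index as distinct terms
# and never match each other.
print(tokenize("BM25, Okapi-style ranking!"))
```

The hyphen decision alone matters: keeping "okapi-style" as one token means the query "okapi" won't match it, while splitting it loses the compound.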

BM25F and field-weighted search

BM25F extends BM25 to weight different fields differently. A match in the title might be worth 3x a match in the body. For documents with structure (title, abstract, body), this is valuable.

Elasticsearch supports this via multi_match queries. Many vector DBs don't, which is another reason to consider ES/OS for structured text search.
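
A field-weighted query in the Elasticsearch Query DSL looks roughly like this (a sketch as a Python dict; the index fields and boost values are illustrative, and the `^3` suffix boosts title matches 3x):

```python
# Field-weighted multi_match query body (illustrative fields/boosts).
# "most_fields" sums per-field scores; "cross_fields" blends term
# statistics across fields, which is closer in spirit to BM25F.
query = {
    "query": {
        "multi_match": {
            "query": "sparse retrieval",
            "fields": ["title^3", "abstract^2", "body"],
            "type": "most_fields",
        }
    }
}
print(query["query"]["multi_match"]["fields"])
```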

Learned sparse: SPLADE

A newer approach: use a transformer to learn sparse vector representations. Each token contributes to a high-dimensional sparse vector, with the model learning which tokens matter and expanding queries with learned synonyms.
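
At query time, scoring with learned sparse vectors is still just a sparse dot product. A toy sketch, where the "model outputs" are hand-made token→weight dicts (the weights and the expansion term are invented for illustration):

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse token->weight vectors."""
    # Iterate the smaller dict for efficiency
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

# Hypothetical SPLADE-style output: the encoder expanded the query
# "car" with the related term "vehicle" at a lower learned weight.
query_vec = {"car": 1.8, "vehicle": 0.6}
doc_vec = {"vehicle": 1.2, "sale": 0.9, "used": 0.7}
print(sparse_dot(query_vec, doc_vec))
```

Plain BM25 would score this pair zero (no shared tokens); the learned expansion is what lets the sparse representation bridge the vocabulary gap.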

Benefits:

- Keeps exact-match behavior while adding learned term expansion (synonyms, morphology).
- Sparse vectors stay interpretable: you can inspect which tokens carry the score.
- Typically outperforms plain BM25 on retrieval benchmarks.

Drawbacks:

- Requires a transformer forward pass at indexing time (and usually query time), so GPU cost returns.
- Larger indexes than plain BM25, since documents are expanded with extra terms.
- Another model to version, deploy, and keep in sync with your index.

Models: SPLADE v3, naver/splade. Supported by recent versions of Qdrant, Vespa, and OpenSearch.

Hybrid is the answer

In almost every production RAG system, BM25 + dense hybrid outperforms either alone. The sparse and dense vectors capture complementary signal. The fusion (RRF or similar) combines them cleanly. See hybrid retrieval.
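
Reciprocal rank fusion (RRF) is only a few lines (a sketch with the conventional k=60; inputs are ranked lists of doc ids, one list per retriever):

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: each doc scores the sum of 1/(k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["d3", "d1", "d7"]   # BM25's ranking
dense_top = ["d1", "d9", "d3"]  # dense retriever's ranking
print(rrf([bm25_top, dense_top]))
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales.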

The old-school lesson

BM25 is fast, predictable, debuggable, and free of GPU dependencies. When your vector search is broken, your metrics are confusing, or your embeddings go stale, BM25 still works. Keep it in the stack as a fallback if nothing else.

Next: Hybrid retrieval.