Picking an embedding model

Embedding model choice has a larger impact on retrieval quality than almost any other decision in the RAG stack. The right model for a general-knowledge corpus may be wrong for legal documents, code, or Japanese-language content. Here's how I pick.

The MTEB leaderboard is the starting point

MTEB (Massive Text Embedding Benchmark) ranks embedding models on standardized tasks. Look up the current leaderboard on Hugging Face. It's not gospel, but it's a reasonable way to build a starting short-list.

Caveats:

  - Aggregate scores blend many task types; for RAG, the retrieval subset matters most.
  - High-ranking models sometimes train on benchmark-adjacent data, which inflates their scores.
  - The benchmark skews English-heavy, so rankings may not transfer to other languages or to your domain.

The five decision dimensions

1. Quality

Proxy: MTEB score, or better, your own eval set. Test 2-3 candidate models against a held-out set of real queries.

2. Cost

Embedding cost compounds: every reindex pays the cost again. For large corpora, this matters.
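
The compounding is easy to estimate up front. A minimal sketch; the corpus size and per-token price below are placeholder assumptions, not recommendations:

```python
def embedding_cost(corpus_tokens: int, price_per_million: float,
                   reindexes: int = 1) -> float:
    """Total spend for embedding the corpus, counting every full reindex."""
    return corpus_tokens / 1_000_000 * price_per_million * reindexes

# Hypothetical: 500M-token corpus, $0.10 per 1M tokens, reindexed quarterly.
annual_spend = embedding_cost(500_000_000, 0.10, reindexes=4)  # 200.0
```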

3. Latency

Embedding time per query affects user-facing latency. API calls: 50-200ms. Self-hosted small models: 5-20ms. Self-hosted large models: 30-100ms. For real-time systems, this is load-bearing.
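
Those ranges are ballparks; measure your own stack. A small helper, assuming only that `embed` is whatever callable your pipeline invokes per query:

```python
import time

def median_latency_ms(embed, queries, warmup: int = 3) -> float:
    """Median per-query embedding latency in milliseconds."""
    for q in queries[:warmup]:
        embed(q)  # warm connections/caches before timing
    samples = []
    for q in queries:
        start = time.perf_counter()
        embed(q)
        samples.append((time.perf_counter() - start) * 1000)
    return sorted(samples)[len(samples) // 2]
```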

4. Context length

Match the model to your chunk size. If your chunks are 1000 tokens, a model with a 512-token context window silently truncates them. Check the spec.
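
A cheap guard against silent truncation. The four-characters-per-token heuristic below is a rough approximation; swap in your model's real tokenizer for exact counts:

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def truncation_risks(chunks: list[str], context_limit: int) -> list[int]:
    """Indices of chunks likely to exceed the model's context window."""
    return [i for i, chunk in enumerate(chunks)
            if approx_tokens(chunk) > context_limit]
```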

5. Domain fit

General-purpose models work for general content. For code, law, medicine, or finance, domain-specific or domain-aware models (or fine-tuned ones) significantly outperform general ones.

My current default picks

For general prose, production

For general prose, self-hosted

For code

For legal/financial/medical

Evaluate domain-specific options first.

If none fit, fine-tune a general model on domain data (see fine-tuning embeddings).

For multilingual

Closed vs open

See the dedicated page: closed vs open embedding models.

The testing protocol

  1. Build a small eval set: 30-100 queries with known-good answer chunks
  2. Index your corpus with candidate models
  3. Measure hit-rate@k, MRR, NDCG for each
  4. Compare alongside cost and latency
  5. Pick the one on the best quality/cost frontier for your use case
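
The step-3 metrics are each a few lines. A sketch of hit-rate@k and MRR, assuming each query has one known-good chunk id (NDCG follows the same pattern):

```python
def hit_rate_at_k(rankings: list[list[str]], relevant: list[str], k: int) -> float:
    """Fraction of queries whose known-good chunk appears in the top k results."""
    hits = sum(rel in ranked[:k] for ranked, rel in zip(rankings, relevant))
    return hits / len(rankings)

def mrr(rankings: list[list[str]], relevant: list[str]) -> float:
    """Mean reciprocal rank of the known-good chunk (0 when it isn't retrieved)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        if rel in ranked:
            total += 1.0 / (ranked.index(rel) + 1)
    return total / len(rankings)
```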

This is a one-afternoon exercise that saves months of chasing subtle retrieval issues.

The lock-in question

Switching embedding models means reindexing your entire corpus. For large corpora, that's expensive and slow, so weigh the switching cost before you commit.

Build the infrastructure to switch embedding models without rebuilding everything else. Treat embedding_model_version as first-class metadata.
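
One way to make the version first-class is to store it on every record and filter on it at query time. The names here are illustrative, not any specific vector DB's schema:

```python
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    chunk_id: str
    text: str
    vector: list[float]
    embedding_model_version: str  # e.g. "acme-embed-v2" (hypothetical name)

def queryable(records: list[ChunkRecord], active_model: str) -> list[ChunkRecord]:
    """Vectors are only comparable to queries embedded by the same model."""
    return [r for r in records if r.embedding_model_version == active_model]
```

During a migration, records from both model versions coexist in the store; the query-time filter keeps results consistent until the old vectors are retired.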

Next: Closed vs open embedding models.