Picking an embedding model
6 min read · Updated 2026-04-18
Embedding model choice has a larger impact on retrieval quality than almost any other decision in the RAG stack. The right model for a general-knowledge corpus may be wrong for legal documents, code, or Japanese-language content. Here's how I pick.
The MTEB leaderboard is the starting point
MTEB (the Massive Text Embedding Benchmark) ranks embedding models on standardized tasks; the current leaderboard lives on Hugging Face. It's not gospel, but it's a reasonable way to build a starting short-list.
Caveats:
- MTEB measures average performance across tasks. Your domain may not match any task.
- Top models on MTEB are often very large and expensive to run.
- Leaderboards don't capture latency, cost, or context length well.
The five decision dimensions
1. Quality
Proxy: MTEB score, or better, your own eval set. Test 2-3 candidate models against a held-out set of real queries.
2. Cost
- OpenAI text-embedding-3-small: $0.02 / 1M tokens
- OpenAI text-embedding-3-large: $0.13 / 1M tokens
- Cohere embed-v3: ~$0.10 / 1M tokens
- Voyage-3-large: ~$0.18 / 1M tokens
- Self-hosted open-source: GPU-hours only (often significantly cheaper at scale)
Embedding cost compounds: every reindex pays the cost again. For large corpora, this matters.
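To make the compounding concrete, here is the arithmetic as a tiny sketch, using the per-token prices quoted above (verify current pricing before relying on these numbers):

```python
# Rough reindex cost for a corpus at per-1M-token prices.
# Prices are the ones quoted in this article, in USD.
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def reindex_cost(corpus_tokens: int, model: str) -> float:
    """USD cost to embed the whole corpus once."""
    return corpus_tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

# A 500M-token corpus, reindexed monthly for a year:
per_reindex = reindex_cost(500_000_000, "text-embedding-3-large")  # → ~$65
annual = 12 * per_reindex                                          # → ~$780
```

The same corpus on text-embedding-3-small costs about $10 per reindex — a 6.5x spread that dominates once you reindex regularly.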
3. Latency
Embedding time per query affects user-facing latency. API calls: 50-200ms. Self-hosted small models: 5-20ms. Self-hosted large models: 30-100ms. For real-time systems, this is load-bearing.
4. Context length
Match to your chunk size. If your chunks are 1,000 tokens, a model with a 512-token context window silently truncates them. Check the spec.
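A pre-indexing check catches this silently-truncated case. This is a minimal sketch: `token_count` is pluggable, and in production you would pass the model's own tokenizer (e.g. tiktoken for OpenAI models); the whitespace split used as the default here is only a crude proxy for illustration.

```python
# Flag chunks that exceed a model's context window before indexing.
# The default token counter (whitespace split) is a rough stand-in for
# the model's real tokenizer -- swap in the actual one in production.
def oversized_chunks(chunks, max_tokens,
                     token_count=lambda text: len(text.split())):
    """Return indices of chunks longer than the model's context window."""
    return [i for i, chunk in enumerate(chunks)
            if token_count(chunk) > max_tokens]

chunks = ["a short chunk", "word " * 600]
oversized_chunks(chunks, max_tokens=512)  # → [1]
```

Run this once per (corpus, model) pair; a non-empty result means you need smaller chunks or a longer-context model.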
5. Domain fit
General-purpose models work for general content. For code, law, medicine, or finance, domain-specific or domain-aware models (or fine-tuned ones) beat general models significantly.
My current default picks
For general prose, production
- OpenAI text-embedding-3-small: cheap, fast, solid. Good baseline.
- OpenAI text-embedding-3-large: when quality matters and cost is secondary.
- Cohere embed-v3 (multilingual): best-in-class multilingual retrieval.
- Voyage-3-large: often tops benchmarks, especially for finance/legal.
For general prose, self-hosted
- BGE-M3: strong multilingual, unified dense/sparse/colbert, Apache-2.0 license
- E5-mistral-7b-instruct: top open-source quality, heavier inference
- nomic-embed-text-v1.5: efficient, strong, 8K context, Apache-2.0
- gte-large: efficient, decent quality, 512 context
For code
- Voyage-code-2 or Voyage-code-3: code-specific
- jina-embeddings-v2-base-code: open-source code embeddings
For legal/financial/medical
Evaluate domain-specific options first:
- Voyage-law-2 for legal
- Voyage-finance-2 for finance
- BioBERT / Clinical BERT for medical
If none fit, fine-tune a general model on domain data (see fine-tuning embeddings).
For multilingual
- Cohere embed-v3 multilingual (excellent quality)
- BGE-M3 (open-source, competitive)
- multilingual-e5-large
Closed vs open
See the dedicated page: closed vs open embedding models.
The testing protocol
1. Build a small eval set: 30-100 queries with known-good answer chunks
2. Index your corpus with candidate models
3. Measure hit-rate@k, MRR, NDCG for each
4. Compare alongside cost and latency
5. Pick the one on the best quality/cost frontier for your use case
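The first two metrics in the protocol take only a few lines. A sketch, assuming each query has a ranked result list and a set of known-good chunk ids (NDCG is omitted here for brevity):

```python
def hit_rate_at_k(results, relevant, k=5):
    """Fraction of queries whose top-k results contain a known-good chunk."""
    hits = sum(1 for ranked, gold in zip(results, relevant)
               if any(doc in gold for doc in ranked[:k]))
    return hits / len(results)

def mrr(results, relevant):
    """Mean reciprocal rank of the first relevant chunk per query."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc in enumerate(ranked, start=1):
            if doc in gold:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(results)

# results[i]: ranked chunk ids retrieved for query i
# relevant[i]: set of known-good chunk ids for query i
results = [["c3", "c1", "c9"], ["c2", "c7", "c4"]]
relevant = [{"c1"}, {"c8"}]
hit_rate_at_k(results, relevant, k=3)  # → 0.5
mrr(results, relevant)                 # → (1/2 + 0) / 2 = 0.25
```

Run the same eval set through each candidate model's index and compare the numbers side by side with the cost and latency figures above.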
This is a one-afternoon exercise that saves months of subtle retrieval issues.
The lock-in question
Switching embedding models means reindexing your entire corpus. For large corpora, this is expensive and slow. Factors to consider:
- How often will you want to change models?
- Can you afford a migration window?
- Does your infra support running two embedding models in parallel during migration?
Build the infrastructure to switch embedding models without rebuilding everything else. Treat embedding_model_version as first-class metadata.
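One way to make that concrete is to tag every stored vector with the model that produced it, so two models can coexist during a migration. A minimal sketch — the field and function names here are illustrative, not from any particular vector store:

```python
# Sketch: embedding_model_version as first-class chunk metadata, so a
# migration can run two models in parallel. Names are illustrative.
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    chunk_id: str
    text: str
    vector: list[float]
    embedding_model_version: str  # filterable, never implicit

def records_for_model(records, model_version):
    """Query-time filter: only compare against vectors from one model.
    Vectors from different embedding models are not comparable."""
    return [r for r in records
            if r.embedding_model_version == model_version]
```

During migration, new writes go to the new model's space while queries keep filtering on the old version; flip the query-time filter once the backfill completes.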
Next: Closed vs open embedding models.