What embeddings are
📖 5 min read · Updated 2026-04-18
An embedding is a vector of numbers that represents a piece of text. Two pieces of text with similar meaning produce vectors that point in similar directions. That's the entire idea. The rest is engineering: which model produces those vectors, how many dimensions, at what cost, for what content.
The geometry
Embedding models map text into a high-dimensional space (typically 768, 1024, 1536, or 3072 dimensions). In this space:
- "Reset my password" and "I forgot my login" are close (similar meaning)
- "Reset my password" and "What's the weather" are far apart
- Similarity is measured with cosine similarity (angle between vectors) or dot product
This geometry is how retrieval works: embed the user's query, find the nearest neighbors in your index, return those chunks.
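A minimal sketch of that loop in NumPy. The vectors here are random placeholders standing in for whatever embedding model you actually call:

```python
import numpy as np

def cosine_sim(query: np.ndarray, index: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of an index matrix."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    return m @ q

# Placeholder index: one 768-dim vector per chunk. In practice these come
# from embedding your chunks and your query with the same model.
chunk_vectors = np.random.rand(1000, 768)
query_vector = np.random.rand(768)

scores = cosine_sim(query_vector, chunk_vectors)
top_k = np.argsort(scores)[::-1][:5]   # indices of the 5 nearest chunks
print(top_k, scores[top_k])
```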
How embedding models learn this
Embedding models are trained on massive numbers of related text pairs, gathered by semi-supervised scraping or from annotated datasets, usually with a contrastive objective. The model is rewarded when its vectors for "similar" pairs are close and its vectors for "dissimilar" pairs are far apart. Over billions of examples, it learns a space where semantic similarity maps to geometric proximity.
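A toy illustration of an InfoNCE-style contrastive objective in NumPy. Real training runs a neural encoder with mined negatives and much larger batches; treat this only as a picture of the loss, not a recipe:

```python
import numpy as np

def info_nce_loss(query_vecs: np.ndarray, doc_vecs: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of query_vecs should match row i of doc_vecs,
    and every other row in the batch acts as a negative."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature               # (batch, batch) similarity matrix
    # Log-softmax over each row; the "correct" column is the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: 4 query/document pairs with placeholder 768-dim embeddings.
batch_q = np.random.rand(4, 768)
batch_d = np.random.rand(4, 768)
print(info_nce_loss(batch_q, batch_d))
```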
What embeddings are good at
- Synonyms and paraphrases ("car" ≈ "automobile")
- Cross-lingual (in multilingual models)
- Topic matching ("payment failed" ≈ chunks about billing issues)
- Abstract relationships ("Jeff Bezos" ≈ "Amazon founder")
What embeddings are bad at
- Exact string matching (product codes, UUIDs, specific part numbers)
- Negation ("does NOT include X" often embeds similarly to "includes X")
- Rare named entities not well-represented in training data
- Long documents with multiple topics (the average doesn't represent any one topic well)
- Fine distinctions in technical domains (two very similar-sounding API calls with different behavior)
These failure modes are why pure vector search underperforms on its own. Hybrid search (dense + sparse) exists specifically to cover them.
Dense vs sparse
- Dense embeddings (what most people mean): continuous vectors from a neural model. Good at semantic similarity, weak at exact matches.
- Sparse embeddings (BM25, SPLADE): high-dimensional vectors with mostly zero values, where each dimension corresponds to a term or learned token. Good at exact matches and keyword-heavy queries.
Modern RAG often combines both; see hybrid retrieval.
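One common way to combine them is reciprocal rank fusion (RRF), which merges a dense ranking and a sparse (e.g. BM25) ranking without having to put the two score scales on a common footing. A sketch, with made-up chunk IDs:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs. A chunk earns 1/(k + rank)
    from each list it appears in; higher combined score ranks higher."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["chunk_12", "chunk_7", "chunk_3"]     # from vector search
sparse_ranking = ["chunk_7", "chunk_44", "chunk_12"]   # from BM25
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```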
The two things that matter for RAG
- Model quality. How well the model captures the semantics of your domain.
- Cost. Per-million-token pricing for API models, inference cost for self-hosted.
Dimensions, latency, and context length matter too, but model quality dominates the retrieval ceiling.
What the numbers mean
Dimensions
How many numbers per vector. More dimensions = more storage, more compute. The model designers pick this; you don't tune it (except via Matryoshka, see dimensions and cost). Ranges from 384 (small models) to 3072+ (large).
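If your model was trained with a Matryoshka objective, truncation amounts to keeping a prefix of the dimensions and re-normalizing. A sketch with a placeholder vector; only valid for models that explicitly support this:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` dimensions and re-normalize.
    Only meaningful for models trained with a Matryoshka-style objective."""
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

full = np.random.rand(3072)          # placeholder 3072-dim embedding
small = truncate_embedding(full, 768)
print(small.shape)                   # (768,): 4x less storage per vector
```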
Context length
Max tokens the model will embed at once. 512 is the old standard, 8192 is modern, and some models now go to 32K+. Longer context means you can embed bigger chunks, but with diminishing returns: an embedding is still one vector summarizing all tokens.
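If you want to check a chunk against the limit before embedding it, a rough sketch using tiktoken as an approximate token counter. Your model's real tokenizer may count differently, and providers differ on whether over-length input gets truncated or rejected:

```python
import tiktoken

MAX_TOKENS = 8192                              # your model's advertised context length
# cl100k_base is an approximation; the embedding model's own tokenizer may differ.
enc = tiktoken.get_encoding("cl100k_base")

def fits_context(text: str, max_tokens: int = MAX_TOKENS) -> bool:
    """Rough check that a chunk fits the embedding model's context window."""
    return len(enc.encode(text)) <= max_tokens

chunk = "Reset my password instructions... " * 400
print(len(enc.encode(chunk)), fits_context(chunk))
```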
Similarity metric
- Cosine (angle-based): most common, scale-invariant
- Dot product: used when vectors are normalized (many modern models output pre-normalized vectors, so dot product = cosine)
- Euclidean: rare for text, more common in computer vision
Check which metric your model was trained with and use that one. Mismatched metrics degrade retrieval silently.
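A quick sanity check of the normalization point with placeholder vectors: once vectors are unit length, dot product and cosine return the same number:

```python
import numpy as np

a, b = np.random.rand(768), np.random.rand(768)
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_normalized = np.dot(a_n, b_n)
print(np.isclose(cosine, dot_normalized))   # True: on unit vectors, dot product = cosine
```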
The mental model for debugging
When retrieval is failing:
- Look at the top-k results for a failing query (a small inspection helper is sketched after this list)
- Are they wrong? The embeddings don't understand your domain
- Are they right but buried at rank 5-10? The embeddings work, but you need reranking
- Are they missing entirely? The relevant chunk wasn't in your index (chunking or ingestion problem)
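A small helper for that first step, assuming you keep your chunk texts and their embeddings around. It reuses the NumPy conventions from the retrieval sketch above:

```python
import numpy as np

def inspect_top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, chunks: list[str], k: int = 10) -> None:
    """Print the k nearest chunks with their cosine scores so you can see
    which failure mode you're in: wrong, buried, or missing."""
    q = query_vec / np.linalg.norm(query_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q
    for rank, idx in enumerate(np.argsort(scores)[::-1][:k], start=1):
        print(f"{rank:>2}. score={scores[idx]:.3f}  {chunks[idx][:80]}")
```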
Next: Picking an embedding model.