What embeddings are

An embedding is a vector of numbers that represents a piece of text. Two pieces of text with similar meaning produce vectors that point in similar directions. That's the entire idea. The rest is engineering: which model produces those vectors, how many dimensions, at what cost, for what content.

The geometry

Embedding models map text into a high-dimensional space (typically 768, 1024, 1536, or 3072 dimensions). In this space, texts with similar meaning land near each other, and similarity is measured by the angle or distance between vectors.

This geometry is how retrieval works: embed the user's query, find the nearest neighbors in your index, return those chunks.
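The query-then-nearest-neighbors loop above can be sketched in a few lines. This is a toy with made-up 4-dimensional vectors (real models produce hundreds or thousands of dimensions) and a brute-force scan instead of a real vector index:

```python
import numpy as np

# Toy "index": one vector per chunk. The numbers are invented; in practice
# each row would come from an embedding model.
chunks = ["how to reset a password", "pasta recipes", "account recovery steps"]
index = np.array([
    [0.9, 0.1, 0.0, 0.4],
    [0.0, 0.9, 0.4, 0.1],
    [0.8, 0.0, 0.1, 0.5],
])
# Normalize rows so a dot product equals cosine similarity.
index = index / np.linalg.norm(index, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                      # cosine similarity per chunk
    top = np.argsort(-scores)[:k]           # best-first
    return [(chunks[i], float(scores[i])) for i in top]

# A query vector pointing near the "password" direction retrieves both
# password-related chunks and skips the recipes.
results = retrieve(np.array([0.85, 0.05, 0.05, 0.45]))
```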

How embedding models learn this

Embedding models are trained on massive numbers of related text pairs (question and answer, title and body, query and clicked document), mined semi-automatically from the web or drawn from annotated datasets. A contrastive objective rewards the model when its vectors for similar pairs are close and its vectors for dissimilar pairs are far apart. Over billions of examples, it learns a space where semantic similarity maps to geometric proximity.
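A toy sketch of such a contrastive objective (InfoNCE-style, one common choice) in NumPy. Real training backpropagates this loss through the encoder; here we only score fixed vectors to show what "close positives, far negatives" means numerically:

```python
import numpy as np

def info_nce_loss(query_vecs, pos_vecs, temperature=0.05):
    """InfoNCE-style loss: row i of pos_vecs is the positive for row i
    of query_vecs; every other row in the batch acts as a negative."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = pos_vecs / np.linalg.norm(pos_vecs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                       # scaled cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy with the diagonal (the true pair) as the label.
    return float(-np.log(np.diag(probs)).mean())

basis = np.eye(3)
aligned = info_nce_loss(basis, basis)          # positives point the same way
shuffled = info_nce_loss(basis, basis[::-1])   # positives point elsewhere
```

The loss is near zero when each query's positive is its nearest neighbor and large when it isn't, which is exactly the pressure that shapes the embedding space.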

What embeddings are good at

Matching by meaning rather than wording: paraphrases, synonyms, and conceptual queries that share no keywords with the target text.

What embeddings are bad at

Exact matches: product codes, error strings, rare proper nouns, and other identifiers the model never learned. Negation is also unreliable, since "X is true" and "X is false" embed close together.

These failure modes are why pure vector search underperforms on keyword-sensitive queries. Hybrid search (dense + sparse) exists specifically to cover embedding weaknesses.

Dense vs sparse

Dense retrieval uses embedding vectors, where every dimension carries some meaning and matching is semantic. Sparse retrieval (BM25, TF-IDF) uses keyword statistics, where most dimensions are zero and matching is lexical. Modern RAG often combines both; see hybrid retrieval.
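One common way to combine the two result lists is reciprocal rank fusion (RRF), which merges rankings without having to calibrate the two systems' raw scores against each other. A minimal sketch (doc ids are hypothetical):

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Reciprocal rank fusion: each ranking is a list of doc ids,
    best first. k=60 is the conventional smoothing constant."""
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears high in both lists, so it beats "a", which only one
# system ranked first.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```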

The two things that matter for RAG

  1. Model quality. How well the model captures the semantics of your domain.
  2. Cost. Per-million-token pricing for API models, inference cost for self-hosted.

Dimensions, latency, and context length matter too, but model quality sets the retrieval ceiling.

What the numbers mean

Dimensions

How many numbers per vector. More dimensions mean more storage and more compute per comparison. The model designers pick this; you don't tune it (except via Matryoshka, see dimensions and cost). Ranges from 384 (small models) to 3072+ (large ones).
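For models trained with Matryoshka representation learning, "tuning" dimensions means keeping a prefix of the vector and renormalizing. A sketch (the 1536 and 256 figures are illustrative; this only works for MRL-trained models, since truncating an ordinary embedding discards information arbitrarily):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Matryoshka-style truncation: keep the first `dims` values,
    then renormalize back to unit length."""
    short = np.asarray(vec, dtype=float)[:dims]
    return short / np.linalg.norm(short)

full = np.random.default_rng(0).normal(size=1536)   # stand-in embedding
small = truncate_embedding(full, 256)               # 6x less storage
```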

Context length

Max tokens the model will embed at once. 512 was the old standard; 8192 is typical now, and some models go to 32K+. Longer context lets you embed bigger chunks, but with diminishing returns: an embedding is still one vector summarizing all of its tokens.
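Staying under the context limit is usually a chunking concern. A naive sketch, where whitespace-split words stand in for real tokens (a production system would count with the model's own tokenizer, and 512/64 are illustrative defaults):

```python
def chunk_text(text, max_tokens=512, overlap=64):
    """Split text into chunks of at most max_tokens pseudo-tokens,
    with consecutive chunks sharing `overlap` words of context."""
    words = text.split()
    step = max_tokens - overlap
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), step)
    ]

pieces = chunk_text("token " * 1000)   # 1000 pseudo-tokens
```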

Similarity metric

Cosine similarity, dot product, and Euclidean distance are the common options. Check which metric your model was trained with and use that one; mismatched metrics degrade retrieval silently.
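The silent part is that the metrics disagree only on unnormalized vectors, so a mismatch doesn't error, it just reorders results. A contrived illustration with invented 2-dimensional vectors:

```python
import numpy as np

# Hypothetical unnormalized embeddings: b points 45 degrees away from the
# query but has a large magnitude; c points almost exactly along the query.
query = np.array([1.0, 0.0])
b = np.array([3.0, 3.0])
c = np.array([0.99, 0.1])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Dot product rewards magnitude, cosine rewards direction only, so the
# two metrics pick different winners for the same query.
dot_winner = "b" if query @ b > query @ c else "c"
cos_winner = "b" if cosine(query, b) > cosine(query, c) else "c"
```

If the model was trained for cosine similarity (or ships pre-normalized vectors), the dot-product ranking here is the silently wrong one.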

The mental model for debugging

When retrieval is failing:

  1. Look at the top-k results for a failing query
  2. Are they wrong? The embeddings don't understand your domain
  3. Are they right but buried at rank 5-10? The embeddings work, but you need reranking
  4. Are they missing entirely? The relevant chunk wasn't in your index (chunking or ingestion problem)
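The checklist above can be sketched as a small triage helper (all names here are hypothetical; `retrieved_ids` is the top-k result list, best first):

```python
def diagnose(retrieved_ids, relevant_id, k=3):
    """Map a failing query onto the three debugging outcomes."""
    if relevant_id not in retrieved_ids:
        return "missing: check chunking and ingestion"
    rank = retrieved_ids.index(relevant_id)     # 0 = top result
    if rank < k:
        return "retrieval ok"
    return "buried: embeddings work, add a reranker"
```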

Next: Picking an embedding model.