What embeddings are

An embedding is a vector of numbers that represents a piece of text. Two pieces of text with similar meaning produce vectors that point in similar directions. That's the entire idea. The rest is engineering: which model produces those vectors, how many dimensions, at what cost, for what content.

The geometry

Embedding models map text into a high-dimensional space (typically 768, 1024, 1536, or 3072 dimensions). In this space, texts with similar meaning land near each other, and similarity is measured by the angle or distance between vectors.

This geometry is how retrieval works: embed the user's query, find the nearest neighbors in your index, return those chunks.
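The query-then-nearest-neighbors loop above can be sketched in a few lines. This is a toy with made-up 4-dimensional vectors (real models produce hundreds or thousands of dimensions) and a brute-force scan instead of a real vector index:

```python
import numpy as np

# Toy "index": one vector per chunk. The numbers are invented; in practice
# each row would come from an embedding model.
chunks = ["how to reset a password", "pasta recipes", "account recovery steps"]
index = np.array([
    [0.9, 0.1, 0.0, 0.4],
    [0.0, 0.9, 0.4, 0.1],
    [0.8, 0.0, 0.1, 0.5],
])
# Normalize rows so a dot product equals cosine similarity.
index = index / np.linalg.norm(index, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                      # cosine similarity per chunk
    top = np.argsort(-scores)[:k]           # best-first
    return [(chunks[i], float(scores[i])) for i in top]

# A query vector pointing near the "password" direction retrieves both
# password-related chunks and skips the recipes.
results = retrieve(np.array([0.85, 0.05, 0.05, 0.45]))
```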

How embedding models learn this

Embedding models are trained on massive numbers of related text pairs (question and answer, title and body, query and clicked document), mined semi-automatically from the web or drawn from annotated datasets. A contrastive objective rewards the model when its vectors for similar pairs are close and its vectors for dissimilar pairs are far apart. Over billions of examples, it learns a space where semantic similarity maps to geometric proximity.
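A toy sketch of such a contrastive objective (InfoNCE-style, one common choice) in NumPy. Real training backpropagates this loss through the encoder; here we only score fixed vectors to show what "close positives, far negatives" means numerically:

```python
import numpy as np

def info_nce_loss(query_vecs, pos_vecs, temperature=0.05):
    """InfoNCE-style loss: row i of pos_vecs is the positive for row i
    of query_vecs; every other row in the batch acts as a negative."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = pos_vecs / np.linalg.norm(pos_vecs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                       # scaled cosine sims
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Cross-entropy with the diagonal (the true pair) as the label.
    return float(-np.log(np.diag(probs)).mean())

basis = np.eye(3)
aligned = info_nce_loss(basis, basis)          # positives point the same way
shuffled = info_nce_loss(basis, basis[::-1])   # positives point elsewhere
```

The loss is near zero when each query's positive is its nearest neighbor and large when it isn't, which is exactly the pressure that shapes the embedding space.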

What embeddings are good at

Matching by meaning rather than wording: paraphrases, synonyms, and conceptual queries that share no keywords with the target text.

What embeddings are bad at

Exact matches: product codes, error strings, rare proper nouns, and other identifiers the model never learned. Negation is also unreliable, since "X is true" and "X is false" embed close together.

These failure modes are why pure vector search underperforms on keyword-sensitive queries. Hybrid search (dense + sparse) exists specifically to cover embedding weaknesses.

Dense vs sparse

Dense retrieval uses embedding vectors, where every dimension carries some meaning and matching is semantic. Sparse retrieval (BM25, TF-IDF) uses keyword statistics, where most dimensions are zero and matching is lexical. Modern RAG often combines both; see hybrid retrieval.
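One common way to combine the two result lists is reciprocal rank fusion (RRF), which merges rankings without having to calibrate the two systems' raw scores against each other. A minimal sketch (doc ids are hypothetical):

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Reciprocal rank fusion: each ranking is a list of doc ids,
    best first. k=60 is the conventional smoothing constant."""
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears high in both lists, so it beats "a", which only one
# system ranked first.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```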

The two things that matter for RAG

  1. Model quality. How well the model captures the semantics of your domain.
  2. Cost. Per-million-token pricing for API models, inference cost for self-hosted.

Dimensions, latency, and context length matter too, but model quality sets the retrieval ceiling.

What the numbers mean

Dimensions

How many numbers per vector. More dimensions mean more storage and more compute per comparison. The model designers pick this; you don't tune it (except via Matryoshka, see dimensions and cost). Ranges from 384 (small models) to 3072+ (large ones).
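For models trained with Matryoshka representation learning, "tuning" dimensions means keeping a prefix of the vector and renormalizing. A sketch (the 1536 and 256 figures are illustrative; this only works for MRL-trained models, since truncating an ordinary embedding discards information arbitrarily):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Matryoshka-style truncation: keep the first `dims` values,
    then renormalize back to unit length."""
    short = np.asarray(vec, dtype=float)[:dims]
    return short / np.linalg.norm(short)

full = np.random.default_rng(0).normal(size=1536)   # stand-in embedding
small = truncate_embedding(full, 256)               # 6x less storage
```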

Context length

Max tokens the model will embed at once. 512 was the old standard; 8192 is typical now, and some models go to 32K+. Longer context lets you embed bigger chunks, but with diminishing returns: an embedding is still one vector summarizing all of its tokens.
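Staying under the context limit is usually a chunking concern. A naive sketch, where whitespace-split words stand in for real tokens (a production system would count with the model's own tokenizer, and 512/64 are illustrative defaults):

```python
def chunk_text(text, max_tokens=512, overlap=64):
    """Split text into chunks of at most max_tokens pseudo-tokens,
    with consecutive chunks sharing `overlap` words of context."""
    words = text.split()
    step = max_tokens - overlap
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), step)
    ]

pieces = chunk_text("token " * 1000)   # 1000 pseudo-tokens
```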

Similarity metric

Cosine similarity, dot product, and Euclidean distance are the common options. Check which metric your model was trained with and use that one; mismatched metrics degrade retrieval silently.
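The silent part is that the metrics disagree only on unnormalized vectors, so a mismatch doesn't error, it just reorders results. A contrived illustration with invented 2-dimensional vectors:

```python
import numpy as np

# Hypothetical unnormalized embeddings: b points 45 degrees away from the
# query but has a large magnitude; c points almost exactly along the query.
query = np.array([1.0, 0.0])
b = np.array([3.0, 3.0])
c = np.array([0.99, 0.1])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Dot product rewards magnitude, cosine rewards direction only, so the
# two metrics pick different winners for the same query.
dot_winner = "b" if query @ b > query @ c else "c"
cos_winner = "b" if cosine(query, b) > cosine(query, c) else "c"
```

If the model was trained for cosine similarity (or ships pre-normalized vectors), the dot-product ranking here is the silently wrong one.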

The mental model for debugging

When retrieval is failing:

  1. Look at the top-k results for a failing query
  2. Are they wrong? The embeddings don't understand your domain
  3. Are they right but buried at rank 5-10? The embeddings work, but you need reranking
  4. Are they missing entirely? The relevant chunk wasn't in your index (chunking or ingestion problem)
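The checklist above can be sketched as a small triage helper (all names here are hypothetical; `retrieved_ids` is the top-k result list, best first):

```python
def diagnose(retrieved_ids, relevant_id, k=3):
    """Map a failing query onto the three debugging outcomes."""
    if relevant_id not in retrieved_ids:
        return "missing: check chunking and ingestion"
    rank = retrieved_ids.index(relevant_id)     # 0 = top result
    if rank < k:
        return "retrieval ok"
    return "buried: embeddings work, add a reranker"
```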

Next: Picking an embedding model.