Embedding

A way to turn text into numbers that represent its meaning, so computers can compare ideas.

Explained simply.

An embedding is a long list of numbers (usually 384 to 1,536 of them) that captures the meaning of a piece of text. The trick is: similar meanings get similar number lists. 'The cat sat on the mat' and 'A feline rested on the rug' will have embeddings that are very close together, even though they share almost no words. This is how a computer can 'know' two sentences mean the same thing.
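The "closeness" of two embeddings is usually measured with cosine similarity, which scores two number lists from roughly 0 (unrelated) to 1 (same meaning). Here is a minimal sketch using made-up 4-dimensional vectors as stand-ins for real model output (a real model would produce the 384 to 1,536 numbers mentioned above):

```python
import math

def cosine_similarity(a, b):
    """Score how aligned two embedding vectors are: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, NOT real model output -- just chosen to illustrate the idea.
cat_sat = [0.9, 0.1, 0.8, 0.2]     # 'The cat sat on the mat'
feline  = [0.85, 0.15, 0.75, 0.25] # 'A feline rested on the rug'
finance = [0.1, 0.9, 0.2, 0.8]     # 'Stock prices fell sharply'

print(cosine_similarity(cat_sat, feline))   # high, near 1.0
print(cosine_similarity(cat_sat, finance))  # much lower
```

The two sentences about the cat score close to 1.0 despite sharing almost no words, while the unrelated sentence scores much lower. That gap is the whole mechanism.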

An example.

You embed your 500 help-center articles. You embed the question 'how do I cancel?'. You measure the distance between the question's embedding and every article's embedding (cosine similarity is the usual measure). The 3 closest articles are probably the ones that answer the question, even if they never use the word 'cancel' (they might say 'end subscription' or 'close account').
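In code, that search is just ranking articles by similarity to the question. The sketch below uses four made-up article titles with hand-written 3-dimensional vectors in place of real model output; in practice each vector would come from an embedding model:

```python
import math

def cosine_similarity(a, b):
    """Score how aligned two embedding vectors are: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy article embeddings, NOT real model output.
articles = {
    "How to end your subscription": [0.9, 0.1, 0.2],
    "Closing your account":         [0.7, 0.3, 0.4],
    "Updating payment details":     [0.3, 0.9, 0.1],
    "Inviting team members":        [0.1, 0.2, 0.9],
}

query = [0.85, 0.15, 0.25]  # toy embedding of 'how do I cancel?'

# Rank every article by similarity to the question; keep the 3 closest.
ranked = sorted(articles, key=lambda title: cosine_similarity(query, articles[title]),
                reverse=True)
top_3 = ranked[:3]
print(top_3)
```

Note that the best match, 'How to end your subscription', never contains the word 'cancel'; it wins purely because its embedding points in nearly the same direction as the question's.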

Why it matters.

Embeddings are the foundation of RAG, semantic search, classification, and clustering. Once you have good embeddings for your data, you can do things that keyword search can't.