Embeddings
An embedding is a fixed-length list of numbers (typically 512-4096 floats) that represents the semantic content of a piece of text, image, or audio. The trick is that the embedding is learned: similar inputs produce nearby vectors and dissimilar inputs produce distant ones, so similar things cluster together in vector space.
If you embed the sentences "the cat sat on the mat" and "a feline rested on the rug", the resulting vectors will be very close together even though they share almost no exact words. If you embed "the cat sat on the mat" and "quantum field theory", the vectors will be far apart. That property (meaning becomes geometry) is what makes embeddings useful.
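The "close together" vs. "far apart" comparison above is usually measured with cosine similarity. A minimal sketch, using toy 4-dimensional vectors standing in for real model output (real embeddings have hundreds or thousands of components):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; a real model would produce these from text.
cat_mat = [0.8, 0.1, 0.6, 0.2]   # "the cat sat on the mat"
feline  = [0.7, 0.2, 0.5, 0.3]   # "a feline rested on the rug"
quantum = [0.1, 0.9, 0.1, 0.8]   # "quantum field theory"

print(cosine_similarity(cat_mat, feline))   # high: similar meaning
print(cosine_similarity(cat_mat, quantum))  # low: unrelated meaning
```

With real embeddings the pattern is the same: paraphrases score near 1.0, unrelated text scores much lower.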
Where embeddings show up:
- Semantic search: find documents that match a query even when the wording is different.
- RAG pipelines: retrieve relevant context before asking an LLM to generate.
- Clustering: group similar items (de-duplication, topic modeling, news clustering).
- Recommendation: find products/songs/articles similar to one a user liked.
- Classification: train a small model on embeddings instead of raw text; faster and cheaper.
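Semantic search, the first use case above, reduces to nearest-neighbor ranking once everything is embedded. A minimal sketch over pre-computed toy vectors (the document texts and their 3-dimensional vectors are hypothetical; a real system would call an embedding model and usually a vector database):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def top_k(query_vec, corpus, k=2):
    """Return the k corpus documents whose vectors are closest to the query."""
    scored = [(cosine(query_vec, vec), doc) for doc, vec in corpus]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

# Hypothetical pre-computed embeddings for three documents.
corpus = [
    ("intro to cats",     [0.9, 0.1, 0.2]),
    ("dog care basics",   [0.8, 0.3, 0.1]),
    ("quantum mechanics", [0.1, 0.2, 0.9]),
]

# A query vector near the pet documents ranks them above the physics one.
print(top_k([0.85, 0.15, 0.2], corpus))
```

Production systems swap the linear scan for an approximate nearest-neighbor index, but the ranking logic is the same.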
The leading embedding APIs in 2026 are OpenAI's text-embedding-3-large, Voyage AI's voyage-3, and Cohere's embed-v3. Open-source leaders include nomic-embed-text and bge-large. Choice matters: a better embedding model can lift your retrieval quality more than any other single change.
A subtle gotcha: embeddings from different models are not interchangeable. If you embed your corpus with model A and your queries with model B, the geometry won't line up and recall will collapse. Always use the same model for both, and if you ever upgrade, re-embed the entire corpus.
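One cheap safeguard against this failure mode is to record which model produced the stored vectors and refuse cross-model comparisons. A minimal sketch; the model name and dimension here are examples, not a required schema:

```python
# Tag the stored corpus with the model and dimension that produced it.
stored = {"model": "text-embedding-3-large", "dim": 3072, "vectors": {}}

def check_compatible(store, query_model, query_dim):
    """Raise before querying if the query embedding can't be compared
    against the stored corpus (different model or dimension)."""
    if store["model"] != query_model or store["dim"] != query_dim:
        raise ValueError(
            f"Corpus embedded with {store['model']} ({store['dim']}d) but "
            f"query uses {query_model} ({query_dim}d); re-embed the corpus."
        )

check_compatible(stored, "text-embedding-3-large", 3072)  # OK, no error
```

Failing loudly at query time is far better than silently returning garbage matches.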
FAQ
What does 'high-dimensional' mean for embeddings?
Just that the vector has many components: usually 768, 1024, 1536, or 3072 floats. Higher dimensions can capture finer distinctions but cost more to store and compare. Most teams happily run at 1024-1536.
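The storage cost scales linearly with dimension, which is easy to check with back-of-envelope arithmetic (assuming float32, i.e. 4 bytes per component, and no index overhead):

```python
def storage_gb(num_vectors, dim, bytes_per_float=4):
    """Raw size of num_vectors float32 embeddings of the given dimension, in GiB."""
    return num_vectors * dim * bytes_per_float / 1024**3

# One million vectors at the common dimensions from the FAQ above.
for dim in (768, 1024, 1536, 3072):
    print(f"{dim}d: {storage_gb(1_000_000, dim):.2f} GiB")
```

So a million 1536-dimensional vectors is roughly 5.7 GiB raw, and doubling the dimension doubles the bill.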
Related terms
- Vector database: stores numerical embeddings of text/images/audio and finds similar items by distance, powering semantic search and RAG.
- RAG (Retrieval-Augmented Generation): combines a language model with a search step over your own documents, so answers stay grounded in your data instead of hallucinating.
- Tokenization: the process of breaking text into chunks (tokens), usually sub-word pieces, that an LLM actually reads and writes.
Want to actually build with this?
Our Stack Builder picks the best AI tools for your specific project in under 60 seconds.
Build my stack →