Embeddings

Numerical vector representations of text (or other content) where semantic similarity maps to vector distance, enabling search by meaning rather than keyword.

Embeddings are how machines represent meaning numerically. Each piece of text is converted to a fixed-length vector (typically 768, 1024, 1536, or 3072 dimensions in 2026) such that semantically similar texts produce vectors that are close in the vector space.
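To make "close in the vector space" concrete, here is a minimal sketch of cosine similarity, one common way to measure that closeness. The 4-dimensional vectors are made-up illustrations, not output from any real model, and the names cat, kitten, and invoice are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real models emit hundreds or thousands of dimensions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

sim_related = cosine_similarity(cat, kitten)     # expected to be high
sim_unrelated = cosine_similarity(cat, invoice)  # expected to be low
```

With these toy values the related pair scores far higher than the unrelated pair, which is the property a real embedding model is trained to produce for real text.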

For agents, embeddings are the foundation of RAG, agent memory, and semantic search tools. Common providers:

OpenAI text-embedding-3: 1536 dims (small) or 3072 dims (large), strong default
Cohere embed-v4: multilingual, strong English performance
Voyage AI: specialized for retrieval, often outperforms general-purpose embeddings on benchmarks
Open-weight (BGE, E5, nomic-embed): competitive and self-hostable
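Whichever provider supplies the vectors, the semantic search step looks much the same. The sketch below assumes the vectors have already been produced by some embedding model. The document IDs and 3-dimensional vectors are hypothetical, and build_index and search are illustrative helpers, not part of any provider's SDK. A production system would typically hand this step to a vector database rather than scan every document.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def build_index(doc_vectors):
    """Normalize once at index time so each search only needs dot products."""
    return {doc_id: normalize(vec) for doc_id, vec in doc_vectors.items()}

def search(index, query_vec, k=2):
    """Return the k most similar documents as (doc_id, score) pairs, best first."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, vec)), doc_id)
        for doc_id, vec in index.items()
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Hypothetical precomputed document embeddings (toy 3-dimensional values).
index = build_index({
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-auth": [0.0, 0.1, 0.9],
})

# Hypothetical embedding of a query such as "how do I get my money back".
results = search(index, [0.8, 0.3, 0.0], k=2)
```

The query shares no keywords with "refund-policy", yet that document ranks first because its vector points in nearly the same direction as the query's.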

Embedding quality usually affects retrieval quality more than the choice of vector database does. The dominant failure mode in production RAG isn't slow search; it's that the embedding model didn't capture the query's meaning the way a human would, so the relevant documents never make it into the results.
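One way to check whether an embedding model "understands" queries is to measure retrieval quality on a small labeled set before committing to it. The sketch below computes recall@k, assuming you have hand-labeled relevant documents for each query. The queries, document IDs, and the two ranked result sets are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Average, over queries, of the share of relevant docs found in the top k results."""
    if not relevant:
        raise ValueError("need at least one labeled query")
    total = 0.0
    for query, relevant_ids in relevant.items():
        top_k = set(retrieved.get(query, [])[:k])
        total += len(top_k & relevant_ids) / len(relevant_ids)
    return total / len(relevant)

# Hypothetical ground truth: which documents a human judged relevant per query.
relevant = {
    "how do I get my money back": {"refund-policy"},
    "reset api key": {"api-auth", "key-rotation"},
}

# Hypothetical ranked results produced by two candidate embedding models.
model_a = {
    "how do I get my money back": ["refund-policy", "shipping-times"],
    "reset api key": ["api-auth", "key-rotation"],
}
model_b = {
    "how do I get my money back": ["shipping-times", "refund-policy"],
    "reset api key": ["billing", "api-auth"],
}

score_a = recall_at_k(model_a, relevant, k=2)
score_b = recall_at_k(model_b, relevant, k=2)
```

Running the same labeled set against each candidate model gives a like-for-like comparison that is independent of which vector database stores the vectors.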

Cost note: each document is embedded once at indexing time, while each query must be embedded at search time. For long-lived corpora, the indexing cost is amortized and stays small. For frequently changing data, re-embedding can dominate, so the re-embedding strategy is a real architectural decision.
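One possible re-embed strategy is to store a content hash next to each vector and re-embed only documents whose text has actually changed. This is a minimal sketch under that assumption. plan_reembedding and the stored_hashes mapping are hypothetical names, and the actual embedding call is left out.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reembedding(documents, stored_hashes):
    """Compare current documents with the hashes saved at the last indexing run.

    Returns (to_embed, to_delete): IDs that are new or changed, and IDs whose
    vectors should be removed because the document no longer exists.
    """
    to_embed = [
        doc_id
        for doc_id, text in documents.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    ]
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in documents]
    return to_embed, to_delete

# Hypothetical state saved alongside the vectors at the previous indexing run.
stored_hashes = {
    "a": content_hash("hello"),
    "b": content_hash("old text"),
    "c": content_hash("removed later"),
}

# Hypothetical current corpus: "a" unchanged, "b" edited, "c" gone, "d" new.
documents = {"a": "hello", "b": "new text", "d": "fresh document"}

to_embed, to_delete = plan_reembedding(documents, stored_hashes)
```

Only "b" and "d" would be sent to the embedding model here, so cost scales with the amount of changed content rather than with corpus size. Switching to a different embedding model is the exception: vectors from different models are not comparable, so the whole corpus has to be re-embedded.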

Embeddings can also be computed for images, audio, or multimodal content. Multimodal RAG (image search, video search) uses the same vector-database infrastructure with different embedding models.