Vector embeddings turn text into high-dimensional numeric vectors where semantic similarity becomes geometric proximity. This enables modern semantic search, RAG, and matching systems to work even when queries and documents use different wording.
Core ideas:
- Embeddings as vectors:
- Text → list of floats (e.g. 1,536-dim vector from text-embedding-3-small).
- Similar meaning → vectors close together (high cosine similarity / small angle).
- Different meaning → vectors far apart.
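To make "close together" concrete, here is cosine similarity computed by hand on hypothetical 4-dimensional vectors. The values are invented for illustration only; real embeddings come from a model and have hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", hand-set for illustration (not model output).
cancel_subscription = [0.9, 0.1, 0.0, 0.2]
unsubscribe = [0.8, 0.2, 0.1, 0.1]
pizza_recipe = [0.0, 0.1, 0.9, 0.3]

print(round(cosine_similarity(cancel_subscription, unsubscribe), 3))   # similar meaning: close to 1
print(round(cosine_similarity(cancel_subscription, pizza_recipe), 3))  # different meaning: close to 0
```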
- Why embeddings beat pure keyword search:
- Keyword/BM25 rely on exact tokens and miss synonyms, paraphrases, and spelling variants.
- Embeddings capture semantics, so “cancel subscription” ≈ “unsubscribe” ≈ “stop my plan”.
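BM25 only scores terms that occur in both the query and the document, so paraphrases that share no tokens get no keyword credit at all. A quick check, assuming a crude lowercase whitespace tokenizer (real analyzers typically add stemming and stop-word handling, which would not create overlap between these phrasings either):

```python
def shared_tokens(a: str, b: str) -> set[str]:
    """Tokens two strings have in common, using lowercase whitespace tokenization."""
    return set(a.lower().split()) & set(b.lower().split())

phrases = ["cancel subscription", "unsubscribe", "stop my plan"]
for i, first in enumerate(phrases):
    for second in phrases[i + 1:]:
        print(f"{first!r} vs {second!r}: {shared_tokens(first, second) or 'no shared tokens'}")
```

Every pair prints "no shared tokens", which is exactly the gap an embedding model is meant to close.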
- Common embedding models:
- text-embedding-3-small (OpenAI, 1,536 dims): best cost/performance for most use cases.
- text-embedding-3-large (OpenAI, 3,072 dims): more accurate, roughly 6.5× the per-token price.
- embed-english-v3.0 (Cohere, 1,024 dims): strong English model (Cohere's multilingual counterpart is embed-multilingual-v3.0).
- all-MiniLM-L6-v2 (HF, 384 dims): free, local, great for prototyping.
- mxbai-embed-large (Mixedbread, 1,024 dims): strong open-source.
More dimensions → more nuance but higher storage and latency. For most business use cases, text-embedding-3-small is a solid default.
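Storage grows linearly with dimensionality. Here is a back-of-the-envelope sketch, assuming vectors are stored as 32-bit floats (4 bytes per dimension) and ignoring index overhead, metadata and replicas: 1M chunks × 1,536 dims × 4 bytes ≈ 6.1 GB, versus ≈ 12.3 GB at 3,072 dims. The 1M-chunk corpus is a hypothetical size.

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_storage_gb(num_vectors: int, dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw vector payload in GB (10^9 bytes); excludes index structures, metadata and replicas."""
    return num_vectors * dims * bytes_per_value / 1e9

# Hypothetical corpus of 1M chunks, using the dimensions listed above.
for model, dims in [
    ("all-MiniLM-L6-v2", 384),
    ("embed-english-v3.0", 1024),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]:
    print(f"{model:<24} {dims:>5} dims -> {raw_vector_storage_gb(1_000_000, dims):6.2f} GB")
```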
- Vector databases & ANN search:
- Need efficient nearest-neighbour search over millions of vectors.
- Use ANN indexes (e.g. HNSW, IVF-Flat) to get millisecond queries.
- Typical choices:
- Pinecone: fully managed, fast to start.
- Weaviate: open-source + managed, strong hybrid search.
- Qdrant: open-source, performant, good for self-hosting.
- pgvector: Postgres extension, great if you’re already on Postgres.
- ChromaDB: simple local dev store.
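ANN indexes approximate an exact search that is easy to write but scans every vector. Below is a minimal sketch of that exact baseline, assuming a small in-memory dict of hypothetical vectors. Its O(N · d) cost per query is what HNSW or IVF-Flat indexes avoid, at the price of occasionally missing a true neighbour.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[float, str]]:
    """Exact nearest neighbours by cosine similarity via a full scan of the corpus.

    In practice you would store pre-normalized vectors instead of normalizing per query."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in corpus.items()
    )
    return heapq.nlargest(k, scored)  # (score, doc_id) pairs, highest cosine first

corpus = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}  # hypothetical vectors
print(exact_top_k([1.0, 0.1], corpus, k=2))
```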
- Chunking (critical for retrieval quality):
- Don’t embed whole long docs as a single vector.
- Split into chunks (≈256–1,024 tokens) and embed each.
- Practical strategies:
- Fixed-size with overlap: N tokens with 10–20% overlap.
- Semantic / structure-aware chunking: split on natural boundaries such as paragraphs, sections, or headings.
- Hierarchical / parent-child: small chunks for retrieval + larger parent for context.
- Common failure mode: chunks too large (e.g. 2,000 tokens) → relevant info buried in noise.
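A minimal sketch of fixed-size chunking with overlap, assuming the text has already been split into tokens. Whitespace splitting stands in for the embedding model's real tokenizer here, and the default of 512 tokens with 64 overlap (12.5%) is just an illustrative choice within the ranges above.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into fixed-size chunks; consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk's start advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this chunk reached the end; avoid a tail that is pure overlap
            break
    return chunks

# Usage with whitespace "tokens" (a real pipeline would count model tokens).
words = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last token of the previous one, so a sentence cut at a boundary still appears intact in at least one chunk when the overlap is large enough.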
- Similarity metrics:
- Cosine similarity: angle only; standard for text embeddings.
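- Dot product: fast, but magnitude-sensitive; equivalent to cosine when vectors are unit-normalized (OpenAI embeddings, for example, are returned normalized to length 1).
- Euclidean (L2) distance: rarely the default for text; also magnitude-sensitive, and on unit-normalized vectors it ranks results exactly like cosine, since ‖a − b‖² = 2 − 2·cos(a, b).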
- In practice, use cosine unless your vector DB recommends otherwise for a specific index.
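The relationships between the three metrics can be checked numerically. This small sketch on hypothetical 2-d vectors shows that dot product scales with vector length while cosine does not. It also shows that after L2 normalization, dot product equals cosine and squared Euclidean distance equals 2 − 2·cosine.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
a_long = [10 * x for x in a]  # same direction as a, 10x the magnitude

# Dot product is magnitude-sensitive; cosine only sees the angle.
assert math.isclose(dot(a_long, b), 10 * dot(a, b))
assert math.isclose(cosine(a_long, b), cosine(a, b))

# After L2 normalization, dot product equals cosine ...
a_n, b_n = l2_normalize(a), l2_normalize(b)
assert math.isclose(dot(a_n, b_n), cosine(a, b))
# ... and squared Euclidean distance is 2 - 2*cos, so the nearest by L2 is the most similar by cosine.
assert math.isclose(euclidean(a_n, b_n) ** 2, 2 - 2 * cosine(a, b))
```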
- Hybrid search (vector + keyword):
- Pure vector search can underperform on exact identifiers (SKUs, codes, names).
- Combine BM25 keyword search with vector search and merge via RRF (Reciprocal Rank Fusion).
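- Often the best-performing setup for enterprise search.
- Weaviate supports hybrid search natively; Pinecone supports it through sparse-dense vectors; with pgvector you typically combine Postgres full-text search and vector results in app code or via frameworks (e.g. LlamaIndex).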
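Here is a minimal RRF sketch. Each document's fused score is the sum of 1 / (k + rank) over the result lists it appears in, with ranks starting at 1. k = 60 is the constant used in the original RRF paper and a common default. The two result lists below are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_results = ["sku-123", "doc-7", "doc-2"]    # hypothetical keyword hits
vector_results = ["doc-7", "doc-9", "sku-123"]  # hypothetical semantic hits
for doc_id, score in reciprocal_rank_fusion([bm25_results, vector_results]):
    print(f"{doc_id}: {score:.4f}")
```

Because RRF uses only ranks, BM25 scores and cosine similarities, which live on different scales, can be merged without any score calibration. Documents found by both retrievers (here doc-7 and sku-123) rise to the top.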
- RAG pipeline overview:
- Ingest:
- Chunk documents.
- Embed each chunk.
- Store vectors + metadata in a vector DB.
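The ingest steps above can be sketched end to end as follows. Everything here is hypothetical: `fake_embed` is a deterministic hash-based placeholder with no semantic meaning, standing in for a real embedding model. `InMemoryVectorStore` stands in for a vector DB, and `ingest` chunks on whitespace tokens as a rough proxy for model tokens.

```python
import hashlib
import math
from dataclasses import dataclass, field

def fake_embed(text: str, dims: int = 8) -> list[float]:
    """HYPOTHETICAL placeholder: a deterministic hash-derived unit vector with NO semantic meaning.
    Replace with a call to a real embedding model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    vec = [byte / 255.0 for byte in digest[:dims]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: rows of (id, vector, metadata)."""
    rows: list[tuple[str, list[float], dict]] = field(default_factory=list)

    def upsert(self, row_id: str, vector: list[float], metadata: dict) -> None:
        # Replace any row with the same id so re-ingesting the same document does not duplicate rows.
        self.rows = [row for row in self.rows if row[0] != row_id]
        self.rows.append((row_id, vector, metadata))

def ingest(doc_id: str, text: str, store: InMemoryVectorStore,
           chunk_size: int = 200, overlap: int = 20) -> int:
    """Chunk -> embed -> store. Returns the number of chunks written."""
    tokens = text.split()  # whitespace tokens as a rough proxy for model tokens
    step = chunk_size - overlap
    n = 0
    for start in range(0, len(tokens), step):
        chunk_text = " ".join(tokens[start:start + chunk_size])
        store.upsert(
            f"{doc_id}#{n}",
            fake_embed(chunk_text),
            {"doc_id": doc_id, "chunk_index": n, "text": chunk_text},  # metadata kept alongside the vector
        )
        n += 1
        if start + chunk_size >= len(tokens):
            break
    return n

store = InMemoryVectorStore()
print(ingest("handbook", "word " * 450, store))  # number of chunks stored
```

Keeping `doc_id` and the chunk text in metadata lets the query side filter by source and pass the original text, not just the vector, to the LLM.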