Every vector DB choice involves managed vs self-hosted, scale, features, and lock-in. Here's my decision framework and what I actually pick for different projects.
Vector storage and retrieval can get expensive at scale. Here are the levers I pull to reduce cost without killing quality.
Dense vectors find semantic matches. Sparse methods (BM25) find keyword matches. Hybrid search combines them and usually beats either alone.
Approximate nearest neighbor algorithms trade exact results for speed. Here's how HNSW, IVF, and PQ work, and when to use which.
Metadata filtering is what lets you do multi-tenant RAG, access control, freshness, and narrow-scope queries. The performance model matters.
Vector databases are infrastructure for nearest-neighbor search at scale. Here's what they do, how they differ from traditional databases, and the major players.