Choosing a vector DB

The vector DB decision gets over-engineered. Most of the time the right answer is "the one your team can operate" plus "it supports hybrid search and metadata filtering." Everything else is second-order. Here's how I actually pick.

The decision matrix

Prototype / < 1M vectors / low traffic

Don't pay for a managed vector DB at this scale. It's wasted money.

Production, small to medium (1M-10M vectors)

Production, medium to large (10M-100M vectors)

Very large (100M+ vectors)

Already running Elasticsearch / OpenSearch

Use their vector support. The integration with your existing BM25 + filtering infrastructure is worth more than a slightly better dedicated vector DB.
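To make "BM25 + filtering + vectors in one place" concrete, here is a minimal sketch of what a hybrid query does conceptually: filter on metadata, then merge a keyword ranking with a vector ranking. The merge uses reciprocal rank fusion, which is my choice for illustration and not necessarily what Elasticsearch or OpenSearch does internally. The function names, the `k=60` constant, and the sample data are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in.
    k=60 is a commonly used constant, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; break ties by doc ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(bm25_ranking, vector_ranking, metadata, required, top_k=3):
    """Apply an exact-match metadata filter, then fuse the two rankings."""

    def allowed(doc_id):
        doc_meta = metadata.get(doc_id, {})
        return all(doc_meta.get(key) == value for key, value in required.items())

    fused = reciprocal_rank_fusion([
        [doc_id for doc_id in bm25_ranking if allowed(doc_id)],
        [doc_id for doc_id in vector_ranking if allowed(doc_id)],
    ])
    return fused[:top_k]


# Hypothetical sample data: two rankings from separate retrievers.
metadata = {
    "a": {"lang": "en"},
    "b": {"lang": "en"},
    "c": {"lang": "de"},
    "d": {"lang": "en"},
}
results = hybrid_search(
    bm25_ranking=["a", "b", "c"],
    vector_ranking=["c", "a", "d"],
    metadata=metadata,
    required={"lang": "en"},
)
print(results)
```

If you already run a search cluster, all three steps happen inside one system against one copy of your metadata. With a separate vector DB, you own this glue code and the job of keeping the two stores in sync.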

The features that actually matter

Must-have

Nice-to-have

Often oversold

Cost comparison (2026 approximate)

For 10M vectors at 1024 dimensions and 100K queries/month:

Self-hosted wins on cost at scale. Managed wins on operational simplicity.
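Before comparing prices, it helps to size the workload above (10M vectors, 1024 dimensions, 100K queries/month) with back-of-envelope arithmetic. This sketch assumes float32 storage (4 bytes per value) and a 30-day month. It counts raw vector data only, with no index overhead, replicas, or metadata, so treat the result as a lower bound and not a quote.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 assumed by default)."""
    return num_vectors * dimensions * bytes_per_value


def average_qps(queries_per_month, days_per_month=30):
    """Average queries per second, assuming traffic spread evenly over the month."""
    return queries_per_month / (days_per_month * 24 * 60 * 60)


size_bytes = raw_vector_bytes(10_000_000, 1024)
print(f"raw vectors: {size_bytes / 1e9:.2f} GB ({size_bytes / 2**30:.2f} GiB)")
# raw vectors: 40.96 GB (38.15 GiB)

print(f"average load: {average_qps(100_000):.3f} queries/second")
# average load: 0.039 queries/second
```

Roughly 41 GB of raw vectors at well under one query per second on average means this workload is dominated by storage and memory, not query throughput. Real traffic is bursty, so peak load will be higher than the average.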

Lock-in considerations

Migrating vector DBs is moderately painful but not catastrophic:

Build an abstraction layer over your vector DB calls. Don't scatter provider-specific code across your app. A simple repository/interface pattern saves weeks when you eventually migrate.

The trap to avoid

Don't pick a vector DB based on a benchmark blog post. Every vendor publishes benchmarks that show them winning. The meaningful questions are:

The performance differences between top-tier vector DBs at reasonable scale are usually less than the differences between chunking strategies. Pick a reasonable DB and move on.

My current defaults

Next: Vector similarity search.