Metadata filtering

Metadata filtering combines vector similarity with structured predicates: "find chunks similar to this query, but only from documents owned by tenant X, published after date Y, with visibility = public." Nearly every production RAG system needs this, yet the performance characteristics differ dramatically between vector databases.

The query shape

query: vector(embedding of user question)
filter: {
  tenant_id: "acme",
  visibility: {$in: ["public", "internal"]},
  updated_at: {$gte: "2024-01-01"},
  document_type: "policy"
}
top_k: 10

The database's job: find the 10 nearest neighbors that also satisfy all filter conditions. It sounds simple, but it isn't.
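To make the filter semantics concrete, here is a minimal Python sketch of how a filter of this shape could be evaluated against one chunk's metadata. The `matches` function is a hypothetical helper written for illustration, and it handles only the `$in` and `$gte` operators used above; each vector database defines its own filter syntax and evaluates it internally.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in flt (implicit AND)."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"$in": [...]} or {"$gte": ...}
            for op, operand in cond.items():
                if op == "$in":
                    if value not in operand:
                        return False
                elif op == "$gte":
                    if value is None or value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:
            # Bare value means equality
            return False
    return True


query_filter = {
    "tenant_id": "acme",
    "visibility": {"$in": ["public", "internal"]},
    "updated_at": {"$gte": "2024-01-01"},  # ISO dates compare correctly as strings
    "document_type": "policy",
}

chunk_metadata = {
    "tenant_id": "acme",
    "visibility": "internal",
    "updated_at": "2024-03-15",
    "document_type": "policy",
}

is_match = matches(chunk_metadata, query_filter)
```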

Three strategies

Pre-filter

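Narrow the candidate set to documents whose metadata matches the filter, then run the vector search over only those. The result is exact: every hit satisfies the filter, and you get k results whenever at least k documents match.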

Post-filter

Run the vector search over all documents first, then apply the filter to the results. The search itself is fast, but it may return fewer than k results if the filter is selective.

Dynamic / hybrid

The database decides based on estimated filter selectivity. Modern vector DBs (Pinecone, Qdrant, Weaviate) do this automatically.
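The three strategies can be sketched in a few lines of Python. This is a toy model under stated assumptions: it uses brute-force dot-product search instead of an approximate index, and the function names, the sampling-based selectivity estimate, and the `threshold` value are all illustrative choices, not how any particular database implements its planner.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k):
    """docs are (doc_id, vector, metadata) tuples; returns the k most similar."""
    return heapq.nlargest(k, docs, key=lambda d: dot(query, d[1]))


def pre_filter(query, docs, pred, k):
    # Filter first, then search only the matching candidates.
    candidates = [d for d in docs if pred(d[2])]
    return top_k(query, candidates, k)


def post_filter(query, docs, pred, k):
    # Search everything first, then drop hits that fail the filter.
    return [d for d in top_k(query, docs, k) if pred(d[2])]


def dynamic(query, docs, pred, k, threshold=0.1, sample_size=100):
    # Estimate selectivity from a sample (real engines use index statistics).
    sample = docs[:sample_size]
    selectivity = sum(pred(d[2]) for d in sample) / max(len(sample), 1)
    if selectivity < threshold:
        # Selective filter: post-filtering would discard most hits.
        return pre_filter(query, docs, pred, k)
    # Broad filter: over-fetch in proportion to selectivity, filter, trim.
    fetch = min(len(docs), int(k / selectivity) * 2)
    return post_filter(query, docs, pred, fetch)[:k]


docs = [
    ("a", [1.0, 0.0], {"tenant": "x"}),
    ("b", [0.9, 0.1], {"tenant": "y"}),
    ("c", [0.8, 0.2], {"tenant": "y"}),
    ("d", [0.1, 0.9], {"tenant": "x"}),
]
query = [1.0, 0.0]
only_x = lambda meta: meta["tenant"] == "x"

pre_ids = [d[0] for d in pre_filter(query, docs, only_x, k=2)]
post_ids = [d[0] for d in post_filter(query, docs, only_x, k=2)]
dyn_ids = [d[0] for d in dynamic(query, docs, only_x, k=2)]
```

In this toy data the post-filter call returns a single result for k=2, because one of the two nearest neighbors belongs to another tenant, while the pre-filter and dynamic calls both return two.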

The performance failure mode

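Consider a query with a highly selective filter on a post-filter-only database: the vector search returns its top k from the whole collection, most or all of those hits fail the filter, and the caller gets few or no results even though matching documents exist.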

You can increase the search k to compensate (over-fetch), but this is wasteful and still unreliable when filters are very selective.

The right answer is a vector DB that supports pre-filtering or dynamic filtering. Alternatively, query the metadata first to get the matching document IDs, then run a vector search restricted to those IDs.
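A small simulation illustrates why over-fetching is unreliable. The numbers here are made up for the example: 1,000 documents, of which only 10 pass the filter, and none of those 10 rank near the top by similarity. The helper names are hypothetical.

```python
def post_filter_search(scores, allowed, k):
    """Take the top-k ids by similarity over everything, then drop non-matching ids."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return [doc_id for doc_id in ranked if doc_id in allowed]


def over_fetch_search(scores, allowed, k):
    """Double the fetch size until k matches are found or the collection is exhausted."""
    fetch = k
    while True:
        hits = post_filter_search(scores, allowed, fetch)
        if len(hits) >= k or fetch >= len(scores):
            return hits[:k], fetch
        fetch *= 2


# Synthetic data: doc 0 is the most similar, doc 999 the least.
scores = {i: 1000 - i for i in range(1000)}
# Only 1% of documents pass the filter, spread evenly through the ranking.
allowed = {i for i in range(1000) if i % 100 == 99}

naive_hits = post_filter_search(scores, allowed, k=10)
hits, final_fetch = over_fetch_search(scores, allowed, k=10)
```

With this data the plain post-filter search returns nothing, and the over-fetch loop only collects all 10 matches once its fetch size exceeds the size of the collection, which amounts to scanning everything.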

Vector DB comparison on filtering

Index the metadata fields you'll filter on

Most vector DBs let you index specific metadata fields for fast filtering. Index every field that appears in common queries. Filtering on an unindexed field can force a full scan on every query.
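The difference between an indexed and an unindexed field can be sketched with a simple inverted index. The `MetadataIndex` class below is a hypothetical illustration, not any database's API: an indexed field resolves an equality filter with a dictionary lookup, while an unindexed field falls back to scanning every document.

```python
from collections import defaultdict


class MetadataIndex:
    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        # field -> value -> set of doc ids
        self.postings = {f: defaultdict(set) for f in self.indexed_fields}
        self.docs = {}

    def add(self, doc_id, metadata):
        self.docs[doc_id] = metadata
        for field in self.indexed_fields:
            if field in metadata:
                self.postings[field][metadata[field]].add(doc_id)

    def ids_where(self, field, value):
        """Return ids of documents where metadata[field] == value."""
        if field in self.indexed_fields:
            # Indexed: one dictionary lookup.
            return set(self.postings[field].get(value, set()))
        # Unindexed: full scan over all documents.
        return {i for i, m in self.docs.items() if m.get(field) == value}


index = MetadataIndex(indexed_fields=["tenant_id", "document_type"])
index.add("d1", {"tenant_id": "acme", "document_type": "policy", "author": "kim"})
index.add("d2", {"tenant_id": "acme", "document_type": "faq", "author": "lee"})
index.add("d3", {"tenant_id": "globex", "document_type": "policy", "author": "kim"})

acme_ids = index.ids_where("tenant_id", "acme")   # indexed lookup
kim_ids = index.ids_where("author", "kim")        # falls back to a scan
acme_policies = acme_ids & index.ids_where("document_type", "policy")
```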

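Typical fields to index: tenant_id, visibility, permissions, document_type, source_system, and date fields such as updated_at.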

The multi-tenant special case

In multi-tenant RAG, a tenant_id filter runs on every query. Options:

Single index, filter on tenant_id

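One physical index, with logical separation via the filter. Simple. Performance depends on how data is distributed across tenants: for a small tenant in a large shared index, the tenant_id filter is highly selective on every query.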

Namespace per tenant (Pinecone pattern)

Each tenant has its own namespace, so data is partitioned per tenant. A forgotten or malformed filter cannot leak another tenant's data, and tenant-scoped queries search only that tenant's vectors, which generally performs better.

Collection per tenant

One collection per tenant. Extreme isolation. Expensive in overhead if you have many small tenants.
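The first two options can be contrasted with a toy in-memory sketch. Both classes are hypothetical and use brute-force search; they are not Pinecone's or any other vendor's API. The point is the data layout: the shared index filters rows on every query, while the namespaced index only ever looks at one tenant's partition. In both, `tenant_id` is a required argument so callers cannot forget it.

```python
from collections import defaultdict


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_k(rows, query, k):
    # rows: list of (doc_id, vector)
    ranked = sorted(rows, key=lambda r: _dot(query, r[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


class SharedIndex:
    """Single index: tenant isolation relies on the filter in search()."""

    def __init__(self):
        self._rows = []  # (tenant_id, doc_id, vector)

    def upsert(self, tenant_id, doc_id, vector):
        self._rows.append((tenant_id, doc_id, vector))

    def search(self, tenant_id, query, k):
        scoped = [(d, v) for t, d, v in self._rows if t == tenant_id]
        return _top_k(scoped, query, k)


class NamespacedIndex:
    """Namespace per tenant: each tenant's vectors live in a separate partition."""

    def __init__(self):
        self._namespaces = defaultdict(list)  # tenant_id -> [(doc_id, vector)]

    def upsert(self, tenant_id, doc_id, vector):
        self._namespaces[tenant_id].append((doc_id, vector))

    def search(self, tenant_id, query, k):
        # Only the requested tenant's partition is ever touched.
        return _top_k(self._namespaces.get(tenant_id, []), query, k)


results = {}
for name, index in (("shared", SharedIndex()), ("namespaced", NamespacedIndex())):
    index.upsert("acme", "a1", [1.0, 0.0])
    index.upsert("globex", "g1", [1.0, 0.0])
    index.upsert("acme", "a2", [0.0, 1.0])
    results[name] = index.search("acme", [1.0, 0.0], k=5)
```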

See multi-tenant RAG for more detail.

Common filter patterns

Access control

permissions: {$in: user.roles}

Freshness boost

Filter to published_after: (today - 2 years). For broad queries, remove the filter so that older documents remain searchable.

Document type scoping

When a user asks "what's our refund policy?", filter to document_type = "policy".

Tenant + source

tenant_id: "acme" AND source_system: {$in: ["confluence", "drive"]}
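These patterns compose naturally into a single filter built per request. The sketch below is one possible way to do that; `build_filter` is a hypothetical helper, `published_at` is an assumed field name for the publication date, and "two years" is approximated as 730 days.

```python
from datetime import date, timedelta


def build_filter(tenant_id, roles, today, document_type=None,
                 sources=None, max_age_years=2):
    """Combine tenant, access-control, freshness, type, and source conditions."""
    flt = {
        "tenant_id": tenant_id,                  # always present
        "permissions": {"$in": list(roles)},     # access control
    }
    if max_age_years is not None:
        # Freshness: pass max_age_years=None to drop this for broad queries.
        cutoff = today - timedelta(days=365 * max_age_years)
        flt["published_at"] = {"$gte": cutoff.isoformat()}
    if document_type is not None:
        flt["document_type"] = document_type     # document type scoping
    if sources:
        flt["source_system"] = {"$in": list(sources)}
    return flt


policy_filter = build_filter(
    tenant_id="acme",
    roles=["employee", "hr"],
    today=date(2025, 6, 1),
    document_type="policy",
    sources=["confluence", "drive"],
)

broad_filter = build_filter(
    tenant_id="acme",
    roles=["employee"],
    today=date(2025, 6, 1),
    max_age_years=None,
)
```

Keeping tenant_id and permissions unconditional in the builder means an individual query path cannot omit them by accident.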

Filter-aware reranking

After retrieval, you can apply soft filters via the reranker: boosts that raise the score of preferred documents, rather than hard filters that exclude everything else.
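A minimal sketch of such a soft filter, assuming the reranker has already produced a relevance score per candidate. The additive boost scheme, the `rerank_with_boosts` name, and the weights are illustrative assumptions; a production reranker may combine signals differently.

```python
def rerank_with_boosts(candidates, boosts):
    """Reorder candidates by score plus the weights of all matching boosts.

    candidates: list of (doc_id, score, metadata)
    boosts:     list of (predicate, weight); predicate takes the metadata dict
    """
    def adjusted(candidate):
        _, score, meta = candidate
        return score + sum(weight for pred, weight in boosts if pred(meta))

    return sorted(candidates, key=adjusted, reverse=True)


candidates = [
    ("old-policy", 0.80, {"year": 2019, "document_type": "policy"}),
    ("new-policy", 0.75, {"year": 2024, "document_type": "policy"}),
    ("new-memo", 0.60, {"year": 2024, "document_type": "memo"}),
]

boosts = [
    (lambda m: m["year"] >= 2023, 0.10),                # prefer recent documents
    (lambda m: m["document_type"] == "policy", 0.05),   # prefer policies
]

reranked_ids = [doc_id for doc_id, _, _ in rerank_with_boosts(candidates, boosts)]
```

Unlike a hard filter, nothing is dropped here: the older policy stays in the list and is only ranked below the newer one.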

See reranking.

Next: Cost optimization.