Metadata filtering combines vector similarity with structured predicates: "find chunks similar to this query, but only from documents owned by tenant X, published after date Y, with visibility = public." Every production RAG system needs this. The performance characteristics differ dramatically between vector databases.
query: vector(embedding of user question)
filter: {
  tenant_id: "acme",
  visibility: {$in: ["public", "internal"]},
  updated_at: {$gte: "2024-01-01"},
  document_type: "policy"
}
top_k: 10
The database's job is to find the 10 nearest neighbors that also satisfy all filter conditions. It sounds simple, but it isn't: the result depends on when the filter is applied relative to the vector search.
Pre-filtering: narrow the candidate set to documents with matching metadata first, then search only those. The filter is applied exactly: only filter-matching documents are searched and returned.
Post-filtering: run the vector search over all documents first, then apply the filter to the results. The search is fast, but it may return fewer than k results if the filter is selective.
Dynamic filtering: the database chooses between the two based on estimated filter selectivity. Modern vector DBs (Pinecone, Qdrant, Weaviate) do this automatically.
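To make the difference concrete, the three strategies (filter first, search first, or choose by selectivity) can be compared on a toy corpus. This is a minimal sketch, not how any particular database implements it: the corpus, the brute-force `nearest` helper, and the 0.5 threshold are all assumptions made for illustration, and a real planner would estimate the matching fraction rather than count it.

```python
import math

# Toy corpus of (doc_id, embedding, metadata); all values are invented.
DOCS = [
    ("a", [1.0, 0.0], {"tenant_id": "acme"}),
    ("b", [0.9, 0.1], {"tenant_id": "globex"}),
    ("c", [0.8, 0.2], {"tenant_id": "globex"}),
    ("d", [0.0, 1.0], {"tenant_id": "acme"}),
]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest(query, docs, k):
    # Brute-force top-k by cosine similarity; stands in for an ANN index.
    return sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)[:k]

def pre_filter(query, docs, predicate, k):
    # Filter first, then search only the matching documents.
    return nearest(query, [d for d in docs if predicate(d[2])], k)

def post_filter(query, docs, predicate, k):
    # Search everything first, then drop results that fail the filter.
    return [d for d in nearest(query, docs, k) if predicate(d[2])]

def dynamic_filter(query, docs, predicate, k, threshold=0.5):
    # Hypothetical planner: counts the matching fraction exactly,
    # where a real database would use an estimate.
    match_fraction = sum(1 for d in docs if predicate(d[2])) / len(docs)
    if match_fraction <= threshold:
        return pre_filter(query, docs, predicate, k)
    return post_filter(query, docs, predicate, k)

def ids(results):
    return [d[0] for d in results]

def is_acme(meta):
    return meta["tenant_id"] == "acme"

query = [1.0, 0.0]
print(ids(pre_filter(query, DOCS, is_acme, k=2)))      # ['a', 'd']
print(ids(post_filter(query, DOCS, is_acme, k=2)))     # ['a']
print(ids(dynamic_filter(query, DOCS, is_acme, k=2)))  # ['a', 'd']
```

With k = 2, the search-first path finds only one matching document because the second-nearest vector belongs to another tenant, while the filter-first path returns both.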
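Consider a query with a highly selective filter on a post-filter-only database: the search returns the k nearest vectors overall, the filter then discards most of them, and you are left with far fewer than k results, possibly none.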
You can increase the search k to compensate (over-fetch), but this is wasteful and still unreliable when filters are very selective.
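The over-fetch workaround and its limits can be seen in a small simulation. This is only a sketch under invented assumptions: 1,000 documents with one-dimensional "embeddings", three of which belong to the tenant being queried and all of which sit far from the query. The `post_filter_search` function and its `fetch_k` parameter are hypothetical names.

```python
# 1,000 toy documents with 1-D "embeddings"; only three belong to tenant
# "acme", and all three are far from the query. All values are invented.
DOCS = [
    {"id": i, "vec": float(i), "tenant_id": "acme" if i in (700, 800, 900) else "other"}
    for i in range(1000)
]

def post_filter_search(query, docs, predicate, k, fetch_k):
    # Fetch the fetch_k nearest documents overall, then apply the filter.
    nearest = sorted(docs, key=lambda d: abs(d["vec"] - query))[:fetch_k]
    return [d for d in nearest if predicate(d)][:k]

def is_acme(doc):
    return doc["tenant_id"] == "acme"

for fetch_k in (3, 30, 1000):
    hits = post_filter_search(0.0, DOCS, is_acme, k=3, fetch_k=fetch_k)
    print(fetch_k, [d["id"] for d in hits])
# 3 []
# 30 []
# 1000 [700, 800, 900]
```

Even a 10x over-fetch returns nothing here; the filter only starts to find matches once the fetch covers most of the collection, at which point the search is no longer cheap.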
The better answer is a vector DB that supports pre-filtering or dynamic filtering. Alternatively, query the metadata first to get document IDs, then run a vector search restricted to those IDs.
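The metadata-first approach can be sketched with a relational store holding the metadata and a separate embedding lookup. Everything here is an assumption for illustration: the `chunks` table, the in-memory `EMBEDDINGS` dict, and the brute-force `restricted_search` stand in for whatever metadata store and ID-restricted search your stack actually provides.

```python
import math
import sqlite3

# Metadata lives in a relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, tenant_id TEXT, document_type TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [
        ("c1", "acme", "policy"),
        ("c2", "acme", "faq"),
        ("c3", "globex", "policy"),
        ("c4", "acme", "policy"),
    ],
)

# Embeddings keyed by chunk ID; invented values.
EMBEDDINGS = {"c1": [0.2, 0.8], "c2": [1.0, 0.0], "c3": [0.9, 0.1], "c4": [0.7, 0.3]}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def allowed_ids(conn, tenant_id, document_type):
    # Step 1: resolve the structured predicate to a set of chunk IDs.
    rows = conn.execute(
        "SELECT id FROM chunks WHERE tenant_id = ? AND document_type = ?",
        (tenant_id, document_type),
    )
    return {row[0] for row in rows}

def restricted_search(query, ids, k):
    # Step 2: rank only the allowed IDs by similarity.
    scored = sorted(((cosine(query, EMBEDDINGS[i]), i) for i in ids), reverse=True)
    return [i for _, i in scored[:k]]

ids = allowed_ids(conn, "acme", "policy")
top = restricted_search([1.0, 0.0], ids, k=2)
print(top)  # ['c4', 'c1']
```

The chunk nearest to the query overall (c2) never appears, because it was excluded before any similarity was computed.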
Most vector DBs let you index specific metadata fields for fast filtering. Index every field that appears in common queries; filtering on an un-indexed field can force a full scan on every query.
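The difference between an indexed and an un-indexed field can be illustrated with a small inverted index. This is a sketch of the general idea only; the `MetadataIndex` class is hypothetical and does not mirror any specific database's payload index.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy metadata store with an inverted index on selected fields."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        # field -> value -> set of doc IDs
        self.postings = {f: defaultdict(set) for f in self.indexed_fields}
        self.docs = {}

    def add(self, doc_id, metadata):
        self.docs[doc_id] = metadata
        for field in self.indexed_fields:
            if field in metadata:
                self.postings[field][metadata[field]].add(doc_id)

    def match(self, field, value):
        if field in self.indexed_fields:
            # Indexed: a single dictionary lookup.
            return set(self.postings[field].get(value, set()))
        # Un-indexed: every document's metadata has to be inspected.
        return {i for i, meta in self.docs.items() if meta.get(field) == value}

index = MetadataIndex(["tenant_id"])
index.add("a", {"tenant_id": "acme", "document_type": "policy"})
index.add("b", {"tenant_id": "globex", "document_type": "policy"})
index.add("c", {"tenant_id": "acme", "document_type": "faq"})

print(sorted(index.match("tenant_id", "acme")))        # ['a', 'c']  (lookup)
print(sorted(index.match("document_type", "policy")))  # ['a', 'b']  (full scan)
```

Both calls return correct results; the difference is that the second one touches every document, which is the cost that grows with collection size.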
Typical fields to index:
- tenant_id
- source_system
- document_type
- visibility or permissions

In multi-tenant RAG, the tenant_id filter runs on every query. Options:
Shared index with a tenant filter: one physical index, with logical separation via the filter. Simple. Performance depends on how documents are distributed across tenants.
Namespace per tenant: each tenant has its own namespace. Physical separation. No risk of cross-tenant leaks. Better performance for tenant-scoped queries.
Collection per tenant: one collection per tenant. Extreme isolation. Expensive in overhead if you have many small tenants.
See multi-tenant RAG for more detail.
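The first two options differ mainly in where the tenant boundary is enforced. The sketch below contrasts them using one-dimensional toy embeddings; `SharedIndex` and `NamespacedIndex` are hypothetical classes written for this illustration, not any vendor's API.

```python
class SharedIndex:
    """One index for all tenants; isolation depends on the filter being applied."""

    def __init__(self):
        self.rows = []  # (tenant_id, doc_id, vec)

    def upsert(self, tenant_id, doc_id, vec):
        self.rows.append((tenant_id, doc_id, vec))

    def search(self, tenant_id, query, k):
        # The tenant filter runs on every query, over every tenant's rows.
        own = [r for r in self.rows if r[0] == tenant_id]
        own.sort(key=lambda r: abs(r[2] - query))
        return [r[1] for r in own[:k]]

class NamespacedIndex:
    """One namespace per tenant; a query only ever sees its own namespace."""

    def __init__(self):
        self.namespaces = {}  # tenant_id -> list of (doc_id, vec)

    def upsert(self, tenant_id, doc_id, vec):
        self.namespaces.setdefault(tenant_id, []).append((doc_id, vec))

    def search(self, tenant_id, query, k):
        rows = sorted(self.namespaces.get(tenant_id, []), key=lambda r: abs(r[1] - query))
        return [r[0] for r in rows[:k]]

shared, namespaced = SharedIndex(), NamespacedIndex()
for tenant_id, doc_id, vec in [("acme", "a1", 0.1), ("globex", "g1", 0.0), ("acme", "a2", 0.9)]:
    shared.upsert(tenant_id, doc_id, vec)
    namespaced.upsert(tenant_id, doc_id, vec)

print(shared.search("acme", 0.0, k=2))      # ['a1', 'a2']
print(namespaced.search("acme", 0.0, k=2))  # ['a1', 'a2']
```

Both return the same results; the shared index gets there by scanning and filtering all tenants' rows, while the namespaced index never loads another tenant's data in the first place.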
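Common filter patterns:
Permissions: restrict results to what the user's roles allow, for example permissions: {$in: user.roles}.
Recency: filter to published_after: (today - 2 years). Older documents remain searchable if you drop the filter for broad queries.
Query intent: when a user asks "what's our refund policy", filter to document_type = "policy".
Combined: conditions can be ANDed, for example tenant_id: "acme" AND source_system: {$in: ["confluence", "drive"]}.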
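These patterns are usually assembled into one filter per request. The sketch below builds such a filter and evaluates it against document metadata. The `build_filter` and `matches` helpers are hypothetical, the `published_at` field name is an assumption, "2 years" is approximated as 730 days, and only the `$in` and `$gte` operators are handled.

```python
from datetime import date, timedelta

def build_filter(tenant_id, roles, today, document_type=None, max_age_days=730):
    # Hypothetical helper: combines tenant, permission, and recency conditions.
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    flt = {
        "tenant_id": tenant_id,
        "permissions": {"$in": list(roles)},
        "published_at": {"$gte": cutoff},  # field name is an assumption
    }
    if document_type is not None:
        flt["document_type"] = document_type
    return flt

def matches(metadata, flt):
    # Minimal evaluator: every condition must hold (implicit AND).
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            if "$in" in cond and value not in cond["$in"]:
                return False
            # ISO dates compare correctly as strings.
            if "$gte" in cond and (value is None or value < cond["$gte"]):
                return False
        elif value != cond:
            return False
    return True

flt = build_filter("acme", ["employee", "hr"], today=date(2025, 1, 1), document_type="policy")

recent = {"tenant_id": "acme", "permissions": "hr",
          "published_at": "2024-06-01", "document_type": "policy"}
stale = dict(recent, published_at="2020-06-01")
other_tenant = dict(recent, tenant_id="globex")

print(matches(recent, flt), matches(stale, flt), matches(other_tenant, flt))
# True False False
```

Passing `today` in as an argument keeps the recency cutoff testable; dropping the `published_at` key from the filter is how a broad query would reach older documents.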
After retrieval, you can also apply soft filters via the reranker: boosts that raise the score of preferred documents, rather than hard filters that exclude the rest.
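A soft filter can be sketched as an additive score boost applied to the retrieved candidates. The `rerank_with_boosts` function, the candidate shape, and the boost weight below are assumptions for illustration; a production reranker would typically combine such boosts with a learned relevance score.

```python
def rerank_with_boosts(candidates, boosts):
    """Re-sort candidates by score plus any boosts whose predicate matches.

    candidates: list of dicts with 'id', 'score', and 'metadata'.
    boosts: list of (predicate, weight) pairs. Nothing is ever excluded.
    """
    def boosted(candidate):
        bonus = sum(weight for predicate, weight in boosts if predicate(candidate["metadata"]))
        return candidate["score"] + bonus

    return sorted(candidates, key=boosted, reverse=True)

candidates = [
    {"id": "x", "score": 0.80, "metadata": {"updated_at": "2021-03-01"}},
    {"id": "y", "score": 0.78, "metadata": {"updated_at": "2024-05-01"}},
]

# Prefer recent documents, but keep older ones in the results.
boosts = [(lambda meta: meta["updated_at"] >= "2024-01-01", 0.05)]

ranked = rerank_with_boosts(candidates, boosts)
print([c["id"] for c in ranked])  # ['y', 'x']
```

The older document is demoted, not removed, which is the difference from a hard filter.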
See reranking.
Next: Cost optimization.