Multi-tenant RAG
Updated 2026-04-18
Multi-tenant RAG is what B2B SaaS companies build: one RAG system serving many customer organizations, each with its own documents, users, and permissions. The requirements are strict: no data can cross tenant boundaries, even in retrieval candidates. The scaling profile also differs from that of single-tenant systems.
The core requirement
Absolute tenant isolation:
- Tenant A's users must never retrieve Tenant B's content
- Even under bugs, infrastructure failures, or attacks
- Data residency and compliance may require geographic separation
- Some tenants may require full isolation (dedicated resources)
Isolation patterns
Pattern 1: Shared index, tenant_id filter
All tenants' vectors in one index. Every query filters by tenant_id.
Pros:
- Simplest infrastructure
- Lowest cost per tenant
- Easy to scale horizontally
Cons:
- Filter performance varies by selectivity
- Bug in tenant_id filter = data leak
- Harder to offer per-tenant customization
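A minimal sketch of the shared-index pattern, using a toy in-memory index rather than any real vector database. The `Chunk` and `SharedIndex` names are hypothetical. The point it illustrates is that the tenant filter runs before scoring, so other tenants' chunks never become candidates, and that a missing tenant_id is rejected instead of silently searching everything.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    vector: tuple


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SharedIndex:
    """Toy stand-in for one index holding every tenant's vectors (hypothetical)."""

    def __init__(self):
        self._chunks = []

    def add(self, chunk):
        self._chunks.append(chunk)

    def search(self, tenant_id, query_vector, top_k=3):
        # Fail closed: a query without a tenant must never fall back to "all tenants".
        if not tenant_id:
            raise ValueError("tenant_id is required")
        # Filter first, then score, so foreign chunks are never retrieval candidates.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        ranked = sorted(
            candidates, key=lambda c: cosine(c.vector, query_vector), reverse=True
        )
        return ranked[:top_k]


index = SharedIndex()
index.add(Chunk("tenant_a", "A's pricing doc", (1.0, 0.0)))
index.add(Chunk("tenant_b", "B's pricing doc", (1.0, 0.0)))
results = index.search("tenant_a", (1.0, 0.0))
```

Both documents are equally similar to the query, yet only Tenant A's chunk can come back. Making tenant_id a required argument of the search call, rather than an optional filter, is one way to reduce the chance of the "bug in tenant_id filter" failure mode.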
Pattern 2: Namespace per tenant
Vector DBs like Pinecone offer namespaces. Each tenant gets its own namespace within a shared index. Queries are scoped to the namespace.
Pros:
- Physical separation at the namespace level
- No filter overhead at query time
- Easier to reason about isolation
Cons:
- Per-namespace overhead (small but real)
- Billing often per-namespace
- Some operations (global re-indexing) become per-namespace
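As a rough illustration of namespace scoping, the sketch below uses a toy in-memory class; `NamespacedIndex` and `namespace_for` are hypothetical names and do not mirror any particular vendor's client API. It shows that a query reads only its own namespace, and that a "global" operation turns into a loop over namespaces.

```python
class NamespacedIndex:
    """Toy stand-in for a vector index with namespaces (hypothetical API)."""

    def __init__(self):
        self._namespaces = {}  # namespace -> list of stored items

    def upsert(self, namespace, item):
        self._namespaces.setdefault(namespace, []).append(item)

    def query(self, namespace):
        # Only the requested namespace is read; no tenant filter is involved.
        return list(self._namespaces.get(namespace, []))

    def namespaces(self):
        return sorted(self._namespaces)


def namespace_for(tenant_id):
    # Derive the namespace name from the tenant id in exactly one place.
    return f"tenant-{tenant_id}"


index = NamespacedIndex()
index.upsert(namespace_for("a"), "A's handbook")
index.upsert(namespace_for("b"), "B's handbook")

visible_to_a = index.query(namespace_for("a"))

# A "global" operation such as re-indexing becomes one pass per namespace.
items_per_namespace = {ns: len(index.query(ns)) for ns in index.namespaces()}
```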
Pattern 3: Index per tenant
Each tenant has a dedicated index.
Pros:
- Maximum isolation
- Per-tenant tuning possible
- Independent scaling
Cons:
- Significant overhead at many tenants
- Cost per tenant is higher
- Operational complexity
Pattern 4: Hybrid (tier by tenant size)
Small tenants share an index with filters. Large tenants get dedicated indexes or namespaces.
This is what most mature multi-tenant systems end up with. It optimizes cost across the long tail and gives isolation where it matters.
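The tiering decision can be sketched as a small placement function. Everything here is an assumption for illustration: the `Placement` type, the `place_tenant` name, the index naming scheme, and especially the one-million-chunk threshold, which a real system would tune from its own cost and latency data.

```python
from dataclasses import dataclass, field

# Hypothetical cut-off; pick a real value from measured cost and latency.
DEDICATED_THRESHOLD_CHUNKS = 1_000_000


@dataclass(frozen=True)
class Placement:
    index: str
    metadata_filter: dict = field(default_factory=dict)


def place_tenant(tenant_id, chunk_count, requires_full_isolation=False):
    """Decide where a tenant's vectors live: shared index or dedicated index."""
    if requires_full_isolation or chunk_count >= DEDICATED_THRESHOLD_CHUNKS:
        # Large or compliance-sensitive tenants get their own index.
        return Placement(index=f"tenant-{tenant_id}")
    # Everyone else shares one index and is separated by a tenant_id filter.
    return Placement(index="shared", metadata_filter={"tenant_id": tenant_id})


small = place_tenant("acme", chunk_count=20_000)
large = place_tenant("globex", chunk_count=5_000_000)
regulated = place_tenant("initech", chunk_count=500, requires_full_isolation=True)
```

Keeping this decision in one function means a tenant can be migrated between tiers by changing its placement and re-ingesting, without query code needing to know which tier it is on.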
Permission handling within tenants
Even within one tenant, not all users see all documents:
- Department-level access
- Project-level access
- Individual document sharing
Every chunk carries both tenant_id AND user/role-level permissions. Query filters apply both.
The filter architecture
query filter:
  tenant_id: [from authenticated user]
  permissions: [intersect with user's roles]
  optional: additional filters from query
Never trust a client-provided tenant_id. Always derive it from authentication.
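One possible shape for this filter construction is sketched below. The `Session` type, the `allowed_roles` metadata field, and both function names are assumptions made for the example. The sketch drops any client-supplied value for the reserved keys, then writes the tenant and roles from the authenticated session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """What the auth layer knows about the caller (hypothetical shape)."""
    user_id: str
    tenant_id: str
    roles: frozenset


# Keys the client is never allowed to set.
RESERVED_KEYS = {"tenant_id", "allowed_roles"}


def build_filter(session, client_filters=None):
    """Combine optional client filters with server-derived tenant and roles."""
    extra = {
        k: v for k, v in (client_filters or {}).items() if k not in RESERVED_KEYS
    }
    return {
        **extra,
        "tenant_id": session.tenant_id,          # from authentication only
        "allowed_roles": sorted(session.roles),  # from authentication only
    }


def chunk_visible(chunk_meta, query_filter):
    """Apply the filter to one chunk's metadata: tenant, then roles, then extras."""
    if chunk_meta["tenant_id"] != query_filter["tenant_id"]:
        return False
    if not set(chunk_meta["allowed_roles"]) & set(query_filter["allowed_roles"]):
        return False
    return all(
        chunk_meta.get(k) == v
        for k, v in query_filter.items()
        if k not in RESERVED_KEYS
    )


session = Session("u1", "tenant_a", frozenset({"sales"}))
# The client tries to smuggle in another tenant's id; it is discarded.
query_filter = build_filter(session, {"tenant_id": "tenant_b", "doc_type": "contract"})

own = {"tenant_id": "tenant_a", "allowed_roles": ["sales", "legal"], "doc_type": "contract"}
foreign = {"tenant_id": "tenant_b", "allowed_roles": ["sales"], "doc_type": "contract"}
wrong_role = {"tenant_id": "tenant_a", "allowed_roles": ["hr"], "doc_type": "contract"}
```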
Embedding model choice
Multi-tenant systems usually use one embedding model across all tenants:
- Simpler operations
- Allows cross-tenant analytics (in aggregate)
- Consistent retrieval behavior
Per-tenant embeddings are exotic and usually not worth it. Tenant-specific fine-tuning is possible but rare.
LLM model choice
Common approach: offer tiers.
- Basic tier: cheaper/faster model (GPT-4o-mini, Haiku)
- Premium tier: flagship model (GPT-4o, Sonnet, Opus)
- Enterprise tier: dedicated model deployment
Let tenants choose their tier.
Scaling characteristics
Long-tail of small tenants
A typical B2B product has many small tenants and few large ones. Shared infrastructure with filters serves the long tail cheaply.
Large tenants are outliers
A few enterprise tenants may have 100x the content of the average tenant. Giving them separate resources prevents them from dominating shared infrastructure.
Per-tenant rate limits
Prevent one tenant from starving others. Apply rate limits at the tenant level in addition to per-user limits.
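A token bucket keyed by tenant is one common way to do this. The sketch below is an in-process illustration only; the class name, parameters, and injectable clock are assumptions, and a production system would typically keep the buckets in shared storage so that every API server sees the same counts.

```python
import time


class TenantRateLimiter:
    """Token bucket per tenant: `burst` capacity, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock    # injectable so the behaviour can be tested
        self._buckets = {}    # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id):
        now = self.clock()
        tokens, last = self._buckets.get(tenant_id, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[tenant_id] = (tokens - 1, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False


fake_now = [0.0]
limiter = TenantRateLimiter(rate_per_sec=1.0, burst=2, clock=lambda: fake_now[0])

# Tenant A burns its burst; tenant B is unaffected.
tenant_a_decisions = [limiter.allow("tenant_a") for _ in range(3)]
tenant_b_decision = limiter.allow("tenant_b")

# One second later, tenant A has earned one token back.
fake_now[0] = 1.0
tenant_a_after_refill = limiter.allow("tenant_a")
```

A per-user limiter of the same shape can sit in front of this one, so that a request has to pass both checks.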
Onboarding new tenants
- Create tenant record in auth system
- Provision namespace (or confirm filter-based tenancy)
- Set up ingestion pipelines for tenant's sources
- Initial ingestion of existing content
- Configure tenant-specific settings (branding, limits, features)
This should be automated. Onboarding every tenant manually doesn't scale past a few dozen.
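The steps above can be sketched as a resumable workflow. The step names and the `handlers` mapping are hypothetical placeholders for calls into your auth system, vector database, and ingestion pipeline. The idea shown is that progress is recorded per step, so a retried or repeated run skips work that is already done.

```python
ONBOARDING_STEPS = (
    "create_tenant_record",
    "provision_namespace",
    "set_up_ingestion",
    "run_initial_ingestion",
    "apply_tenant_settings",
)


def onboard_tenant(tenant_id, handlers, completed):
    """Run each step once, in order; `completed` is persisted progress."""
    for step in ONBOARDING_STEPS:
        if step in completed:
            continue  # already done on an earlier attempt
        handlers[step](tenant_id)
        completed.add(step)  # only marked after the handler succeeds
    return completed


# Stand-in handlers that just record what they were asked to do.
log = []
handlers = {
    step: (lambda tenant_id, step=step: log.append((step, tenant_id)))
    for step in ONBOARDING_STEPS
}

progress = set()
onboard_tenant("tenant_a", handlers, progress)
onboard_tenant("tenant_a", handlers, progress)  # second run is a no-op
```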
Offboarding (data deletion)
When a tenant leaves:
- Delete all their vectors from the index
- Delete source documents from any cache
- Purge logs according to retention policy
- Provide data export if contractually required
GDPR's "right to erasure" may apply. Build deletion pathways from day one.
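A deletion pathway might look roughly like the sketch below, which uses plain dictionaries as stand-ins for the vector store and document cache. The function name, the store shapes, and the report fields are all assumptions. Log purging is left out because it depends on your retention policy.

```python
def offboard_tenant(tenant_id, vector_store, doc_cache, export=None):
    """Delete one tenant's vectors and cached documents; return a report.

    vector_store: dict of vector_id -> metadata containing "tenant_id"
    doc_cache:    dict keyed by (tenant_id, doc_id)
    export:       optional callable run before anything is deleted
    """
    # Export first, while the data still exists.
    exported = export(tenant_id) if export else None

    vector_ids = [
        vid for vid, meta in vector_store.items() if meta["tenant_id"] == tenant_id
    ]
    for vid in vector_ids:
        del vector_store[vid]

    cache_keys = [key for key in doc_cache if key[0] == tenant_id]
    for key in cache_keys:
        del doc_cache[key]

    # Verify rather than assume: count what is left for this tenant.
    remaining = sum(
        1 for meta in vector_store.values() if meta["tenant_id"] == tenant_id
    )
    return {
        "vectors_deleted": len(vector_ids),
        "cached_docs_deleted": len(cache_keys),
        "remaining_vectors": remaining,
        "exported": exported,
    }


vector_store = {
    "v1": {"tenant_id": "tenant_a"},
    "v2": {"tenant_id": "tenant_a"},
    "v3": {"tenant_id": "tenant_b"},
}
doc_cache = {("tenant_a", "doc1"): "cached text", ("tenant_b", "doc9"): "cached text"}

report = offboard_tenant(
    "tenant_a", vector_store, doc_cache, export=lambda t: f"{t}-export.zip"
)
```

Returning a report with a verified `remaining_vectors` count gives you something concrete to store as evidence that the deletion ran.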
Monitoring per tenant
- Query volume
- Quality metrics
- Cost
- Latency
- Error rates
Tracking these per tenant surfaces issues before they escalate.
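As a small illustration, the sketch below keeps query volume, error rate, and average latency per tenant in memory. The `TenantMetrics` class is hypothetical; in practice these values would more likely be emitted to a metrics system with tenant_id as a label, and quality and cost would be recorded the same way.

```python
from collections import defaultdict


class TenantMetrics:
    """In-memory per-tenant counters (illustrative only)."""

    def __init__(self):
        self._latencies = defaultdict(list)  # tenant_id -> latencies in ms
        self._errors = defaultdict(int)      # tenant_id -> error count

    def record(self, tenant_id, latency_ms, error=False):
        self._latencies[tenant_id].append(latency_ms)
        if error:
            self._errors[tenant_id] += 1

    def summary(self, tenant_id):
        latencies = self._latencies.get(tenant_id, [])
        count = len(latencies)
        if count == 0:
            return {"queries": 0, "error_rate": 0.0, "avg_latency_ms": 0.0}
        return {
            "queries": count,
            "error_rate": self._errors.get(tenant_id, 0) / count,
            "avg_latency_ms": sum(latencies) / count,
        }


metrics = TenantMetrics()
metrics.record("tenant_a", latency_ms=100)
metrics.record("tenant_a", latency_ms=300, error=True)
metrics.record("tenant_b", latency_ms=50)

summary_a = metrics.summary("tenant_a")
summary_unknown = metrics.summary("tenant_z")
```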
Tenant-specific customization
Some tenants want custom behavior:
- Their own system prompt / persona
- Specific document sources enabled/disabled
- Custom metadata filters
- Different retrieval parameters
- White-label branding
Architect for this from the start, with per-tenant configuration files, stored customization, and feature flags.
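One simple way to model this is a set of defaults plus per-tenant overrides. The field names, default values, and the `OVERRIDES` table below are invented for the example; in a real system the overrides would come from a database or configuration service.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantConfig:
    """Defaults that apply to every tenant unless overridden."""
    system_prompt: str = "You are a helpful assistant."
    enabled_sources: tuple = ("docs",)
    top_k: int = 5
    brand_name: str = "Default Assistant"


# Hypothetical stored customizations, keyed by tenant id.
OVERRIDES = {
    "tenant_a": {"top_k": 10, "brand_name": "A-Corp Helper"},
    "tenant_b": {"enabled_sources": ("docs", "tickets")},
}


def config_for(tenant_id):
    """Return the defaults with this tenant's overrides applied on top."""
    return replace(TenantConfig(), **OVERRIDES.get(tenant_id, {}))


config_a = config_for("tenant_a")
config_b = config_for("tenant_b")
config_new = config_for("tenant_without_overrides")
```

Because `replace` rejects unknown field names, a typo in an override fails loudly instead of being silently ignored.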
The security audit
For B2B RAG, regular audits confirm:
- Cross-tenant leaks are not possible
- Access controls are enforced at retrieval
- Each request is correctly attributed to the authenticated user and tenant
- Logs are sufficient for incident investigation
Third-party audits (SOC 2, ISO 27001) examine controls like these and tend to surface gaps.
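The cross-tenant check lends itself to an automated test that actively tries to leak data. The sketch below seeds one canary document per tenant and probes a search function as each tenant. The corpus, the probe strings, and both search functions are made up for the example, and `buggy_search` is deliberately broken to show what a failing run reports.

```python
def find_leaks(search, tenants, probes):
    """Return (tenant, probe, leaked_text) for every hit owned by another tenant."""
    leaks = []
    for tenant in tenants:
        for probe in probes:
            for hit in search(tenant, probe):
                if hit["tenant_id"] != tenant:
                    leaks.append((tenant, probe, hit["text"]))
    return leaks


# One canary per tenant, with overlapping content to invite cross-matches.
CORPUS = [
    {"tenant_id": "tenant_a", "text": "CANARY-A salary bands"},
    {"tenant_id": "tenant_b", "text": "CANARY-B salary bands"},
]


def safe_search(tenant_id, query):
    return [
        doc for doc in CORPUS
        if doc["tenant_id"] == tenant_id and query.lower() in doc["text"].lower()
    ]


def buggy_search(tenant_id, query):
    # Deliberately broken: the tenant filter has been forgotten.
    return [doc for doc in CORPUS if query.lower() in doc["text"].lower()]


TENANTS = ["tenant_a", "tenant_b"]
PROBES = ["salary", "canary-a", "canary-b"]

safe_leaks = find_leaks(safe_search, TENANTS, PROBES)
buggy_leaks = find_leaks(buggy_search, TENANTS, PROBES)
```

In CI, the assertion would simply be that the leak list is empty for the real retrieval path. Probing with the other tenant's canary string is the important part, since it is the query most likely to surface a missing filter.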
The cost allocation
Per-tenant cost attribution is essential:
- Embedding cost for ingestion (attribute to tenant)
- Vector DB storage (per-tenant or per-namespace)
- Query costs (per query, per tenant)
- LLM costs per query, per tenant
This feeds into pricing decisions and lets you identify unprofitable tenants.
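The attribution itself can be a straightforward roll-up of metered usage events. In the sketch below the meter names and every unit price are placeholder assumptions, not real vendor pricing; substitute the rates from your own invoices.

```python
from collections import defaultdict

# Placeholder unit prices in USD; replace with your actual rates.
UNIT_PRICE_USD = {
    "embedding_tokens": 0.10 / 1_000_000,
    "llm_tokens": 5.00 / 1_000_000,
    "vector_queries": 0.0001,
    "storage_gb_month": 0.30,
}


def attribute_costs(usage_events):
    """Sum cost per tenant and per meter from (tenant_id, meter, quantity) events."""
    totals = defaultdict(lambda: defaultdict(float))
    for tenant_id, meter, quantity in usage_events:
        totals[tenant_id][meter] += quantity * UNIT_PRICE_USD[meter]
    return {tenant: dict(meters) for tenant, meters in totals.items()}


events = [
    ("tenant_a", "embedding_tokens", 1_000_000),  # ingestion
    ("tenant_a", "llm_tokens", 2_000),            # one answered query
    ("tenant_a", "vector_queries", 100),
    ("tenant_b", "storage_gb_month", 2),
]

costs = attribute_costs(events)
total_a = sum(costs["tenant_a"].values())
```

Comparing each tenant's total against what that tenant pays is then enough to flag the unprofitable ones.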
Closing thought
Multi-tenant RAG has all the challenges of single-tenant RAG plus isolation, scaling, and operational multi-tenancy concerns. The isolation concerns aren't optional: one leak is a reputational catastrophe. Build with isolation as a first-class property, not an afterthought.
What to do with this
- Derive tenant_id from the authenticated session on every request; never trust a client-supplied tenant identifier
- Start with a shared index and tenant_id filters for the long tail, and graduate enterprise tenants to dedicated namespaces or indexes as they grow
- Attach user/role-level permission tags to every chunk, and apply both tenant and permission filters at retrieval time
- Build automated tenant onboarding and offboarding flows from day one, including GDPR deletion paths
- Track per-tenant cost, query volume, and latency so you can identify unprofitable tenants and scaling hot spots before they break production
- Write an isolation test suite that actively tries to leak data across tenants, and run it in CI