
Multi-tenant RAG

Updated 2026-04-18

Multi-tenant RAG is what B2B SaaS companies build: one RAG system serving many customer organizations, each with its own documents, users, and permissions. The requirements are strict: no data can cross tenant boundaries, even as retrieval candidates, and the scaling profile differs from that of single-tenant systems.

The core requirement

Absolute tenant isolation:

Tenant A's users must never retrieve Tenant B's content
Even under bugs, infrastructure failures, or attacks
Data residency and compliance may require geographic separation
Some tenants may require full isolation (dedicated resources)

Isolation patterns

Pattern 1: Shared index, tenant_id filter

All tenants' vectors in one index. Every query filters by tenant_id.

Pros:

Simplest infrastructure
Lowest cost per tenant
Easy to scale horizontally

Cons:

Filter performance varies by selectivity
Bug in tenant_id filter = data leak
Harder to offer per-tenant customization
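A minimal in-memory sketch of this pattern follows. The `SharedIndex` and `Chunk` classes are hypothetical, not any vector database's API. The idea it illustrates is that `search` takes the tenant as a required argument and filters before scoring, so another tenant's chunks never become candidates.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    tenant_id: str
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SharedIndex:
    """Toy shared index: all tenants' chunks live in one list."""
    chunks: list[Chunk] = field(default_factory=list)

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)

    def search(self, tenant_id: str, query: list[float], k: int = 3) -> list[Chunk]:
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        # Filter BEFORE scoring so other tenants' chunks are never candidates.
        candidates = [c for c in self.chunks if c.tenant_id == tenant_id]
        candidates.sort(key=lambda c: cosine(c.vector, query), reverse=True)
        return candidates[:k]


index = SharedIndex()
index.add(Chunk("tenant_a", "A's pricing doc", [1.0, 0.0]))
index.add(Chunk("tenant_b", "B's pricing doc", [1.0, 0.0]))
index.add(Chunk("tenant_a", "A's onboarding doc", [0.0, 1.0]))

results = index.search("tenant_a", [1.0, 0.0])
```

Tenant B's document is an exact vector match for the query, yet it cannot appear in `results` because it is removed before ranking. A real vector database would apply the same filter through its metadata filtering feature.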

Pattern 2: Namespace per tenant

Vector DBs like Pinecone offer namespaces. Each tenant gets its own namespace within a shared index. Queries are scoped to the namespace.

Pros:

Physical separation at the namespace level
No filter overhead at query time
Easier to reason about isolation

Cons:

Per-namespace overhead (small but real)
Billing often per-namespace
Some operations (global re-indexing) become per-namespace
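To make the trade-off concrete, here is a toy sketch of namespace scoping. `NamespacedIndex` is a hypothetical in-memory class, not Pinecone's client; it only illustrates that a query touches one tenant's bucket and that whole-tenant operations become per-namespace calls.

```python
from collections import defaultdict


class NamespacedIndex:
    """Toy index where each tenant's records live in a separate bucket."""

    def __init__(self) -> None:
        self._namespaces: dict[str, list[tuple[str, list[float]]]] = defaultdict(list)

    def upsert(self, namespace: str, doc_id: str, vector: list[float]) -> None:
        self._namespaces[namespace].append((doc_id, vector))

    def query(self, namespace: str, vector: list[float], top_k: int = 3) -> list[str]:
        # Only the requested namespace is scanned; no per-record filter is needed.
        records = self._namespaces.get(namespace, [])
        scored = sorted(
            records,
            key=lambda r: sum(a * b for a, b in zip(r[1], vector)),  # dot product
            reverse=True,
        )
        return [doc_id for doc_id, _ in scored[:top_k]]

    def delete_namespace(self, namespace: str) -> int:
        """Drop a whole tenant in one operation; returns the number of records removed."""
        return len(self._namespaces.pop(namespace, []))


idx = NamespacedIndex()
idx.upsert("tenant_a", "a-1", [1.0, 0.0])
idx.upsert("tenant_b", "b-1", [1.0, 0.0])

hits = idx.query("tenant_a", [1.0, 0.0])
```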

Pattern 3: Index per tenant

Each tenant has a dedicated index.

Pros:

Maximum isolation
Per-tenant tuning possible
Independent scaling

Cons:

Significant overhead at many tenants
Cost per tenant is higher
Operational complexity

Pattern 4: Hybrid (tier by tenant size)

Small tenants share an index with filters. Large tenants get dedicated indexes or namespaces.

This is what most mature multi-tenant systems end up with. It optimizes cost across the long tail and provides stronger isolation where it matters.
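The tiering decision can be sketched as a small placement function. The threshold value, the `Placement` type, and the index naming scheme are all assumptions for illustration; pick a cut-off from your own cost and latency data.

```python
from dataclasses import dataclass

# Assumed threshold: tenants at or above this chunk count get a dedicated index.
DEDICATED_THRESHOLD = 1_000_000


@dataclass(frozen=True)
class Placement:
    index_name: str
    dedicated: bool


def place_tenant(tenant_id: str, chunk_count: int, requires_isolation: bool = False) -> Placement:
    """Decide where a tenant's vectors live."""
    if requires_isolation or chunk_count >= DEDICATED_THRESHOLD:
        return Placement(index_name=f"tenant-{tenant_id}", dedicated=True)
    # Long-tail tenants share one index and rely on the tenant_id filter.
    return Placement(index_name="shared", dedicated=False)


small = place_tenant("acme", chunk_count=5_000)
large = place_tenant("bigco", chunk_count=2_000_000)
regulated = place_tenant("bank", chunk_count=100, requires_isolation=True)
```

Keeping the tenant_id filter on queries against dedicated indexes as well costs little and guards against routing mistakes.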

Permission handling within tenants

Even within one tenant, not all users see all documents:

Department-level access
Project-level access
Individual document sharing

Every chunk carries both tenant_id AND user/role-level permissions. Query filters apply both.

The filter architecture

query filter:
  tenant_id: [from authenticated user]
  permissions: [intersect with user's roles]
  optional: additional filters from query

Never trust a client-provided tenant_id. Always derive it
from authentication.
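A minimal sketch of that filter construction, assuming a hypothetical `Session` object produced by your auth layer and a chunk metadata layout with `tenant_id` and `allowed_roles` fields (both are illustrative names). Server-derived values are written last so they override anything the client sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical authenticated session, produced by the auth layer."""
    user_id: str
    tenant_id: str
    roles: frozenset[str]


def build_filter(session: Session, extra: Optional[dict] = None) -> dict:
    """Merge optional client filters with server-derived tenant and role filters."""
    return {
        **(extra or {}),
        # These come from authentication and always win over client input.
        "tenant_id": session.tenant_id,
        "allowed_roles": sorted(session.roles),
    }


def is_visible(chunk_meta: dict, query_filter: dict) -> bool:
    """A chunk is visible if the tenant matches and at least one role overlaps.

    Only the tenant and role checks are shown; extra filters are ignored here.
    """
    same_tenant = chunk_meta["tenant_id"] == query_filter["tenant_id"]
    role_overlap = bool(set(chunk_meta["allowed_roles"]) & set(query_filter["allowed_roles"]))
    return same_tenant and role_overlap


session = Session(user_id="u1", tenant_id="tenant_a", roles=frozenset({"sales"}))
# The client tries to smuggle in another tenant's id; it is overwritten.
query_filter = build_filter(session, {"tenant_id": "tenant_b", "doc_type": "pdf"})
```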

Embedding model choice

Multi-tenant systems usually use one embedding model across all tenants:

Simpler operations
Allows cross-tenant analytics (in aggregate)
Consistent retrieval behavior

Per-tenant embeddings are exotic and usually not worth it. Tenant-specific fine-tuning is possible but rare.

LLM model choice

Common approach: offer tiers.

Basic tier: cheaper/faster model (GPT-4o-mini, Haiku)
Premium tier: flagship model (GPT-4o, Sonnet, Opus)
Enterprise tier: dedicated model deployment

Let tenants choose their tier.

Scaling characteristics

Long tail of small tenants

A typical B2B product has many small tenants and a few large ones. Shared infrastructure with filters serves the long tail cheaply.

Large tenants are outliers

A few enterprise tenants may have 100x the content of the average tenant. Giving them separate resources prevents them from dominating shared infrastructure.

Per-tenant rate limits

Prevent one tenant from starving others. Apply rate limits at the tenant level in addition to per-user limits.
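One common way to do this is a token bucket per tenant. The sketch below is one possible shape, not a production limiter: `TenantRateLimiter` is a hypothetical class, it keeps state in process memory, and the clock is injectable only so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket per tenant: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        # tenant_id -> (tokens remaining, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(tenant_id, (self._capacity, now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed


limiter = TenantRateLimiter(rate=5.0, capacity=10.0)
first_request_ok = limiter.allow("tenant_a")
```

A multi-instance deployment would need shared state (for example in a cache service) instead of the in-process dictionary used here.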

Onboarding new tenants

Create tenant record in auth system
Provision namespace (or confirm filter-based tenancy)
Set up ingestion pipelines for tenant's sources
Initial ingestion of existing content
Configure tenant-specific settings (branding, limits, features)

This should be automated. Onboarding every tenant manually doesn't scale past a few dozen.
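One way to automate the steps above is a small runner that records which steps have completed, so a failed onboarding can be resumed without repeating work. The step names and the `onboard_tenant` function are illustrative; the lambdas stand in for real calls to your auth system, vector store, and ingestion pipeline.

```python
from typing import Callable

Step = tuple[str, Callable[[str], None]]


def onboard_tenant(tenant_id: str, steps: list[Step], completed: set[str]) -> list[str]:
    """Run each step once, in order. `completed` should be persisted between runs."""
    ran = []
    for name, step in steps:
        if name in completed:
            continue  # already done in an earlier run
        step(tenant_id)  # an exception here leaves `completed` accurate for a retry
        completed.add(name)
        ran.append(name)
    return ran


log: list[str] = []
steps: list[Step] = [
    ("create_tenant_record", lambda t: log.append(f"record:{t}")),
    ("provision_namespace", lambda t: log.append(f"namespace:{t}")),
    ("setup_ingestion", lambda t: log.append(f"ingestion:{t}")),
    ("initial_ingestion", lambda t: log.append(f"backfill:{t}")),
    ("apply_settings", lambda t: log.append(f"settings:{t}")),
]

done: set[str] = set()
first_run = onboard_tenant("tenant_a", steps, done)
second_run = onboard_tenant("tenant_a", steps, done)  # nothing left to do
```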

Offboarding (data deletion)

When a tenant leaves:

Delete all their vectors from the index
Delete source documents from any cache
Purge logs according to retention policy
Provide data export if contractually required

GDPR's "right to erasure" may apply. Build deletion pathways from day one.
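A deletion pathway can be sketched as a single function that removes a tenant's data from every store and then verifies that nothing remains. `TenantStores` is a hypothetical stand-in for your vector index and document cache; log purging and data export are left out because they depend on your retention policy and contracts.

```python
from dataclasses import dataclass, field


@dataclass
class TenantStores:
    """Toy stand-ins for the places tenant data can live."""
    vectors: list[dict] = field(default_factory=list)
    doc_cache: dict[str, dict] = field(default_factory=dict)


def offboard_tenant(stores: TenantStores, tenant_id: str) -> dict[str, int]:
    """Delete a tenant's data everywhere and return counts for the audit trail."""
    before = len(stores.vectors)
    stores.vectors[:] = [v for v in stores.vectors if v["tenant_id"] != tenant_id]

    cached_keys = [k for k, doc in stores.doc_cache.items() if doc["tenant_id"] == tenant_id]
    for key in cached_keys:
        del stores.doc_cache[key]

    # Verify rather than assume: nothing for this tenant should remain.
    leftovers = sum(v["tenant_id"] == tenant_id for v in stores.vectors)
    leftovers += sum(d["tenant_id"] == tenant_id for d in stores.doc_cache.values())
    if leftovers:
        raise RuntimeError(f"{leftovers} records for {tenant_id} survived deletion")

    return {
        "vectors_deleted": before - len(stores.vectors),
        "cached_docs_deleted": len(cached_keys),
    }


stores = TenantStores(
    vectors=[{"tenant_id": "tenant_a"}, {"tenant_id": "tenant_b"}, {"tenant_id": "tenant_a"}],
    doc_cache={"d1": {"tenant_id": "tenant_a"}, "d2": {"tenant_id": "tenant_b"}},
)
report = offboard_tenant(stores, "tenant_a")
```

Returning counts gives you something to store as evidence that the deletion ran.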

Monitoring per tenant

Query volume
Quality metrics
Cost
Latency
Error rates

Tracking these per tenant surfaces issues before they escalate.

Tenant-specific customization

Some tenants want custom behavior:

Their own system prompt / persona
Specific document sources enabled/disabled
Custom metadata filters
Different retrieval parameters
White-label branding

Architect for this from the start with per-tenant configuration files, stored customizations, and feature flags.
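A simple shape for this is a set of defaults plus stored per-tenant overrides. The field names and default values below are assumptions chosen to mirror the list above, and `OVERRIDES` stands in for whatever database or config files hold the stored customization.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantConfig:
    system_prompt: str = "You are a helpful assistant."
    top_k: int = 5
    enabled_sources: tuple[str, ...] = ("docs",)
    brand_name: str = "Default Assistant"


DEFAULTS = TenantConfig()

# Stored overrides, e.g. loaded from a database or per-tenant config files.
OVERRIDES: dict[str, dict] = {
    "tenant_a": {"top_k": 10, "brand_name": "A Corp Assistant"},
}


def config_for(tenant_id: str) -> TenantConfig:
    """Defaults plus whatever this tenant has overridden.

    dataclasses.replace raises TypeError on unknown field names, so a typo
    in a stored override fails loudly instead of being silently ignored.
    """
    return replace(DEFAULTS, **OVERRIDES.get(tenant_id, {}))


cfg_a = config_for("tenant_a")
cfg_other = config_for("tenant_without_overrides")
```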

The security audit

For B2B RAG, regular audits confirm:

Cross-tenant leaks are not possible
Access controls are enforced at retrieval
Requests are correctly attributed to the authenticated user and tenant
Logs are sufficient for incident investigation

Third-party audits (SOC 2, ISO 27001) examine controls like these and will surface gaps.
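Between audits, the first check can be automated with canary documents: plant a unique string in each tenant's data, then query as every other tenant and flag any result owned by someone else. The sketch below uses a toy keyword retriever; `find_leaks`, the store layout, and the canary strings are all illustrative, and in practice you would pass in your real retrieval entry point.

```python
from typing import Callable

Store = list[dict]
Retriever = Callable[[Store, str, str], list[dict]]

STORE: Store = [
    {"tenant_id": "tenant_a", "text": "canary-a-7f3 quarterly report"},
    {"tenant_id": "tenant_b", "text": "canary-b-91c quarterly report"},
]


def safe_retrieve(store: Store, tenant_id: str, query: str) -> list[dict]:
    """Toy retriever that applies the tenant filter."""
    return [d for d in store if d["tenant_id"] == tenant_id and query in d["text"]]


def buggy_retrieve(store: Store, tenant_id: str, query: str) -> list[dict]:
    """Same retriever with the tenant filter accidentally dropped."""
    return [d for d in store if query in d["text"]]


def find_leaks(retrieve: Retriever, store: Store, probes: list[str]) -> list[tuple[str, str, str]]:
    """Run every probe as every tenant; collect results owned by another tenant."""
    leaks = []
    for tenant_id in sorted({d["tenant_id"] for d in store}):
        for probe in probes:
            for doc in retrieve(store, tenant_id, probe):
                if doc["tenant_id"] != tenant_id:
                    leaks.append((tenant_id, probe, doc["text"]))
    return leaks


PROBES = ["canary-a-7f3", "canary-b-91c", "quarterly"]
safe_leaks = find_leaks(safe_retrieve, STORE, PROBES)
buggy_leaks = find_leaks(buggy_retrieve, STORE, PROBES)
```

In CI, the assertion is simply that the leak list for the real retriever is empty.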

The cost allocation

Per-tenant cost attribution is essential:

Embedding cost for ingestion (attribute to tenant)
Vector DB storage (per-tenant or per-namespace)
Query costs (per query, per tenant)
LLM costs per query, per tenant

This feeds into pricing decisions and lets you identify unprofitable tenants.
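A sketch of the attribution arithmetic, assuming every billable operation emits a usage event tagged with its tenant. The `UsageEvent` shape and the unit prices are made up for illustration; real prices depend on your providers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    tenant_id: str
    kind: str     # "embedding", "vector_query", or "llm"
    units: float  # tokens for embedding/llm, query count for vector_query


# Illustrative unit prices in USD (assumed values, not real provider pricing).
UNIT_PRICE = {"embedding": 0.0000001, "vector_query": 0.00001, "llm": 0.000002}


def cost_by_tenant(events: list[UsageEvent]) -> dict[str, float]:
    """Sum units * unit price per tenant."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        totals[event.tenant_id] += event.units * UNIT_PRICE[event.kind]
    return dict(totals)


def unprofitable(costs: dict[str, float], revenue: dict[str, float]) -> list[str]:
    """Tenants whose attributed cost exceeds their revenue for the period."""
    return sorted(t for t, cost in costs.items() if cost > revenue.get(t, 0.0))


events = [
    UsageEvent("tenant_a", "embedding", 1_000_000),
    UsageEvent("tenant_a", "llm", 50_000),
    UsageEvent("tenant_b", "vector_query", 100),
]
costs = cost_by_tenant(events)
losing_money = unprofitable(costs, revenue={"tenant_a": 0.15, "tenant_b": 1.0})
```

Shared costs that cannot be tagged per event, such as storage in a shared index, need a separate allocation rule (for example, proportional to stored chunk count).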

Closing thought

Multi-tenant RAG has all the challenges of single-tenant RAG plus isolation, scaling, and the operational concerns of multi-tenancy. Isolation isn't optional: one leak is a reputational catastrophe. Build with isolation as a first-class property, not an afterthought.

What to do with this

Derive tenant_id from the authenticated session on every request; never trust a client-supplied tenant identifier
Start with a shared index and tenant_id filters for the long tail; graduate enterprise tenants to dedicated namespaces or indexes as they grow
Attach user- or role-level permission tags to every chunk, and apply both tenant and permission filters at retrieval time
Build automated tenant onboarding and offboarding flows from day one, including GDPR deletion paths
Track per-tenant cost, query volume, and latency so you can identify unprofitable tenants and scaling hot spots before they break production
Write an isolation test suite that actively tries to leak data across tenants, and run it in CI

