Multi-tenant RAG

Multi-tenant RAG is what B2B SaaS companies build: one RAG system serving many customer organizations, each with its own documents, users, and permissions. The requirements are strict: no data can cross tenant boundaries, even as retrieval candidates, and the scaling profile differs from that of single-tenant systems.

The core requirement

Absolute tenant isolation:

Isolation patterns

Pattern 1: Shared index, tenant_id filter

All tenants' vectors in one index. Every query filters by tenant_id.
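As a rough illustration of the shared-index pattern, the sketch below uses a tiny in-memory list in place of a real vector database. The `Chunk` type, the `search` function, and the sample data are all hypothetical. The point it tries to show is that the tenant filter runs before ranking, so another tenant's vectors never become retrieval candidates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    vector: tuple[float, ...]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, tenant_id, query_vector, top_k=3):
    # Filter first: chunks from other tenants are dropped before scoring,
    # so they can never appear as candidates.
    candidates = [c for c in index if c.tenant_id == tenant_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.vector, query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical sample data: two tenants sharing one index.
index = [
    Chunk("acme", "Acme refund policy", (1.0, 0.0)),
    Chunk("globex", "Globex refund policy", (1.0, 0.0)),
    Chunk("acme", "Acme holiday schedule", (0.0, 1.0)),
]
results = search(index, "acme", (1.0, 0.1))
```

In a real vector database the same idea is usually expressed as a metadata filter passed with the query; the exact syntax depends on the product.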

Pros:

Cons:

Pattern 2: Namespace per tenant

Vector DBs like Pinecone offer namespaces. Each tenant gets its own namespace within a shared index. Queries are scoped to the namespace.

Pros:

Cons:

Pattern 3: Index per tenant

Each tenant has a dedicated index.

Pros:

Cons:

Pattern 4: Hybrid (tier by tenant size)

Small tenants share an index with filters. Large tenants get dedicated indexes or namespaces.

This is what most mature multi-tenant systems end up with. It optimizes cost across the long tail and gives stronger isolation where it matters.
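One way to picture the hybrid pattern is a small routing function that decides where a tenant's data lives. This is a sketch under stated assumptions: the `Route` type, the index naming scheme, and the 1,000,000-chunk threshold are all made up for illustration, and a real cutoff would come from your own cost and latency measurements.

```python
from dataclasses import dataclass

# Hypothetical cutoff, in chunks, above which a tenant gets its own index.
DEDICATED_THRESHOLD = 1_000_000


@dataclass(frozen=True)
class Route:
    index_name: str
    tenant_filter: dict


def route_for_tenant(tenant_id, chunk_count, threshold=DEDICATED_THRESHOLD):
    """Pick a dedicated index for large tenants, the shared one otherwise."""
    if chunk_count >= threshold:
        index_name = f"tenant-{tenant_id}"
    else:
        index_name = "shared"
    # The tenant filter is kept even on dedicated indexes as defense in
    # depth; that is a choice made in this sketch, not a requirement.
    return Route(index_name, {"tenant_id": tenant_id})


small = route_for_tenant("acme", 5_000)
large = route_for_tenant("bigco", 2_000_000)
```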

Permission handling within tenants

Even within one tenant, not all users see all documents:

Every chunk carries both tenant_id AND user/role-level permissions. Query filters apply both.

The filter architecture

query filter:
  tenant_id: [from authenticated user]
  permissions: [intersect with user's roles]
  optional: additional filters from query

Never trust a client-provided tenant_id. Always derive it from authentication.
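The filter structure above could be assembled roughly as follows. This is a minimal sketch: `Principal`, `build_filter`, `chunk_visible`, and the `allowed_roles` metadata key are hypothetical names, and the role check assumes a chunk is visible when it shares at least one role with the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Identity established by the auth layer, never by the request body."""
    tenant_id: str
    roles: frozenset


def build_filter(principal, client_filters=None):
    filters = dict(client_filters or {})
    # Discard any client attempt to set the security-critical keys.
    filters.pop("tenant_id", None)
    filters.pop("allowed_roles", None)
    # Both values come only from the authenticated principal.
    filters["tenant_id"] = principal.tenant_id
    filters["allowed_roles"] = set(principal.roles)
    return filters


def chunk_visible(chunk_meta, filters):
    """Apply tenant, role, and optional extra filters to one chunk."""
    if chunk_meta["tenant_id"] != filters["tenant_id"]:
        return False
    if not (set(chunk_meta["allowed_roles"]) & filters["allowed_roles"]):
        return False
    extra = {k: v for k, v in filters.items()
             if k not in ("tenant_id", "allowed_roles")}
    return all(chunk_meta.get(k) == v for k, v in extra.items())


user = Principal("acme", frozenset({"sales"}))
# The client tries to smuggle in another tenant's id; it is ignored.
query_filter = build_filter(user, {"tenant_id": "globex", "doc_type": "contract"})
```

Overwriting the tenant key on the server, rather than validating the client's value, means a forged tenant_id has no code path through which it can take effect.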

Embedding model choice

Multi-tenant systems usually use one embedding model across all tenants:

Per-tenant embeddings are exotic and usually not worth it. Tenant-specific fine-tuning is possible but rare.

LLM model choice

Common approach: offer tiers.

Let tenants choose their tier.

Scaling characteristics

Long-tail of small tenants

A typical B2B product has many small tenants and a few large ones. Shared infrastructure with filters serves the long tail cheaply.

Large tenants are outliers

A few enterprise tenants may have 100x the content of the average tenant. Giving them separate resources prevents them from dominating shared infrastructure.

Per-tenant rate limits

Per-tenant rate limits prevent one tenant from starving the others. Apply limits at the tenant level in addition to per-user limits.
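A tenant-level limit can be sketched with a token bucket keyed by tenant. The class name, the rate and burst parameters, and the injectable clock are assumptions made for illustration. A production limiter would typically keep its state in a shared store rather than in process memory.

```python
import time


class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_sec` refill, `burst` capacity."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self._buckets = {}  # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id):
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[tenant_id] = (tokens - 1, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False


# Fake clock so the example is deterministic.
fake_time = [0.0]
limiter = TenantRateLimiter(rate_per_sec=1.0, burst=2, clock=lambda: fake_time[0])
acme_results = [limiter.allow("acme") for _ in range(3)]
globex_first = limiter.allow("globex")
```

Because each tenant has its own bucket, a tenant that exhausts its budget does not reduce what other tenants can send.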

Onboarding new tenants

  1. Create tenant record in auth system
  2. Provision namespace (or confirm filter-based tenancy)
  3. Set up ingestion pipelines for tenant's sources
  4. Initial ingestion of existing content
  5. Configure tenant-specific settings (branding, limits, features)

This should be automated. Manual onboarding doesn't scale past a few dozen tenants.
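The five steps above can be sketched as an ordered, re-runnable routine. Everything here is a stand-in: the `registry` dict plays the role of the auth system and provisioning APIs, and each step only records what a real implementation would actually do. The idea illustrated is that completed steps are tracked, so a failed run can be retried without redoing earlier work.

```python
def onboard_tenant(tenant_id, sources, registry, settings=None):
    """Run the onboarding steps in order; safe to re-run for one tenant."""
    record = registry.setdefault(tenant_id, {"steps_done": []})
    steps = [
        ("create_record",
         lambda: record.setdefault("created", True)),
        ("provision_namespace",
         lambda: record.setdefault("namespace", f"tenant-{tenant_id}")),
        ("setup_pipelines",
         lambda: record.setdefault("pipelines", list(sources))),
        ("initial_ingestion",
         lambda: record.setdefault("ingested", True)),
        ("configure_settings",
         lambda: record.setdefault("settings", dict(settings or {}))),
    ]
    for name, action in steps:
        if name in record["steps_done"]:
            continue  # already completed on an earlier run
        action()
        record["steps_done"].append(name)
    return record


registry = {}
first_run = onboard_tenant("acme", ["shared-drive"], registry, {"plan": "pro"})
second_run = onboard_tenant("acme", ["shared-drive"], registry, {"plan": "pro"})
```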

Offboarding (data deletion)

When a tenant leaves:

GDPR's "right to erasure" may apply. Build deletion pathways from day one.
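A deletion pathway might look something like the sketch below. The store names (`vectors`, `documents`, `cache`) are assumptions chosen for illustration, and plain lists stand in for real storage systems. The sketch removes a tenant's records from every store, verifies that nothing remains, and returns a per-store count that could serve as a deletion record.

```python
def offboard_tenant(tenant_id, stores):
    """Delete a tenant's records from every store and report the counts."""
    report = {}
    for name, records in stores.items():
        before = len(records)
        # Mutate in place so callers holding the list see the deletion.
        records[:] = [r for r in records if r["tenant_id"] != tenant_id]
        report[name] = before - len(records)
    # Verify: no store should still hold a record for this tenant.
    leftovers = [
        name for name, records in stores.items()
        if any(r["tenant_id"] == tenant_id for r in records)
    ]
    if leftovers:
        raise RuntimeError(f"deletion incomplete in: {leftovers}")
    return report


# Hypothetical stores; a real system would also cover backups and logs.
stores = {
    "vectors": [{"tenant_id": "acme"}, {"tenant_id": "acme"}, {"tenant_id": "globex"}],
    "documents": [{"tenant_id": "acme"}, {"tenant_id": "globex"}],
    "cache": [{"tenant_id": "globex"}],
}
report = offboard_tenant("acme", stores)
```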

Monitoring per tenant

Per-tenant monitoring surfaces issues before they escalate.

Tenant-specific customization

Some tenants want custom behavior:

Architect for this from the start, using per-tenant configuration files, stored customizations, and feature flags.
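One possible shape for per-tenant configuration is a set of global defaults with tenant overrides layered on top. The keys shown (`top_k`, `system_prompt`, `features`) are hypothetical examples, not a recommended schema.

```python
import copy

# Hypothetical defaults applied to every tenant.
DEFAULTS = {
    "top_k": 5,
    "system_prompt": "You are a helpful assistant.",
    "features": {"citations": True, "web_search": False},
}


def tenant_config(tenant_id, overrides_by_tenant):
    """Return the defaults with this tenant's overrides applied on top."""
    config = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    overrides = overrides_by_tenant.get(tenant_id, {})
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            # Merge one level deep so a tenant can flip a single flag.
            config[key].update(value)
        else:
            config[key] = value
    return config


overrides_by_tenant = {
    "acme": {"top_k": 10, "features": {"web_search": True}},
}
acme_config = tenant_config("acme", overrides_by_tenant)
default_config = tenant_config("globex", overrides_by_tenant)
```

Tenants without overrides get the defaults unchanged, so new settings can be rolled out globally and then adjusted per tenant.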

The security audit

For B2B RAG, regular audits confirm:

Third-party audits (SOC 2, ISO 27001) require these and will catch gaps.

The cost allocation

Per-tenant cost attribution is essential:

This feeds into pricing decisions and lets you identify unprofitable tenants.
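Cost attribution can be sketched as an aggregation over usage events tagged with a tenant. The event kinds and unit prices below are invented for illustration, and prices are held as integer micro-dollars so the arithmetic stays exact.

```python
from collections import defaultdict

# Hypothetical unit prices in micro-dollars (1 USD = 1,000,000).
PRICE_MICRO_USD = {
    "llm_tokens": 10,
    "embedding_tokens": 1,
    "vector_queries": 50,
}


def attribute_costs(events):
    """Sum cost per tenant from (tenant_id, kind, quantity) usage events."""
    totals = defaultdict(int)
    for tenant_id, kind, quantity in events:
        totals[tenant_id] += PRICE_MICRO_USD[kind] * quantity
    return dict(totals)


events = [
    ("acme", "llm_tokens", 1000),
    ("acme", "vector_queries", 2),
    ("globex", "embedding_tokens", 500),
]
costs = attribute_costs(events)
```

Comparing these totals against what each tenant pays is one simple way to spot tenants that cost more than they bring in.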

Closing thought

Multi-tenant RAG has all the challenges of single-tenant RAG plus isolation, scaling, and operational multi-tenancy concerns. The isolation concerns aren't optional: one leak is a reputational catastrophe. Build with isolation as a first-class property, not an afterthought.
