Agent routing

Routing is how a multi-agent system decides which agent answers each incoming request. It's a classifier that looks at the user's message and picks: billing agent, technical agent, generalist, human. Good routing is invisible. Bad routing sends users to the wrong specialist, who then has to hand off, so the user waits twice and everyone's annoyed.

Four approaches, each with tradeoffs

Rule-based: start here

Keyword or regex rules. "If message contains 'refund' or 'charge' or 'invoice' → billing agent." Takes 10 minutes to write. Runs in microseconds. Wrong 10-20% of the time on real user phrasings, but you know exactly when it's wrong and can add rules.

Use rules for the easy majority of traffic (often 60-70% of requests). Use a smarter mechanism for the rest.
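A minimal sketch of such a rule table (the patterns and agent names here are illustrative, not a fixed API). It deliberately returns None when zero rules or more than one rule fire, so ambiguous messages can fall through to a smarter router:

```python
import re

# Hypothetical rule table: compiled pattern -> agent name.
RULES = [
    (re.compile(r"\b(refund|charge|invoice)\b", re.I), "billing"),
    (re.compile(r"\b(error|crash|bug|log ?in)\b", re.I), "technical"),
]

def route_by_rules(message: str):
    """Return the single matching agent, or None when zero or multiple rules fire."""
    matches = {agent for pattern, agent in RULES if pattern.search(message)}
    return matches.pop() if len(matches) == 1 else None
```

Returning None instead of guessing is the important design choice: it makes the rules a cheap first pass that knows when it doesn't know.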

LLM classifier: flexible, modest cost

A small, fast model reads the user's message and emits a category: {"route": "billing", "confidence": 0.82}. Handles novel phrasings and edge cases that rules miss. Costs one extra LLM call per request, usually under 200ms and <$0.001. Almost always worth it for user-facing routing.
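One way this can look in code, assuming a `call_llm` function you supply that sends a prompt to your small model and returns its raw text (the prompt wording, route names, and JSON shape are assumptions, not a standard):

```python
import json

VALID_ROUTES = {"billing", "technical", "auth", "generalist"}

CLASSIFY_PROMPT = (
    "Classify this support request into one of: billing, technical, auth, generalist. "
    'Reply only with JSON like {"route": "...", "confidence": 0.0}.\n\nRequest: '
)

def classify(message: str, call_llm):
    """Ask a small model for a route; fall back to the generalist on bad output."""
    raw = call_llm(CLASSIFY_PROMPT + message)
    try:
        parsed = json.loads(raw)
        route = parsed["route"]
        confidence = float(parsed["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "generalist", 0.0
    if route not in VALID_ROUTES:
        return "generalist", 0.0
    return route, confidence
```

Note the defensive parsing: models occasionally emit malformed JSON or an invented route name, and the safe default for both failure modes is the generalist.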

Embedding-based: cheapest when traffic is high

Embed the user's message. Embed each agent's capability description. Cosine-similarity → top match. No LLM call per route, just a vector lookup. Very fast and near-free per request.

Good for high-volume systems where the LLM classifier latency or cost adds up. Less nuanced than an LLM classifier on weird requests.
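The core of the embedding approach fits in a few lines. This sketch assumes you already have vectors from some embedding model (the toy vectors in the usage below are made up; real ones would come from whatever embedding API you use):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumes neither is all-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def route_by_embedding(message_vec, agent_vecs):
    """Pick the agent whose capability-description vector is closest to the message."""
    return max(agent_vecs, key=lambda name: cosine(message_vec, agent_vecs[name]))
```

The agent vectors are computed once, offline, from each agent's capability description; per-request cost is just one message embedding plus a handful of dot products.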

Hybrid: what production usually ends up with

Rules for the clear cases, LLM classifier for everything else. The rules handle 70% of requests at zero latency. The LLM handles the 30% that's ambiguous, taking 200ms. Total average latency stays low, routing accuracy stays high. This is where most mature systems settle.
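The hybrid is just composition: try the cheap router, and only pay for the LLM when the rules are unsure. A sketch, assuming a rules function that returns None on ambiguity (as in the rule-based approach above) and an LLM function that always returns a route:

```python
def route(message: str, rules_fn, llm_fn) -> str:
    """Rules first (free, instant); LLM classifier only when rules are unsure."""
    agent = rules_fn(message)
    return agent if agent is not None else llm_fn(message)
```

Average latency stays near zero because the expensive branch only runs on the ambiguous minority of traffic.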

A worked example: a multi-agent support system

  1. Request arrives: "Can I get a refund for my last order? I'm also seeing a weird error when I try to log in."
  2. Rule-based first pass: matches both "refund" (billing) and "error" + "log in" (auth). Not a clean single match → fall through to LLM.
  3. LLM classifier: "This has two distinct requests: billing (refund) and auth (login issue). Route to billing first, flag auth as follow-up."
  4. Billing agent handles the refund. At the end, notices the auth follow-up and hands off to auth agent (who picks up from handoff payload).

Clean result, no user annoyance, two specialists collaborated via routing + handoff.
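One way to represent the routing decision from step 3, with the second request carried as a queued follow-up rather than dropped (the class and field names here are hypothetical, not part of any framework):

```python
from dataclasses import dataclass, field

@dataclass
class RoutingDecision:
    primary: str                                    # agent that handles the request now
    follow_ups: list = field(default_factory=list)  # handoffs queued for afterwards

# Step 3's output for the worked example above:
decision = RoutingDecision(primary="billing", follow_ups=["auth"])
```

Whatever shape you pick, the point is that multi-intent requests need a structured decision, not a single string, or the second intent silently disappears.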

Always have a generalist fallback

Some requests don't match any specialist. "Hi, I have a question" → no category yet. Without a generalist, the router picks the least-wrong specialist and gets it wrong. With a generalist, unusual queries get a graceful "tell me more about what you need" agent that can then route to a specialist once the intent is clear.
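If your classifier reports confidence (as in the LLM classifier's JSON above), the fallback can be a one-line threshold check. The 0.6 cutoff here is an assumed starting point to tune against your own traffic, not a recommended constant:

```python
def pick_agent(route: str, confidence: float, threshold: float = 0.6) -> str:
    """Send low-confidence routes to the generalist instead of a guessed specialist."""
    return route if confidence >= threshold else "generalist"
```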

Routing to multiple agents

When a request spans multiple domains, don't force a single route: pick a primary agent and record the rest as follow-ups, as in the worked example above (billing handles the refund now; auth is queued via handoff).

Measuring routing quality

Pitfalls

What to do with this