Agent routing
📖 2 min read · Updated 2026-04-19
Routing is how a multi-agent system decides which agent answers each incoming request. It's a classifier that looks at the user's message and picks: billing agent, technical agent, generalist, human. Good routing is invisible. Bad routing sends users to the wrong specialist, who then has to hand off, so the user waits twice and everyone's annoyed.
Four approaches, each with tradeoffs
Rule-based: start here
Keyword or regex rules. "If message contains 'refund' or 'charge' or 'invoice' → billing agent." Takes 10 minutes to write. Runs in microseconds. Wrong 10-20% of the time on real user phrasings, but you know exactly when it's wrong and can add rules.
Use rules for the top 60% of traffic that's easy. Use a smarter mechanism for the rest.
LLM classifier: flexible, modest cost
A small, fast model reads the user's message and emits a category: {"route": "billing", "confidence": 0.82}. Handles novel phrasings and edge cases that rules miss. Costs one extra LLM call per request, usually under 200ms and <$0.001. Almost always worth it for user-facing routing.
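The fragile part of an LLM classifier is not the call itself but trusting its output. A sketch of the parsing side, with `call_llm` as a stand-in for whatever small-model endpoint you use; the prompt and route names are assumptions:

```python
import json

CLASSIFIER_PROMPT = """Classify the user message into one route:
billing, auth, or general. Reply with JSON only:
{"route": "<route>", "confidence": <0..1>}

Message: {message}"""

ROUTES = {"billing", "auth", "general"}

def classify(message: str, call_llm) -> tuple[str, float]:
    """call_llm: any callable taking a prompt string, returning text.
    str.replace (not str.format) avoids clashing with the JSON braces."""
    raw = call_llm(CLASSIFIER_PROMPT.replace("{message}", message))
    try:
        parsed = json.loads(raw)
        route, conf = parsed["route"], float(parsed["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "general", 0.0  # unparseable output -> safe fallback
    if route not in ROUTES:
        return "general", 0.0  # model invented a route -> safe fallback
    return route, conf
```

Any output the parser can't validate degrades to the generalist with zero confidence, so a malformed model reply never produces a confident mis-route.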
Embedding-based: cheapest when traffic is high
Embed the user's message. Embed each agent's capability description. Cosine-similarity → top match. No LLM call per route, just a vector lookup. Very fast and near-free per request.
Good for high-volume systems where the LLM classifier latency or cost adds up. Less nuanced than an LLM classifier on weird requests.
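The lookup itself is just nearest-neighbor by cosine similarity. A toy sketch with hand-made 3-dimensional vectors standing in for real embeddings; a production system would call an embedding model once per agent description and cache the results:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedded capability descriptions.
AGENT_VECS = {
    "billing": [0.9, 0.1, 0.0],
    "auth": [0.1, 0.9, 0.1],
}

def route_by_embedding(message_vec: list[float],
                       agent_vecs: dict = AGENT_VECS) -> str:
    """Pick the agent whose capability vector is closest to the message."""
    return max(agent_vecs, key=lambda a: cosine(message_vec, agent_vecs[a]))
```

At scale you would swap the loop for a vector index, but the routing decision stays the same: closest capability description wins.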
Hybrid: what production usually ends up with
Rules for the clear cases, an LLM classifier for everything else. The rules handle the easy majority of requests, the 60-70% with unambiguous keywords, at zero latency. The LLM handles the ambiguous remainder, adding roughly 200ms. Average latency stays low, routing accuracy stays high. This is where most mature systems settle.
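Wiring the two together, plus the generalist fallback discussed below, fits in a few lines. A sketch that takes the component routers as callables so it stays independent of how each is implemented; the 0.7 threshold is an assumption to tune against your own logs:

```python
def hybrid_route(message: str, rule_router, llm_router,
                 threshold: float = 0.7) -> str:
    """rule_router: message -> route name or None (ambiguous).
    llm_router: message -> (route name, confidence)."""
    route = rule_router(message)
    if route is not None:
        return route  # zero-latency path for the clear cases
    route, conf = llm_router(message)
    # A low-confidence classification is escalated to the generalist
    # rather than trusted.
    return route if conf >= threshold else "general"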
A worked example: a multi-agent support system
- Request arrives: "Can I get a refund for my last order? I'm also seeing a weird error when I try to log in."
- Rule-based first pass: matches both "refund" (billing) and "error" + "log in" (auth). Not a clean single match → fall through to LLM.
- LLM classifier: "This has two distinct requests: billing (refund) and auth (login issue). Route to billing first, flag auth as follow-up."
- Billing agent handles the refund. At the end, notices the auth follow-up and hands off to auth agent (who picks up from handoff payload).
Clean result, no user annoyance, two specialists collaborated via routing + handoff.
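The handoff in the last step only works if the billing agent passes enough context for the auth agent to pick up without re-asking the user. One possible payload shape; every field name here is hypothetical:

```python
# Hypothetical handoff payload; field names are illustrative only.
handoff = {
    "from_agent": "billing",
    "to_agent": "auth",
    "original_message": ("Can I get a refund for my last order? I'm also "
                         "seeing a weird error when I try to log in."),
    "completed": ["refund processed"],
    "remaining_intent": "user sees an error when logging in",
}
```

The key properties: the receiving agent sees the original message, what's already done, and what's left, so the user never repeats themselves.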
Always have a generalist fallback
Some requests don't match any specialist. "Hi, I have a question" → no category yet. Without a generalist, the router picks the least-wrong specialist and gets it wrong. With a generalist, unusual queries get a graceful "tell me more about what you need" agent that can then route to a specialist once the intent is clear.
Routing to multiple agents
When a request spans multiple domains:
- Pick a primary. Primary agent handles the main thread, calls specialists as tools for sub-tasks.
- Fan out. Route to several specialists in parallel, aggregate their responses. Works for "give me the full picture" queries.
- Sequential chain. First specialist handles part 1, hands to second for part 2. Works for "first do A, then do B" queries.
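The fan-out pattern above can be sketched with standard-library threading: dispatch the message to every specialist at once, then collect the answers. Specialist names and callables are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(message: str, specialists: dict) -> dict:
    """specialists: mapping of agent name -> callable(message) -> str.
    Runs all specialists in parallel and returns {name: answer}."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, message)
                   for name, fn in specialists.items()}
        return {name: f.result() for name, f in futures.items()}
```

Because each specialist call is typically dominated by LLM latency, running them in parallel makes the "full picture" query cost roughly one specialist's latency, not the sum.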
Measuring routing quality
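The simplest signal is a log of routing decisions paired with what happened afterward. Two proxies worth tracking: how often the handling agent had to hand off (a hint the route was wrong, though handoffs can also be legitimate, as in the worked example above) and how often the generalist fallback fired. A minimal sketch, assuming each logged record carries these flags; the field names are illustrative:

```python
def routing_metrics(log: list[dict]) -> dict:
    """log: records like
    {"route": "billing", "handed_off": False, "fallback": False}.
    Handoff-after-routing is a proxy for a mis-route, not proof:
    treat a rising rate as a signal to investigate."""
    n = len(log)
    misroutes = sum(r["handed_off"] for r in log)
    fallbacks = sum(r["fallback"] for r in log)
    return {
        "misroute_rate": misroutes / n,
        "fallback_rate": fallbacks / n,
    }
```

Spot-check a sample of the flagged conversations by hand; the rates tell you where to look, not what went wrong.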
Pitfalls
- No fallback. Unusual request → agent mismatched → bad UX. Always have a generalist.
- Over-specialization. 15 micro-specialists instead of 3 broad ones. Router accuracy collapses. Merge.
- Rule brittleness without monitoring. Rules drift from user phrasings over time. Measure and update.
- No confidence threshold. LLM classifier with 0.5 confidence on a routing call shouldn't be trusted; escalate to generalist.
- Silent mis-routes. A user got routed wrong and the agent failed silently. Log all handoffs and completions; investigate mismatch patterns.
What to do with this
- Start rule-based. Measure accuracy. Add LLM classifier once rules stop improving. Hybrid is where you want to end up.
- Read agent handoffs for what happens after routing (and what happens when routing was wrong).
- Read orchestrator-worker - your router is effectively a lightweight orchestrator.