Multi-agent orchestration

Most agent work is one agent doing one thing. But for some tasks, splitting the work across multiple agents - each with its own clean context and specialized role - is genuinely faster and more reliable than cramming everything into one giant agent. The trick is knowing which tasks those are. Multi-agent is fashionable; that's not the same as useful. This page is when it pays off, when it's just over-engineering, and how to do it right when you do need it.

The canonical pattern: orchestrator-worker.

One "orchestrator" agent decomposes a task into sub-tasks. Each sub-task gets handed to a "worker" agent with its own fresh context. Workers do their piece, return a result; the orchestrator collects all the results and synthesizes the final answer.

~ orchestrator-worker pattern ~

That's the shape. Everything else in multi-agent is variants of this.

Why multi-agent helps (when it does).

~ what multi-agent buys you ~

Context isolation is the quiet big win. When one agent does three research topics, its context fills up with all three sets of scratch work - tool calls, partial results, abandoned searches. By the time it synthesizes, the signal-to-noise is poor. When three workers do one topic each, each worker's context stays clean, and the orchestrator only sees their final summaries. Much cleaner reasoning.

Parallelism is the obvious win. If three independent subtasks each take 3 minutes, running them in parallel takes 3 minutes instead of 9.

Specialization lets you use different models for different roles. Opus plans; Haiku workers execute. You get quality where it matters and speed/cost where it doesn't.

Where multi-agent backfires.

This is the part people learn the hard way:

When to use multi-agent, when not to.

~ good fit vs bad fit ~

Delegation discipline. Writing a good sub-agent prompt.

Sub-agents don't have the orchestrator's memory. They only know what the orchestrator tells them. That means the parent has real responsibilities when spawning a child:

  1. Write a complete, self-contained task description. Assume the worker is starting from zero - because it is. Include the why, not just the what.
  2. Specify the return format. Unstructured worker output is brutal to consume. "Return as: a 3-bullet summary with a final line 'CONFIDENCE: high/medium/low'."
  3. Set a budget. Max turns, max tokens, max time. Workers can spin forever if you let them.
  4. Handle worker failure. Assume it WILL fail. Decide in advance whether to retry, proceed without, or escalate.

These aren't optional. Skipping any of them is how multi-agent systems end up being worse than single-agent: you've added all the coordination cost and gotten none of the cleanliness in return.

Claude Code's sub-agents.

Claude Code has native support for sub-agents. You can spawn specialized ones like Explore (fast codebase search), Plan (architect implementation plans), or general-purpose (custom tasks), each running with fully isolated context and returning a single message back. For research-heavy or branching work, this is a big productivity unlock. The agent you're talking to doesn't get buried in scratch work; the sub-agent handles the mess and hands back a clean answer.

Peer-to-peer agents. Usually a trap.

Two agents talking to each other directly, no orchestrator, is popular in demos. In production, it's almost always better re-modeled as orchestrator-worker, with one agent explicitly in charge. P2P setups tend to diverge, loop, or agree on wrong answers together. The clarity of having one agent that owns the outcome is worth the slight rigidity.

The fashion warning: multi-agent architectures are trendy. They look impressive on a slide. But the single-agent ReAct baseline is almost always faster to build, cheaper to run, and easier to debug. Start single-agent. Graduate to multi-agent only when you have a specific bottleneck that multi-agent genuinely solves.