Most agent work is one agent doing one thing. But for some tasks, splitting the work across multiple agents - each with its own clean context and specialized role - is genuinely faster and more reliable than cramming everything into one giant agent. The trick is knowing which tasks those are. Multi-agent is fashionable; that's not the same as useful. This page covers when it pays off, when it's just over-engineering, and how to do it right when you do need it.
One "orchestrator" agent decomposes a task into sub-tasks. Each sub-task gets handed to a "worker" agent with its own fresh context. Workers do their piece, return a result; the orchestrator collects all the results and synthesizes the final answer.
That's the shape. Everything else in multi-agent is variants of this.
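To make the shape concrete, here is a minimal sketch with no real model calls. The names `decompose`, `run_worker`, and `orchestrate` are hypothetical, and the "planner" simply splits on semicolons; in a real system each of these steps would be a model call.

```python
from dataclasses import dataclass


@dataclass
class WorkerResult:
    subtask: str
    summary: str


def decompose(task: str) -> list[str]:
    """Hypothetical planner: split the task on ';' into sub-tasks.

    A real orchestrator would ask a model to do this.
    """
    return [part.strip() for part in task.split(";") if part.strip()]


def run_worker(subtask: str) -> WorkerResult:
    """Hypothetical worker: starts from a fresh context holding only its brief."""
    context = [f"brief: {subtask}"]  # nothing inherited from the orchestrator
    context.append(f"scratch work for {subtask}")  # stays inside the worker
    return WorkerResult(subtask=subtask, summary=f"findings on {subtask}")


def orchestrate(task: str) -> str:
    """Decompose, delegate, then synthesize from the workers' results only."""
    results = [run_worker(subtask) for subtask in decompose(task)]
    return " | ".join(result.summary for result in results)


final_answer = orchestrate("pricing; competitors; regulation")
print(final_answer)
# findings on pricing | findings on competitors | findings on regulation
```

The orchestrator never touches a worker's `context` list; it only sees the `WorkerResult` each worker returns.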
Context isolation is the quiet big win. When one agent does three research topics, its context fills up with all three sets of scratch work - tool calls, partial results, abandoned searches. By the time it synthesizes, the signal-to-noise is poor. When three workers do one topic each, each worker's context stays clean, and the orchestrator only sees their final summaries. Much cleaner reasoning.
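A toy illustration of the difference, assuming a hypothetical `research` step that produces five scratch entries and one summary per topic (the numbers are made up for the example):

```python
def research(topic: str) -> tuple[list[str], str]:
    """Hypothetical research step: returns (scratch entries, final summary)."""
    scratch = [f"{topic}: tool call {i}" for i in range(1, 6)]
    return scratch, f"{topic}: summary"


topics = ["pricing", "competitors", "regulation"]

# Single agent: every scratch entry lands in the one shared context.
single_context: list[str] = []
for topic in topics:
    scratch, summary = research(topic)
    single_context.extend(scratch)
    single_context.append(summary)

# Orchestrator-worker: scratch stays with the worker, only the summary comes back.
orchestrator_context: list[str] = []
for topic in topics:
    _scratch, summary = research(topic)  # scratch is discarded with the worker
    orchestrator_context.append(summary)

print(len(single_context), len(orchestrator_context))
# 18 3
```

Same work either way; the only thing that changes is how much of it the synthesizing agent has to read past.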
Parallelism is the obvious win. If three independent subtasks each take 3 minutes, running them in parallel takes roughly 3 minutes instead of 9, plus whatever the orchestrator spends handing out work and collecting results.
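The arithmetic can be checked with a small `asyncio` sketch. The `worker` coroutine here is a stand-in that just sleeps, and the 3-minute subtasks are scaled down to 0.1 seconds; both are assumptions for illustration, not a real agent call.

```python
import asyncio
import time


async def worker(name: str, seconds: float) -> str:
    """Stand-in for a sub-agent: sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_sequential(names: list[str], seconds: float) -> list[str]:
    # One after another: total time is roughly len(names) * seconds.
    return [await worker(name, seconds) for name in names]


async def run_parallel(names: list[str], seconds: float) -> list[str]:
    # All at once: total time is roughly the slowest single worker.
    return list(await asyncio.gather(*(worker(name, seconds) for name in names)))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = ["topic-a", "topic-b", "topic-c"]
seq_results, seq_elapsed = timed(run_sequential(names, 0.1))
par_results, par_elapsed = timed(run_parallel(names, 0.1))
print(f"sequential: {seq_elapsed:.2f}s, parallel: {par_elapsed:.2f}s")
```

This only holds when the subtasks are truly independent; if one worker needs another's output, that part of the work is sequential no matter how it is dispatched.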
Specialization lets you use different models for different roles. Opus plans; Haiku workers execute. You get quality where it matters and speed/cost where it doesn't.
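One simple way to express this is a role-to-model lookup. The sketch below is hypothetical: the strings `"opus"` and `"haiku"` are placeholder tier labels taken from the sentence above, not real API model identifiers, and `pick_model` is an illustrative helper.

```python
# Placeholder tier labels, not real model IDs.
MODEL_FOR_ROLE = {
    "orchestrator": "opus",  # planning and synthesis: quality matters most
    "worker": "haiku",       # narrow execution: speed and cost matter most
}


def pick_model(role: str) -> str:
    """Return the model tier configured for a role, failing loudly on unknown roles."""
    try:
        return MODEL_FOR_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None


print(pick_model("orchestrator"), pick_model("worker"))
# opus haiku
```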
This is the part people learn the hard way:
Sub-agents don't have the orchestrator's memory. They only know what the orchestrator tells them. That means the parent has real responsibilities when spawning a child:
These aren't optional. Skipping any of them is how multi-agent systems end up being worse than single-agent: you've added all the coordination cost and gotten none of the cleanliness in return.
Claude Code has native support for sub-agents. You can spawn specialized ones like Explore (fast codebase search), Plan (architect implementation plans), or general-purpose (custom tasks), each running with fully isolated context and returning a single message back. For research-heavy or branching work, this is a big productivity unlock. The agent you're talking to doesn't get buried in scratch work; the sub-agent handles the mess and hands back a clean answer.
Two agents talking to each other directly, with no orchestrator, is popular in demos. In production, it's almost always better remodeled as orchestrator-worker, with one agent explicitly in charge. Peer-to-peer setups tend to diverge, loop, or agree on wrong answers together. The clarity of having one agent that owns the outcome is worth the slight rigidity.
Andrej Karpathy - Intro to Large Language Models (1 hour)