Debate is a multi-agent pattern in which two or more agents argue opposing sides of a question and a judge (another agent or a human) evaluates the exchange. On many reasoning tasks, debate produces more accurate answers than a single agent working alone.
A single agent's errors tend to point in the same direction throughout its reasoning. Forcing it to confront opposing arguments surfaces flaws in its initial position, and the judge, seeing both sides, often reaches a better answer than either debater alone.
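The protocol described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_llm` is a hypothetical stand-in for whatever model client you use, and the prompts are simplified placeholders.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real model API call.
    return f"[model response to: {prompt[:40]}...]"

def debate(question: str, rounds: int = 2) -> str:
    """Run a two-debater debate and return the judge's verdict."""
    pro, con = [], []
    for _ in range(rounds):
        # Each debater sees the opponent's arguments so far.
        pro.append(call_llm(
            f"Argue FOR this position.\nQuestion: {question}\n"
            f"Opponent so far: {' '.join(con) or '(none yet)'}"
        ))
        con.append(call_llm(
            f"Argue AGAINST this position.\nQuestion: {question}\n"
            f"Opponent so far: {' '.join(pro)}"
        ))
    transcript = "\n".join(
        f"PRO: {p}\nCON: {c}" for p, c in zip(pro, con)
    )
    # The judge sees the full transcript, not just one side.
    return call_llm(
        f"You are the judge. Read the debate below and give the most "
        f"accurate final answer.\nQuestion: {question}\n{transcript}"
    )
```

Note the cost structure visible in the sketch: each round makes two debater calls, plus one judge call at the end, which is where the multiple-of-single-turn cost comes from.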
Academic work has shown debate-based agents outperforming single agents on math, logic, and fact-checking benchmarks. The effect is strongest on problems where single-agent output is inconsistent.
Use debate sparingly, for high-stakes decisions or fact-checking. A full debate costs roughly 3-5x the tokens of a single-turn call. But for tasks where accuracy matters more than cost, it is worth the premium.