Debate + consensus

Debate is a multi-agent pattern in which two or more agents argue opposing sides of a question. A judge, either another agent or a human, evaluates the exchange. On many reasoning tasks, debate produces more accurate answers than a single agent working alone.

The basic setup

  1. Question is posed
  2. Agent A is assigned one position
  3. Agent B is assigned the opposite position (or "find the flaws" role)
  4. They exchange arguments for N rounds
  5. Judge reviews the transcript and decides or synthesizes
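The five steps above can be sketched as a simple loop. This is a minimal illustration, not a full implementation: `call_model` is a hypothetical helper standing in for your actual LLM client, and the role prompts are placeholders.

```python
def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM call (hypothetical helper);
    # returns a canned reply so the sketch runs without an API key.
    return f"[reply to: {prompt[:40]}]"

def debate(question: str, rounds: int = 2) -> str:
    """Run a two-agent debate for `rounds` rounds, then ask a judge."""
    transcript = [f"Question: {question}"]
    # Step 2-3: assign opposing positions (or a "find the flaws" role).
    positions = {
        "Agent A": "Argue FOR the proposition.",
        "Agent B": "Argue AGAINST it and find flaws in A's case.",
    }
    # Step 4: exchange arguments for N rounds, each agent seeing the
    # transcript so far.
    for _ in range(rounds):
        for name, role in positions.items():
            context = "\n".join(transcript)
            reply = call_model(f"{role}\n\nDebate so far:\n{context}")
            transcript.append(f"{name}: {reply}")
    # Step 5: the judge reviews the full transcript and decides.
    return call_model(
        "You are the judge. Review the debate and give the best answer:\n"
        + "\n".join(transcript)
    )
```

In a real system, `call_model` would carry separate conversation state per agent and the judge prompt would ask for a structured verdict; the control flow stays the same.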

When debate helps

The quality hypothesis

A single agent can be wrong in the same direction throughout its entire chain of reasoning. Forcing agents to construct opposing arguments surfaces flaws in the initial position. The judge, seeing both sides, often reaches a better answer than either debater alone.

Research findings

Academic work has shown debate-based agents outperforming single agents on math, logic, and fact-checking benchmarks. The effect is strongest on problems where single-agent output is inconsistent.

Failure modes

Debate is not a free win. The debaters can converge on the same wrong answer, a more persuasive (but incorrect) agent can sway the judge, and an assigned position can push an agent to defend a claim it would otherwise reject. The judge is still a single point of failure.

Practical adoption

Use debate sparingly: reserve it for high-stakes decisions or fact-checking. A debate makes roughly 3-5x as many LLM calls as a single-turn answer. For tasks where accuracy matters more than cost, that premium is worth paying.
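One way to act on this tradeoff is a simple router that sends only high-stakes questions through the debate path. This is an illustrative sketch, assuming the 3-5x multiplier above; the function and constant names are hypothetical.

```python
# Midpoint of the 3-5x call multiplier cited above (assumption).
DEBATE_COST_MULTIPLIER = 4

def choose_strategy(stakes: str, base_tokens: int) -> tuple[str, int]:
    """Pick a strategy and estimate its token budget.

    stakes: "high" routes to debate; anything else stays single-turn.
    base_tokens: estimated cost of one single-turn answer.
    """
    if stakes == "high":
        return ("debate", base_tokens * DEBATE_COST_MULTIPLIER)
    return ("single", base_tokens)
```

A production version would score stakes automatically (e.g. by topic or downstream impact) rather than taking a label, but the budgeting logic is the same.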