Self-correction

Self-correction means the agent detects its own errors and fixes them without human intervention. Run a test, see it fail, fix the code, test again. Write a response, notice factual errors, rewrite. It's an underused pattern that dramatically improves reliability.

What triggers self-correction

A failed test, a runtime error, a verifier rejecting a claim, or explicit user feedback: any concrete, specific failure signal the agent can act on. A precise trigger ("the test failed with a KeyError") works far better than a vague one ("this could be better").

The correction loop

  1. Agent produces output
  2. Verification step: is this output actually correct?
  3. If no: capture specific failure (error message, failed test, specific wrong claim)
  4. Agent reasons about the failure: "the test failed because X, so I need to change Y"
  5. Agent produces corrected output
  6. Verify again; iterate until pass or give up
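The loop above can be sketched as a small harness, with `produce` and `verify` as placeholder callables standing in for the agent and its checker:

```python
from typing import Callable, Optional

def correction_loop(
    produce: Callable[[Optional[str]], str],  # takes last failure (or None), returns output
    verify: Callable[[str], Optional[str]],   # returns None on pass, else a failure message
    max_attempts: int = 3,
) -> str:
    """Produce, verify, and re-produce until the output passes or attempts run out."""
    failure = None
    output = ""
    for _ in range(max_attempts):
        output = produce(failure)   # steps 1 and 5: produce (or re-produce) output
        failure = verify(output)    # step 2: is it actually correct?
        if failure is None:         # step 6: stop on pass
            return output
        # steps 3-4: the specific failure is fed back into the next attempt
    return output  # give up: return the last attempt
```

The key design choice is that `produce` receives the previous failure message, so each retry is conditioned on what went wrong rather than being a blind re-roll.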

Self-correction in coding

A coding agent writes code, runs tests, sees failures, reads error messages, fixes. This is why coding agents work: the test suite is the verifier, and the error trace is rich enough to act on.
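A minimal in-process sketch of that verifier, assuming the candidate code and its check arrive as strings (a real coding agent would shell out to the project's actual test suite; `run_candidate` is a hypothetical helper):

```python
import traceback
from typing import Optional

def run_candidate(code: str, check: str) -> Optional[str]:
    """Execute candidate code, then a check statement against it.

    Returns None on pass, otherwise the full traceback: the rich
    error trace the agent reads and acts on.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate's functions
        exec(check, namespace)  # run the test against them
        return None
    except Exception:
        return traceback.format_exc()
```

The returned traceback names the failing line and exception type, which is exactly the specificity the correction loop needs.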

Self-correction in prose

Harder. No automatic verifier for "is this essay good." Requires either reflection (see previous page), an LLM-as-judge, or user feedback.
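A sketch of the LLM-as-judge variant, with `judge` and `revise` as hypothetical callables standing in for model calls:

```python
from typing import Callable, Tuple

def revise_until_accepted(
    draft: str,
    judge: Callable[[str], Tuple[bool, str]],  # hypothetical: returns (accepted, critique)
    revise: Callable[[str, str], str],         # hypothetical: rewrites draft given critique
    max_rounds: int = 3,
) -> str:
    """Judge the draft; if rejected, feed the critique back into a rewrite."""
    for _ in range(max_rounds):
        accepted, critique = judge(draft)
        if accepted:
            break
        draft = revise(draft, critique)
    return draft
```

The critique plays the role the error trace plays in coding: without a specific complaint to hand back, the rewrite is just a re-roll.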

Self-correction in research

Agent says "X is true." Verifier checks: is this supported by retrieved context? If no, agent must re-search or hedge the claim.
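A crude sketch of that support check, assuming a keyword-overlap test stands in for a real entailment model (`verify_claim` is a hypothetical helper):

```python
from typing import List, Optional

def supported(claim_keywords: List[str], retrieved: List[str]) -> bool:
    """Crude support check: every keyword of the claim appears somewhere
    in the retrieved passages. Real systems use an entailment model here."""
    corpus = " ".join(retrieved).lower()
    return all(kw.lower() in corpus for kw in claim_keywords)

def verify_claim(claim_keywords: List[str], retrieved: List[str]) -> Optional[str]:
    """Return None if the claim is supported, else a failure message
    telling the agent to re-search or hedge."""
    if supported(claim_keywords, retrieved):
        return None
    return "claim not supported by retrieved context: re-search or hedge"
```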

The stop condition

Self-correction can loop forever if the agent can't actually fix the problem. Hard limits keep it bounded: cap the number of attempts (two or three is typical), and bail out early if the same failure appears twice in a row, since a repeated error means the agent is stuck rather than converging.
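Both limits fold into the loop directly; this sketch uses `produce` and `verify` as placeholder callables for the agent and its checker:

```python
from typing import Callable, Optional, Tuple

def bounded_correction_loop(
    produce: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[str, bool]:
    """Correction loop with hard limits: an attempt cap, plus an early
    bailout when the identical failure repeats (stuck, not converging)."""
    failure = last_failure = None
    output = ""
    for _ in range(max_attempts):
        output = produce(failure)
        failure = verify(output)
        if failure is None:
            return output, True
        if failure == last_failure:
            break  # same error twice in a row: give up early
        last_failure = failure
    return output, False  # best attempt, flagged as unverified
```

Returning a success flag alongside the output lets the caller decide what to do with a result that never passed verification, instead of silently treating it as correct.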

The quality ceiling

Self-correction only works if the model is capable of producing a correct answer. Fundamentally hard problems (genuinely novel reasoning, specialized domain knowledge the model lacks) don't yield to self-correction. The model will iterate on the same wrong answer.