Self-correction means the agent detects its own errors and fixes them without human intervention. Run a test, see it fail, fix the code, test again. Write a response, notice factual errors, rewrite. It's an underused pattern that dramatically improves reliability.
Coding is the easy case: the agent writes code, runs tests, sees failures, reads error messages, and fixes. This is why coding agents work: the test suite is the verifier, and the error trace is rich enough to act on.
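The loop above can be sketched in a few lines. Everything here is a stand-in: `generate` is a hypothetical model call, and `verify` runs the tests and returns the error trace. The stubs at the bottom exist only to make the sketch runnable.

```python
def self_correct(generate, verify, task, max_attempts=3):
    """Generate, verify, and retry with the error trace folded into the prompt."""
    feedback = ""
    for attempt in range(max_attempts):
        candidate = generate(task + feedback)
        ok, trace = verify(candidate)
        if ok:
            return candidate
        # The error trace is the signal: feed it back verbatim.
        feedback = f"\n\nPrevious attempt failed:\n{trace}\nFix and retry."
    return None  # give up after max_attempts

# Stub model: returns broken code first, fixed code once it sees a trace.
def fake_generate(prompt):
    if "failed" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"

# Stub verifier: runs the candidate against one hard-coded test case.
def verify(code):
    ns = {}
    exec(code, ns)
    if ns["add"](2, 3) == 5:
        return True, ""
    return False, f"add(2, 3) returned {ns['add'](2, 3)}, expected 5"

print(self_correct(fake_generate, verify, "Write add(a, b)."))
```

The key design choice is that the verifier's output goes back into the prompt unedited: a raw traceback usually carries more signal than any summary of it.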
Writing is harder. There is no automatic verifier for "is this essay good." It requires either reflection (see previous page), an LLM-as-judge, or user feedback.
Factual claims sit in between. The agent says "X is true." A verifier checks: is this supported by the retrieved context? If not, the agent must re-search or hedge the claim.
Self-correction can loop forever if the agent can't actually fix the problem. Impose hard limits: cap the number of attempts (e.g., three), and abort early if the error output is unchanged between iterations.
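Both guard rails fit in a few lines. `attempt` here is a hypothetical callable returning `(success, error_trace)`; the stall check catches the model iterating on the same wrong answer.

```python
def bounded_retry(attempt, max_attempts=3):
    """Retry with two exits: attempt cap, and identical-trace stall detection."""
    last_trace = None
    for i in range(max_attempts):
        ok, trace = attempt(i)
        if ok:
            return "fixed"
        if trace == last_trace:
            return "stalled"   # same error twice: the agent can't fix this
        last_trace = trace
    return "gave up"           # budget exhausted

# Stub that always fails with the same error: stalls on the second attempt.
print(bounded_retry(lambda i: (False, "TypeError: bad operand")))
```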
Self-correction only works if the model is capable of producing a correct answer in the first place. Fundamentally hard problems (genuinely novel reasoning, specialized domain knowledge the model lacks) don't yield to self-correction; the model will just iterate on the same wrong answer.