Self-correction

Self-correction is when the agent detects its own errors and fixes them without a human in the loop. It needs a verifier: something outside the model that can say "this is wrong, and here's why." Run code → fail test → fix code → pass test. Draft claim → fact-check against source → rewrite if unsupported. The pattern is simple but powerful: it is the difference between an agent that gives up at the first stumble and one that actually gets things done.

The essential ingredient: a verifier

Self-correction is not the same as reflection. Reflection asks the model to critique itself, which is subjective. Self-correction runs an objective check: does the code compile? Do the tests pass? Is the claim backed by the source? Without a verifier outside the model, the model doesn't know it's wrong and can't correct itself.

Verifiers are the secret weapon of good agents. If you can find or build one for your task, you can self-correct. If you can't, you're stuck with reflection or human review.
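A minimal sketch of what an objective verifier looks like, using a syntax check as the example: it runs the candidate code through Python's compiler instead of asking the model whether the output "looks right". The name `verify_python` is illustrative, not from any library.

```python
def verify_python(code: str) -> tuple[bool, str]:
    """Return (passed, reason). The reason string is the signal
    the agent reads when deciding what to fix."""
    try:
        # An objective check outside the model: either it compiles
        # or it doesn't. No opinion involved.
        compile(code, "<agent_output>", "exec")
        return True, "compiles"
    except SyntaxError as e:
        return False, f"SyntaxError: {e}"
```

The same shape works for any verifier: return a boolean verdict plus a machine-readable reason the agent can act on.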

The correction loop

The triggers

A worked example: coding agent

  1. Produce: agent writes def divide(a, b): return a / b.
  2. Verify: run the test suite. Test divide(10, 0) fails with ZeroDivisionError.
  3. Agent reads error: "The test expected the function to return None when b is 0, but it raised an exception. I need to handle b == 0."
  4. Fix: def divide(a, b): return None if b == 0 else a / b.
  5. Verify again: all tests pass. Done.

This is why coding agents work: the test suite is a near-perfect verifier, and the error message is a rich signal that tells the agent exactly what to change. Take away the tests and you take away the feedback loop; the agent starts fumbling.
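The five steps can be sketched end to end. The test suite here stands in for a real verifier; `first_draft` and `fixed_draft` play the roles of the agent's two attempts.

```python
def run_tests(divide) -> tuple[bool, str]:
    """The verifier: objective, outside the model."""
    try:
        assert divide(10, 2) == 5
        assert divide(10, 0) is None, "expected None when b == 0"
        return True, "all tests pass"
    except Exception as e:
        return False, f"{type(e).__name__}: {e}"

def first_draft(a, b):           # step 1: produce
    return a / b

def fixed_draft(a, b):           # step 4: fix, guided by the error
    return None if b == 0 else a / b

ok, msg = run_tests(first_draft)   # step 2: verify -> fails
print(ok, msg)                     # False ZeroDivisionError: division by zero
ok, msg = run_tests(fixed_draft)   # step 5: verify again -> passes
print(ok, msg)                     # True all tests pass
```

Note that the error message alone (step 3) is enough to point the fix at the right branch: handle `b == 0`.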

Self-correction by task type

Stop conditions

Self-correction can spiral: a model that can't fix a problem will iterate on the same wrong answer forever. Set hard limits: cap the number of attempts, cap the spend, and stop early when the same error repeats with no change in the output.
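A minimal sketch of hard limits wrapped around the loop. `produce` and `verify` are hypothetical callables standing in for the model call and the verifier; the names and the attempt cap are assumptions for illustration.

```python
MAX_ATTEMPTS = 3

def correct_with_limits(produce, verify):
    """Run produce -> verify -> fix, but with hard stop conditions:
    a cap on attempts, and an early exit when the same error repeats."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        output = produce(last_error)      # last_error guides the fix
        ok, msg = verify(output)
        if ok:
            return output
        if msg == last_error:
            # Same error twice in a row: no progress, stop spiraling.
            raise RuntimeError(f"stuck on: {msg}")
        last_error = msg
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts: {last_error}")
```

The repeat check matters as much as the attempt cap: two identical failures in a row usually mean the model is looping, not converging.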

The quality ceiling

Self-correction only works when the model is capable of producing the right answer but drifted from it. For problems beyond the model's capability (novel research, niche domain expertise the model doesn't have), self-correction doesn't help. The model just repeats variations of the wrong answer. You'll see this in traces: the test keeps failing, the fixes keep missing the real issue, and you're burning cost. Have a stop rule, escalate to a stronger model, or break the task into smaller pieces the current model can handle.
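One way to act on the ceiling is to escalate automatically: give the current model a fixed budget, and if it can't produce a passing answer, hand the task to a stronger model before falling back to a human. `models` and `verify` are hypothetical callables; the budget of 3 is an assumption.

```python
def solve_with_escalation(task, models, verify, budget=3):
    """Try each model in order (weakest first, strongest last),
    giving each a fixed number of attempts before escalating."""
    for model in models:
        for _ in range(budget):
            output = model(task)
            ok, _ = verify(output)
            if ok:
                return output
        # Budget exhausted: this model has hit its ceiling on the task.
    raise RuntimeError("all models exhausted their budget; escalate to a human")
```

This keeps the cheap model on the happy path while giving hard tasks a way out of the loop instead of burning cost on variations of the same wrong answer.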

Pitfalls

What to do with this