Reflection

Reflection is the simplest quality boost you can add to an agent. The pattern: after the agent produces an answer, you run a second LLM call that critiques the answer. "Is this correct? Is anything missing? What would you change?" The agent reads the critique, revises, returns. That one extra step routinely buys 5-20% quality on hard tasks, at the cost of roughly doubling the spend. For anything where the answer matters more than the dollar, it's a cheap trade.

The pattern

  1. Draft: the agent produces whatever it would normally produce.
  2. Critique: a second LLM call gets the draft and a critique prompt. It lists problems.
  3. Revise: a third LLM call takes the draft + critique and produces v2.
  4. Stop (or loop 1-2 more times if the task is high-stakes).
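The four steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `call_llm` stands in for whatever prompt-in, text-out client you actually use, and the prompt wording is a placeholder.

```python
def reflect(task: str, call_llm, rounds: int = 1) -> str:
    """Draft, critique, revise. `call_llm` is any prompt -> text callable."""
    draft = call_llm(task)                         # 1. draft
    for _ in range(rounds):                        # 4. bounded loop, not forever
        critique = call_llm(                       # 2. critique
            f"Critique this answer. List concrete problems.\n\n{draft}"
        )
        draft = call_llm(                          # 3. revise
            f"Task: {task}\n\nDraft: {draft}\n\nCritique: {critique}\n\n"
            "Rewrite the draft, fixing every problem in the critique."
        )
    return draft
```

Wire `call_llm` to your provider's chat endpoint; `rounds=1` gives the single extra pass described above.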

Why this works

The model wrote the draft with one set of constraints in its head. When you ask it to critique, you're putting it in a different mental mode: adversarial, look-for-errors, grade-this. In that mode it catches things it missed the first time. It's the same trick editors use on their own writing: do a draft, take a break, re-read as a stranger.

A worked example: code generation

Draft:

def get_avg(nums):
    return sum(nums) / len(nums)

Critique prompt: "Review this function. List edge cases it doesn't handle."

Critique output: "Empty list → ZeroDivisionError. Non-numeric input → TypeError. No type hints or docstring."

Revise:

def get_avg(nums: list[float]) -> float:
    """Return the mean of a list of numbers. Returns 0.0 if the list is empty."""
    if not nums:
        return 0.0
    return sum(nums) / len(nums)

Same model, same problem. First draft is fragile. Second draft ships. The difference is the middle step.

When reflection pays off

Self vs separate critic

Two flavors of reflection:

  - Self-critique: the same model critiques its own draft. Cheapest option; one extra prompt, no new infrastructure.
  - Separate critic: a fresh call, ideally with a different model or a deliberately adversarial prompt, does the critique. Costs more, but it is less likely to share the draft's blind spots.

For real production code or published writing, pay for the separate critic. For internal tools, self-reflection is usually enough.
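The two flavors differ only in who performs the critique step. A sketch, with illustrative prompt wording and `llm` / `critic_llm` standing in for your actual clients:

```python
def self_critique(draft: str, llm) -> str:
    # Same model plays author and reviewer.
    return llm(f"You wrote this. What did you get wrong?\n\n{draft}")

def separate_critic(draft: str, critic_llm) -> str:
    # A different model (or at least a different, adversarial prompt)
    # with no stake in the draft.
    return critic_llm(
        "You are a harsh reviewer grading someone else's work. "
        f"List every flaw:\n\n{draft}"
    )
```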

The empirical gain

Published benchmarks show reflection adds 5-20% quality depending on the task, with the gains concentrated in tasks where errors are objectively checkable, such as code and math.

Stop conditions

Reflection can loop forever if you let it. Always cap how many critique-revise passes you allow.
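A minimal sketch of a capped loop, assuming a `call_llm` callable and a critic instructed to answer "OK" when satisfied (both illustrative choices, not a fixed API):

```python
MAX_PASSES = 3  # hard ceiling on critique-revise rounds, illustrative value

def reflect_capped(task: str, call_llm) -> str:
    answer = call_llm(task)
    for _ in range(MAX_PASSES):
        critique = call_llm(
            f"Critique this answer, or reply OK if it needs no changes.\n\n{answer}"
        )
        if critique.strip() == "OK":  # converged: stop before the cap
            break
        answer = call_llm(
            f"Task: {task}\n\nAnswer: {answer}\n\nCritique: {critique}\n\n"
            "Revise the answer to address the critique."
        )
    return answer
```

Capping both ways, a hard iteration ceiling plus an early exit when the critic has nothing left to say, keeps worst-case spend predictable.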

Pitfalls

What to do with this