Coding agent

Coding agents write code, run it, read the errors, fix, iterate. The test suite is the verifier. The compiler is a second verifier. The error trace is a rich signal that tells the agent exactly what broke. That tight feedback loop is why coding agents actually work today, well ahead of most other verticals. Claude Code, Cursor, Devin, and Aider are all this same pattern tuned for different UIs.

The loop

The loop is the key. Each iteration, the verifier tells the agent exactly what's wrong. The agent reads the specific error and plans the fix. Loop until verify passes, then submit.
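A minimal sketch of that loop, under stated assumptions: `run_agent_loop`, `VerifyResult`, `verify`, and `propose_fix` are hypothetical names, not any product's API. In a real agent `propose_fix` would be a model call and `verify` a test runner or compiler; here both are toy stand-ins so the control flow can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerifyResult:
    passed: bool
    feedback: str  # error trace, compiler message, or lint output


def run_agent_loop(
    state: str,
    verify: Callable[[str], VerifyResult],
    propose_fix: Callable[[str, str], str],
    max_iterations: int = 5,
) -> Optional[str]:
    """Iterate verify -> fix until the verifier passes or the budget runs out."""
    for _ in range(max_iterations):
        result = verify(state)
        if result.passed:
            return state  # ready to submit
        # Feed the specific error back into the next attempt.
        state = propose_fix(state, result.feedback)
    return None  # stop instead of looping forever


# Toy stand-ins (hypothetical): the "code" is a percent discount stored as text.
def toy_verify(code: str) -> VerifyResult:
    got = 100 - int(code)  # price of a 100-unit item after the discount
    if got == 80:
        return VerifyResult(True, "")
    return VerifyResult(False, f"AssertionError: expected 80, got {got}")


def toy_fix(code: str, feedback: str) -> str:
    # A real agent would read the feedback and plan an edit; this always answers 20.
    return "20"


final_state = run_agent_loop("10", toy_verify, toy_fix)
```

The iteration cap is a design choice of this sketch: an agent that cannot make the verifier pass should stop and report, not keep editing.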

Core tools

Why this works (when it does)

Code has a deterministic verifier. The test either passes or fails. The compile either succeeds or gives a specific error. The lint either returns no issues or points at the line. That's rich, unambiguous feedback: exactly what a self-correcting agent needs. Few other domains have this, which is why coding agents are ahead of, say, marketing agents or HR agents: those domains lack a cheap, objective verifier.
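One way to picture "deterministic verifier" is as a command whose exit code is the verdict and whose output is the feedback. The sketch below assumes that convention (exit code 0 means pass); `run_verifier` is a hypothetical helper, and the Python interpreter stands in for a real test runner or compiler so the example is self-contained.

```python
import subprocess
import sys
from typing import List, Tuple


def run_verifier(command: List[str], timeout_s: float = 60.0) -> Tuple[bool, str]:
    """Run a check command; exit code 0 means pass, anything else means fail."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} seconds"
    # Combined output is the feedback the agent reads on failure.
    return proc.returncode == 0, proc.stdout + proc.stderr


# Demo: the interpreter itself plays the role of the test runner.
passed_good, _ = run_verifier([sys.executable, "-c", "assert 1 + 1 == 2"])
passed_bad, trace = run_verifier([sys.executable, "-c", "assert 1 + 1 == 3"])
```

In practice the command would be whatever the project already uses to run its tests, build, or lint; the agent only needs the pass/fail bit and the text.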

Context management: the hard part

Codebases don't fit in context. A 10K-file repo would be millions of tokens. The agent has to navigate:

Agents that "read the whole module to understand it" burn context and make worse decisions. Precise reads beat thorough reads.
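A rough illustration of a precise read: search for the symbol, then pull only a small window of lines around the hit. The name `search_code` appears in the worked example in this piece, but its signature here is an assumption, and `read_window` is a hypothetical helper. The demo builds a throwaway file so the sketch runs anywhere.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple


def search_code(root: Path, pattern: str, glob: str = "*.py") -> List[Tuple[Path, int]]:
    """Return (file, 1-based line number) for every line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob(glob)):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path, number))
    return hits


def read_window(path: Path, line: int, before: int = 2, after: int = 10) -> str:
    """Return only the lines around a hit, each prefixed with its line number."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    start = max(line - before, 1)
    end = min(line + after, len(lines))
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))


# Demo on a temporary file: 100 filler lines, then the function of interest.
with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "pricing.py"
    filler = [f"# filler line {i}" for i in range(1, 101)]
    body = ["def calculate_discount(price):", "    return price * 0.9"]
    module.write_text("\n".join(filler + body) + "\n", encoding="utf-8")

    hits = search_code(Path(tmp), r"def calculate_discount")
    hit_path, hit_line = hits[0]
    snippet = read_window(hit_path, hit_line, before=1, after=1)
```

The agent ends up with three lines in context instead of the whole file, which is the trade the paragraph above argues for.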

A worked example: fixing a failing test

  1. Task: "The test test_discount_calculation is failing. Fix it."
  2. Explore: agent runs the test → reads the error: AssertionError: expected 80.0, got 90.0.
  3. Search: search_code("def calculate_discount") → finds the function.
  4. Read: the function + the test file to understand intended behavior.
  5. Plan: the test expects 20% off; function applies 10%. Fix the rate.
  6. Edit: one-line change.
  7. Verify: run the test → passes. Run the full suite → all pass.
  8. Submit: show diff, ready to commit.

Eight tool calls, about 30 seconds. The error message told the agent exactly what to fix. That's the feedback loop at work.
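For concreteness, the code involved might look like the sketch below. Only the names `calculate_discount` and `test_discount_calculation` and the 80.0 / 90.0 values come from the walkthrough; the function signature, the 100.0 input price, and the `_before` suffix used to show the pre-fix version are assumptions.

```python
def calculate_discount_before(price: float) -> float:
    """Version before the fix: applies 10% off, so 100.0 becomes 90.0."""
    return price * 0.90


def calculate_discount(price: float) -> float:
    """Version after the one-line edit: applies the 20% off the test expects."""
    return price * 0.80


def test_discount_calculation() -> None:
    # Against the pre-fix version this assertion fails with a value of 90.0.
    assert calculate_discount(100.0) == 80.0


test_discount_calculation()
```

The whole fix is the changed rate constant; everything else the agent did was locating that line and confirming the change.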

Testing is the gatekeeper

Coding-agent quality tracks test-suite quality directly:

Encourage users to invest in tests before deploying a coding agent. The ROI shows up immediately.

Safety

A coding agent can modify files, run commands, push to git. It can do real damage. Non-negotiable guardrails:

Failure modes

The ecosystem

What to do with this