Coding agents write code, run it, read errors, fix, iterate. The test suite is the verifier; the error trace is the feedback. Claude Code, Cursor, and tools like Devin are built on this pattern.
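The write, run, read errors, fix cycle can be sketched as a small loop. This is a minimal illustration, not how any of the named products is implemented: `agent_loop`, `RunResult`, `generate`, and `verify` are hypothetical names, with `generate` standing in for the LLM call and `verify` standing in for the test suite.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunResult:
    passed: bool
    output: str  # error trace or test report that is fed back to the model


def agent_loop(
    task: str,
    generate: Callable[[str, str], str],  # hypothetical: (task, feedback) -> code
    verify: Callable[[str], RunResult],   # hypothetical: code -> verifier result
    max_iters: int = 5,
) -> Tuple[str, bool, int]:
    """Write, run, read errors, fix; stop on success or when the budget runs out."""
    feedback = ""  # empty on the first attempt: nothing has failed yet
    code = ""
    for attempt in range(1, max_iters + 1):
        code = generate(task, feedback)
        result = verify(code)
        if result.passed:
            return code, True, attempt
        feedback = result.output  # the error trace drives the next attempt
    return code, False, max_iters
```

The iteration cap is a design choice in this sketch: without it, a model that never satisfies the verifier would loop forever.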
- read_file(path)
- write_file(path, contents)
- run_command(cmd), for tests, builds, and linters
- search_code(query)
- list_files(dir)

Code has a natural verifier: compilers, test suites, and linters. When the agent writes bad code, the feedback is unambiguous, and the LLM uses the error message to plan the fix. Few domains have a feedback loop this tight, which is why coding agents are ahead of other verticals.
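The five tools named above could be implemented along the following lines. This is a sketch under assumptions, not the API of any particular agent: the return shapes, the `cwd`, `timeout`, and `root` parameters, and the plain substring search are all illustrative choices. The point it shows is that `run_command` hands back the exit code and stderr, which is the verifier feedback the model plans its fix from.

```python
import shlex
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, contents: str) -> None:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    target.write_text(contents, encoding="utf-8")


def run_command(cmd: str, cwd: str = ".", timeout: float = 60.0) -> dict:
    """Run tests, builds, or linters and return what the model needs to plan a fix."""
    try:
        proc = subprocess.run(
            shlex.split(cmd), cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    except OSError as exc:  # e.g. the executable does not exist
        return {"exit_code": None, "stdout": "", "stderr": str(exc)}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


def search_code(query: str, root: str = ".") -> list:
    """Plain substring search; returns (path, line_number, line) tuples."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if query in line:
                hits.append((str(path), number, line.strip()))
    return hits


def list_files(directory: str = ".") -> list:
    # Directories get a trailing slash so the model can tell them from files.
    return sorted(p.name + ("/" if p.is_dir() else "") for p in Path(directory).iterdir())
```

Passing the command through `shlex.split` rather than `shell=True` keeps shell metacharacters from being interpreted, which is a deliberate choice in this sketch.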
Codebases are too big to fit in context. Agent must:
Agent quality tracks test suite quality. Without tests, the agent has no feedback. Encourage users to have good tests in place before deploying a coding agent.
Coding agents can do damage. Guardrails: