Coding agent
3 min read · Updated 2026-04-19
Coding agents write code, run it, read the errors, fix, iterate. The test suite is one verifier; the compiler is another. The error trace is a rich signal that tells the agent exactly what broke. That tight feedback loop is why coding agents actually work today, well ahead of most other verticals. Claude Code, Cursor, Devin, Aider: they're all the same pattern tuned for different UIs.
The loop
The loop is the key. Each iteration, the verifier tells the agent exactly what's wrong. The agent reads the specific error and plans the fix. Loop until verify passes, then submit.
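The loop above can be sketched in a few lines. Everything here is an illustrative stand-in, not a real agent API: the "codebase" is a single discount rate, `run_verifier` stands in for running the test suite, and `plan_and_fix` stands in for the model reading the error and editing code.

```python
from dataclasses import dataclass

@dataclass
class VerifyResult:
    passed: bool
    error_trace: str = ""

# Toy "codebase": a single value the agent can edit.
state = {"rate": 0.10}

def run_verifier():
    # Stand-in for running the test suite: the test expects 20% off 100 -> 80.0.
    got = round(100 * (1 - state["rate"]), 2)
    if got == 80.0:
        return VerifyResult(True)
    return VerifyResult(False, f"AssertionError: expected 80.0, got {got}")

def plan_and_fix(error_trace):
    # Stand-in for the model reading the specific error and applying a fix.
    state["rate"] = 0.20

def agent_loop(max_iters=10):
    """Edit -> verify -> read the error -> edit again, until the verifier passes."""
    for i in range(max_iters):
        result = run_verifier()
        if result.passed:
            return f"passed after {i} fix(es)"
        plan_and_fix(result.error_trace)
    return "gave up"

print(agent_loop())  # passed after 1 fix(es)
```

The important structural point is that the loop terminates on the verifier's verdict, not on the model's claim that it is done.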
Core tools
read_file(path) - read source files.
write_file(path, contents) - overwrite.
edit_file(path, old, new) - targeted edit (safer than full overwrite).
search_code(query) - grep across the codebase.
list_files(dir) - navigate directories.
run_command(cmd) - tests, build, linter, formatter.
git_diff() - see what's been changed so far this session.
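A subset of these tools might be declared to a model like this. The JSON-schema style shown is the common convention across LLM APIs, but the exact field names vary by provider; treat this as a sketch, not any one vendor's format.

```python
# Hypothetical tool declarations in the JSON-schema style most LLM APIs use.
# Field names ("name", "description", "parameters") vary by provider.
CODING_TOOLS = [
    {"name": "read_file",   "description": "Read a source file.",
     "parameters": {"path": "string"}},
    {"name": "edit_file",   "description": "Replace old text with new text in a file.",
     "parameters": {"path": "string", "old": "string", "new": "string"}},
    {"name": "search_code", "description": "Grep across the codebase.",
     "parameters": {"query": "string"}},
    {"name": "run_command", "description": "Run tests, build, linter, formatter.",
     "parameters": {"cmd": "string"}},
]
```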
Why this works (when it does)
Code has a deterministic verifier. The test either passes or fails. The compile either succeeds or gives a specific error. The lint either returns no issues or points at the line. That's rich, unambiguous feedback: exactly what a self-correcting agent needs. Few other domains have this, which is why coding agents are ahead of, say, marketing agents or HR agents: those domains lack a cheap, objective verifier.
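In practice "the verifier" is often just a wrapper around the test runner. A minimal sketch, assuming the project uses pytest (the command is an assumption; substitute your own build or test command):

```python
import subprocess

def verify(cmd=("pytest", "-x", "-q")):
    """Run the test command and return (passed, feedback).

    On failure, the combined output is the error trace the agent reads
    to decide what to fix next.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr
```

The exit code gives the binary pass/fail signal; the captured output carries the specific error that makes the next iteration targeted rather than a guess.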
Context management: the hard part
Codebases don't fit in context. A 10K-file repo would be millions of tokens. The agent has to navigate:
- Search first. Find the right files via search_code or directory listing before reading.
- Load only relevant files. Not "read everything just in case."
- Drop files from context once the edit is made and verified.
- Summarize long files instead of loading the whole file when only one function matters.
Agents that "read the whole module to understand it" burn context and make worse decisions. Precise reads beat thorough reads.
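One way to make those rules mechanical is a hard context budget with explicit eviction. This class is invented for illustration (the 4-chars-per-token heuristic is a rough approximation, not a real tokenizer):

```python
# Illustrative context manager: load only what's needed, evict after verify.
class ContextBudget:
    def __init__(self, max_tokens=50_000):
        self.max_tokens = max_tokens
        self.files = {}  # path -> contents currently held in context

    def tokens(self, text):
        return len(text) // 4  # rough heuristic: ~4 characters per token

    def used(self):
        return sum(self.tokens(c) for c in self.files.values())

    def load(self, path, contents):
        if self.used() + self.tokens(contents) > self.max_tokens:
            raise MemoryError("would exceed budget; summarize or evict first")
        self.files[path] = contents

    def evict(self, path):
        # Drop a file once its edit has been made and verified.
        self.files.pop(path, None)
```

Failing loudly when the budget would be exceeded forces the "summarize or evict" decision at load time, instead of silently degrading the model's attention mid-task.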
A worked example: fixing a failing test
- Task: "The test test_discount_calculation is failing. Fix it."
- Explore: agent runs the test → reads the error: AssertionError: expected 80.0, got 90.0.
- Search: search_code("def calculate_discount") → finds the function.
- Read: the function + the test file to understand intended behavior.
- Plan: the test expects 20% off; function applies 10%. Fix the rate.
- Edit: one-line change.
- Verify: run the test → passes. Run the full suite → all pass.
- Submit: show diff, ready to commit.
8 tool calls, about 30 seconds. The error message told the agent exactly what to fix. That's the feedback loop at work.
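The bug and the one-line fix in that example might look like this. The function body is invented for illustration; the source only gives the names and the expected values.

```python
def calculate_discount(price, rate=0.20):  # was rate=0.10: the one-line fix
    """Apply a percentage discount to a price."""
    return price * (1 - rate)

def test_discount_calculation():
    # The verifier: the test expects 20% off 100.0.
    assert calculate_discount(100.0) == 80.0

test_discount_calculation()  # passes after the fix
```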
Testing is the gatekeeper
Coding-agent quality tracks test-suite quality directly:
- No tests → no verifier → agent guesses.
- Weak tests → weak verifier → agent ships bugs that the tests don't catch.
- Strong tests → strong verifier → agent produces reliable changes.
Encourage users to invest in tests before deploying a coding agent. The ROI shows up immediately.
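The weak-vs-strong distinction is concrete. Both tests below pass against a correct implementation, but only the second would catch an agent that broke the discount logic (the example is invented here, not taken from a real suite):

```python
def apply_discount(price, rate):
    return price * (1 - rate)

def weak_test():
    # Weak verifier: passes for almost any implementation, including broken ones.
    assert apply_discount(0, 0.2) == 0

def strong_test():
    # Strong verifier: pins down the behavior the agent must preserve.
    assert apply_discount(100.0, 0.20) == 80.0
    assert apply_discount(100.0, 0.0) == 100.0
    assert apply_discount(0.0, 0.5) == 0.0

weak_test()
strong_test()
```

An agent iterating against weak_test alone can "pass" with a function that returns 0 unconditionally; strong_test rules that out.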
Safety
A coding agent can modify files, run commands, push to git. It can do real damage. Non-negotiable guardrails:
- Sandboxed execution. Run in a container or locked-down shell; don't let the agent touch things it shouldn't.
- Filesystem restrictions. Scope to the repo; don't give it the whole home directory.
- Human approval for destructive operations. git push, rm -rf, deploy commands, force-push.
- No production secrets. Agent shouldn't have credentials it doesn't need.
- Approval for dependency changes. New packages are supply-chain attack surface.
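The human-approval gate can be as simple as a pattern check before run_command executes anything. The patterns below are a starting set, not an exhaustive denylist; a real deployment would pair this with sandboxing rather than rely on string matching alone.

```python
import re

# Illustrative guardrail: flag destructive commands for human approval.
# This list is a sketch; pattern-matching alone is not a complete defense.
DESTRUCTIVE_PATTERNS = [
    r"\bgit\s+push\b",
    r"\brm\s+-rf\b",
    r"\bdeploy\b",
    r"--force",
]

def requires_approval(cmd: str) -> bool:
    return any(re.search(p, cmd) for p in DESTRUCTIVE_PATTERNS)
```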
Failure modes
- "Fixed" but not fixed. Test still fails, agent claims success. Always re-run tests after the agent says it's done.
- Unrelated files edited. Agent gets "helpful" and refactors things you didn't ask for. Keep scope tight via prompt.
- Stuck fixing the symptom. Tests keep failing the same way; agent keeps making small tweaks. Detect cycle, escalate or stop.
- Context window collapse. Over-explored, lost track of what it was doing. Summarize aggressively.
- Shipping without running tests. Always require verification before claiming done.
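The "stuck fixing the symptom" mode is detectable mechanically: if the same error signature comes back from the verifier several times in a row, stop and escalate. A minimal sketch (the signature heuristic, last line of the trace, is an assumption; real traces may need smarter normalization):

```python
from collections import Counter

class CycleDetector:
    """Escalate when the same verifier error repeats `limit` times."""

    def __init__(self, limit=3):
        self.limit = limit
        self.seen = Counter()

    def record(self, error_trace: str) -> bool:
        """Return True if the agent appears stuck and should stop or escalate."""
        lines = error_trace.strip().splitlines()
        signature = lines[-1] if lines else ""  # crude: key on the final error line
        self.seen[signature] += 1
        return self.seen[signature] >= self.limit
```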
The ecosystem
- Claude Code - CLI-based, terminal-native.
- Cursor / Windsurf - IDE-integrated, inline.
- Cognition Devin - more autonomous, session-based.
- GitHub Copilot Workspace - task-level agent.
- Aider, Continue, Cline - open-source.
What to do with this
- Before you deploy a coding agent, audit your test suite. Weak tests = weak agent.
- Start on a single narrow task (bug fix, lint fix) to learn the ergonomics before letting the agent touch broader work.
- Read self-correction for the verifier-driven loop that powers these.