Self-monitoring
📖 5 min read · Updated 2026-04-18
An autonomous agent is running without you. How do you know it's still working? The answer: the agent tells you. Self-monitoring is what separates "running" from "reliable."
What to monitor
- Is it running? Heartbeat pings from each run
- Is it finishing? Duration per run; if it drifts, something's wrong
- Is it succeeding? Per-run success/failure flag
- Is it doing quality work? Output quality score (eval every Nth run)
- Is it spending too much? Tokens + $ per run
- Is it safe? Tool calls vs deny list; any flagged outputs
Three layers of monitoring
1. Liveness
Does the agent run at all? Use a dead man's switch: every run pings a service. If pings stop, you get alerted.
# At end of successful run (-fsS surfaces HTTP errors, --retry handles blips)
curl -fsS -m 10 --retry 3 https://hc-ping.com/your-check-uuid
Services: Healthchecks.io, BetterUptime, Dead Man's Snitch.
2. Correctness
Did the agent do what it was supposed to? Define a correctness check. For a reporting agent: "Report was generated and posted to #reports channel." Run the check after each run. Log result.
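A minimal sketch of that pattern in Python, assuming the run's pass/fail result is already known and runs are appended to a `runs.jsonl` log (function and file names are illustrative):

```python
import json
import time

def log_correctness(run_id: str, correct: bool, log_path: str = "runs.jsonl") -> bool:
    """Append this run's correctness result to a JSON-lines log."""
    record = {"run_id": run_id, "ts": time.time(), "correct": correct}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return correct
```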
3. Quality
Is the output actually good, or is the agent producing garbage? This is harder. Run a lightweight eval on a sample of outputs (every 10th run, or random sampling). Use LLM-as-judge with a rubric.
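The sampling decision itself is cheap. A sketch combining both strategies from above (function name is illustrative):

```python
import random

def should_eval(run_number: int, every_nth: int = 10, sample_rate: float = 0.0) -> bool:
    """Eval every Nth run, plus an optional random sample of the rest."""
    return run_number % every_nth == 0 or random.random() < sample_rate
```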
Self-reporting patterns
Daily digest
End of each day, the agent summarizes what it did, flagged issues, and notable outputs. Sent to you via email or Slack.
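A sketch of a digest builder over the day's run records (field names like `exit_reason` and `cost_usd` are assumptions; match them to whatever your log actually stores):

```python
def daily_digest(runs: list) -> str:
    """Summarize a day's run records into a short message for email/Slack."""
    ok = sum(1 for r in runs if r.get("exit_reason") == "completed")
    cost = sum(r.get("cost_usd", 0) for r in runs)
    flagged = [r["run_id"] for r in runs if r.get("flags")]
    lines = [f"{ok}/{len(runs)} runs completed, ${cost:.2f} total"]
    if flagged:
        lines.append("Flagged: " + ", ".join(flagged))
    return "\n".join(lines)
```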
Anomaly flagging
If something is far from baseline (3× longer run time, 5× more tool calls, output size way off), the agent stops and notifies you.
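A sketch of that baseline comparison (the metric names and the multipliers beyond the 3× and 5× above are illustrative):

```python
def anomaly_flags(current: dict, baseline: dict) -> list:
    """Return the names of metrics that breach their baseline multiplier."""
    thresholds = {"duration_s": 3.0, "tool_calls": 5.0, "output_bytes": 4.0}
    flags = []
    for metric, multiplier in thresholds.items():
        if current.get(metric, 0) > multiplier * baseline.get(metric, 1):
            flags.append(metric)
    return flags
```

Any non-empty result means: stop the run and notify, don't ship.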
Confidence scores
Agent rates its own confidence per task. Low-confidence outputs get flagged for human review instead of shipping.
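The routing rule is one line; a sketch (the 0.7 threshold is an assumption, tune it per task):

```python
def route_output(confidence: float, threshold: float = 0.7) -> str:
    """Ship high-confidence outputs; queue the rest for human review."""
    return "ship" if confidence >= threshold else "human_review"
```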
When to auto-restart
- Transient failures (network, rate limits): yes, with exponential backoff, max 3 retries
- Tool errors: depends. Retry once with different args; then escalate.
- Model errors (overload, 500s): retry with backoff; fall back to a cheaper model if needed
- Policy violations (denied tool): don't retry. Log and escalate.
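A sketch of the transient-failure case: exponential backoff with jitter, capped at three retries (the exception class is a stand-in for whatever your stack raises):

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a retryable failure: network blip, rate limit, 500."""

def run_with_backoff(task, max_retries: int = 3, base_delay: float = 1.0):
    for attempt in range(max_retries + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries; escalate
            # 1s, 2s, 4s... plus jitter so parallel agents don't stampede
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```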
Escalation ladder
When the agent can't proceed, what happens?
- Retry: same task, slight variation
- Fallback: simpler version of the task
- Pause: stop this run, wait for next cycle
- Notify: send a message to a human; include context
- Disable: turn off future runs until a human resets
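The ladder can be encoded so the agent always knows its next step (rung names mirror the list above; "disable" is terminal):

```python
LADDER = ["retry", "fallback", "pause", "notify", "disable"]

def next_step(current: str) -> str:
    """Advance one rung; stay at 'disable' once reached."""
    i = LADDER.index(current)
    return LADDER[min(i + 1, len(LADDER) - 1)]
```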
Useful signals to track per-run
- Start time + duration
- Turn count
- Tokens in / out / cached
- Cost in $
- Tool call count (per tool)
- Errors encountered
- Output size
- Exit reason: completed / max-turns / error / escalated
Store these as JSON lines in a log file so you can grep for patterns after the fact.
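A JSON-lines log also supports quick programmatic analysis. A sketch that computes error rate from such a file (assumes each record carries the `exit_reason` field listed above):

```python
import json

def error_rate(log_path: str) -> float:
    """Fraction of logged runs that exited with an error."""
    with open(log_path) as f:
        runs = [json.loads(line) for line in f if line.strip()]
    if not runs:
        return 0.0
    return sum(1 for r in runs if r.get("exit_reason") == "error") / len(runs)
```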
The agent is a system. Treat its observability like you would a production service. Latency. Error rate. Cost. Quality. You need all four.