Self-monitoring

An autonomous agent is running without you. How do you know it's still working? The answer: the agent tells you. Self-monitoring is what separates "running" from "reliable."

What to monitor

Three layers of monitoring

1. Liveness

Does the agent run at all? Use a dead-man's switch: every successful run pings a service. If the pings stop, you get alerted.

# At end of successful run
curl https://hc-ping.com/your-check-uuid

Services: Healthchecks.io, BetterUptime, Dead Man's Snitch.

2. Correctness

Did the agent do what it was supposed to? Define a correctness check per agent. For a reporting agent: "Report was generated and posted to the #reports channel." Run the check after every run and log the result.
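A correctness check can be as small as a single function that returns a loggable result. A minimal sketch in Python, assuming the agent's success criterion is "a non-empty report file exists" (the path and record shape are hypothetical, not a fixed schema):

```python
import json
import time
from pathlib import Path

def correctness_check(report_path: str) -> dict:
    """Hypothetical check: was the report generated and non-empty?"""
    p = Path(report_path)
    ok = p.exists() and p.stat().st_size > 0
    # Return a structured record so the result can be logged, not just printed.
    return {"check": "report_generated", "ok": ok, "ts": time.time()}

print(json.dumps(correctness_check("reports/daily.md")))
```

For a Slack-posting agent you would swap the file check for a query against your message history; the shape of the returned record stays the same.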

3. Quality

Is the output actually good, or is the agent producing garbage? This is harder. Run a lightweight eval on a sample of outputs (every 10th run, or random sampling). Use LLM-as-judge with a rubric.
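The sampling logic is cheap to write. A sketch of the "every 10th run, plus optional random sampling" policy; the judge call itself is a placeholder for whatever LLM client you use:

```python
import random

def should_eval(run_number: int, every_n: int = 10,
                sample_rate: float = 0.0) -> bool:
    """Decide whether this run's output gets a quality eval:
    every Nth run, plus an optional random sample of the rest."""
    if run_number % every_n == 0:
        return True
    return random.random() < sample_rate

# If should_eval(...) is True, send the output to an LLM-as-judge call
# scoring against your rubric (hypothetical; plug in your provider's client).
```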

Self-reporting patterns

Daily digest

At the end of each day, the agent summarizes what it did, flags issues, and highlights notable outputs, then sends the digest to you via email or Slack.
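A minimal digest builder, assuming run records shaped like `{"task", "ok", "note"}` (an assumption; adapt the fields to your own logs):

```python
def daily_digest(runs: list[dict]) -> str:
    """Build a one-screen summary of the day's runs.
    The record shape ({"task", "ok", "note"}) is assumed, not standard."""
    ok = sum(1 for r in runs if r["ok"])
    lines = [f"Agent digest: {len(runs)} runs, {ok} ok, {len(runs) - ok} failed"]
    for r in runs:
        # Surface only failures and runs that left a note; skip routine successes.
        if not r["ok"] or r.get("note"):
            lines.append(f"- {r['task']}: {r.get('note') or 'failed'}")
    return "\n".join(lines)
```

Pipe the resulting string into your mailer or Slack webhook of choice.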

Anomaly flagging

If something is far from baseline (a 3× longer run time, 5× more tool calls, an output size far outside the usual range), the agent stops and notifies you.
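The comparison itself is one function. A sketch, assuming you maintain a rolling baseline per metric (the factor values come straight from the examples above):

```python
def is_anomalous(value: float, baseline: float, factor: float) -> bool:
    """Flag values far from a rolling baseline in either direction,
    e.g. factor=3 for run time, factor=5 for tool calls."""
    if baseline <= 0:
        return False  # no baseline yet; don't flag the first runs
    return value > baseline * factor or value < baseline / factor
```

Checking both directions matters: a run that finishes suspiciously fast is as worth flagging as one that runs long.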

Confidence scores

Agent rates its own confidence per task. Low-confidence outputs get flagged for human review instead of shipping.
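The routing decision reduces to a threshold check. A sketch; the 0.7 cutoff is an assumption to tune against your own review data, not a recommended value:

```python
def route_output(output: str, confidence: float,
                 threshold: float = 0.7) -> tuple[str, str]:
    """Ship high-confidence outputs; queue the rest for human review.
    The threshold is a tunable assumption, not a magic number."""
    if confidence >= threshold:
        return ("ship", output)
    return ("review", output)
```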

When to auto-restart

Escalation ladder

When the agent can't proceed, what happens?

  1. Retry: the same task, with slight variation
  2. Fallback: a simpler version of the task
  3. Pause: stop this run, wait for the next cycle
  4. Notify: send a message to a human, with context
  5. Disable: turn off future runs until a human resets
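The ladder above can be encoded as a simple mapping from consecutive failures to the next action. A sketch, assuming you persist a failure counter between runs and that a human reset clears it:

```python
LADDER = ["retry", "fallback", "pause", "notify", "disable"]

def next_action(consecutive_failures: int) -> str:
    """Map consecutive failures to a rung on the ladder.
    Anything beyond the last rung stays at 'disable' until a
    human resets the counter (reset mechanism not shown)."""
    idx = min(max(consecutive_failures, 1), len(LADDER)) - 1
    return LADDER[idx]
```

The important property is monotonicity: the agent only moves down the ladder on repeated failure, and only a human moves it back up.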

Useful signals to track per-run

Store them as JSON lines in a log file so you can grep for patterns after the fact. Useful fields: run duration, tool-call count, token cost, error status, and the correctness and quality results above.
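Appending JSON lines needs no library beyond the standard one. A sketch, with a hypothetical file name and record shape:

```python
import json
import time

def log_run(path: str, record: dict) -> None:
    """Append one JSON object per line: greppable, and trivially
    parseable later with json.loads per line."""
    record.setdefault("ts", time.time())  # stamp if caller didn't
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical per-run record; field names are illustrative, not a schema.
log_run("agent_runs.jsonl", {"task": "daily_report", "ok": True,
                             "duration_s": 42.1, "tool_calls": 7,
                             "cost_usd": 0.03})
```

Later, `grep '"ok": false' agent_runs.jsonl` or a five-line Python script gets you the failure history.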

The agent is a system. Treat its observability like you would a production service. Latency. Error rate. Cost. Quality. You need all four.