Observability + tracing

Agents fail in ways that traditional apps don't: non-deterministic reasoning, cascading tool errors, quality drift. Without comprehensive tracing, these failures are invisible until a customer complains. Observability isn't optional.

What to log per session

Traces as debugging artifacts

A trace is the replayable record of an agent session. Given a trace you should be able to:

Tools

Sampling

Full tracing at high QPS gets expensive. Sample:
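
One way to sample, sketched under assumptions: keep every session that errored, plus a deterministic fraction of the rest chosen by hashing the session ID, so the same session always gets the same decision on every host. The function name `should_keep_trace`, the 10% default rate, and the keep-all-errors rule are illustrative choices, not a prescribed policy.

```python
import hashlib


def should_keep_trace(session_id: str, had_error: bool, sample_rate: float = 0.10) -> bool:
    """Decide whether to store the full trace for a session (hypothetical policy)."""
    if had_error:
        # Assumption: failed sessions are always worth keeping in full.
        return True
    # Map the session ID to a stable bucket in [0, 1) using the first 4 hash bytes.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < sample_rate
```

Hashing rather than calling a random number generator keeps the decision reproducible, so every component that sees the same session ID agrees on whether the trace is kept.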

Alerts

Alert on:

The feedback loop

Observability feeds eval:

  1. A production trace reveals a failure
  2. Turn the failing case into an eval case
  3. Fix the issue
  4. Verify the fix in eval, then add the case to the regression suite

Without this loop, the same bugs recur.
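
Steps 2 and 4 of the loop above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the trace fields `session_id` and `user_input`, the `EvalCase` shape, and the JSON Lines suite file are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class EvalCase:
    case_id: str
    user_input: str
    failure_note: str  # plain-language description of what went wrong in production


def trace_to_eval_case(trace: dict, failure_note: str) -> EvalCase:
    """Build an eval case from a failing production trace (hypothetical trace schema)."""
    return EvalCase(
        case_id=f"regression-{trace['session_id']}",
        user_input=trace["user_input"],
        failure_note=failure_note,
    )


def add_to_regression_suite(case: EvalCase, suite_path: Path) -> None:
    """Append the case as one JSON line, skipping case IDs already in the suite."""
    existing = set()
    if suite_path.exists():
        with suite_path.open() as f:
            existing = {json.loads(line)["case_id"] for line in f if line.strip()}
    if case.case_id not in existing:
        with suite_path.open("a") as f:
            f.write(json.dumps(asdict(case)) + "\n")
```

Deriving the case ID from the session ID makes the step idempotent: re-processing the same trace does not add a duplicate case to the suite.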