Human-in-the-loop

Fully autonomous agents are rare in real production. Most production agent systems have humans in the loop at specific decision points, not because the AI isn't smart enough, but because some decisions shouldn't be fully automated even if they could be. Human-in-the-loop isn't a limitation. It's a design choice. Knowing where to place the human is how you ship reliably without becoming a headline.

When the human should pause things

Three interaction models

Pre-action approval: the default for high-stakes

The strongest pattern: agent always proposes, human clicks execute. The agent wrote the email draft; the human presses send. The agent picked the refund; the human approves. This keeps the human as the actor of record, which matters for liability and for user trust. Most users are more comfortable with "AI helped me do this" than "AI did this to me."

Post-action review: for scale

When volume is too high for per-action approval, let the agent act and review a sample. The agent auto-categorized 1000 tickets today; a human reviews 5% randomly sampled, weighted toward low-confidence cases. Drift in the auto-categorizer shows up in the review rate quickly.

Async handoff: for escalation

Agent is mid-task, hits something it can't handle. Don't fail the session; hand off. Tier-1 → tier-2. Support agent → human rep. Billing agent → finance. The task pauses, joins a human queue, picks back up after resolution.

The confidence threshold

An agent can (and should) estimate its own confidence on any action: "I'm 95% sure this is the right refund" vs "I'm only 40% sure this is the right category." Route by threshold:

Track the calibration: when the agent says 90%, how often is it actually right? Adjust thresholds with real production data.

A worked example: customer refunds

Agent handles 95% of volume with a human touching 30% of dollars. One human reviewer can keep up with thousands of tickets a day because most cases are pre-approved low-risk.

Queue management

If agent output exceeds human review capacity, queues build up and the UX degrades. When you see this happening:

Pitfalls

What to do with this