The autonomy spectrum

Not every AI system is an agent. Not every agent is autonomous. Autonomy is a spectrum from passive chatbot to fully self-running system. Knowing where you are on the spectrum tells you what you can and can't do.

The five levels

Level 1. Chatbot

Generates text in response to a user's message. No memory, no tools, no action. A very fast autocomplete.

Example: early ChatGPT, before tool use.

Level 2. Assistant

Chatbot plus memory and maybe limited tools (search, file access). Still initiated by the human; still waits for prompts.

Example: modern Claude/ChatGPT with the "search the web" tool enabled.

Level 3. Agent with human-in-the-loop

Can chain tool calls, take multi-step actions, and use external systems, but every risky action requires human approval. Well suited to writing code, research, and customer support.

Example: Claude Code running in default mode, which asks before executing each shell command.
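The approval gate at this level can be sketched as a thin wrapper around command execution. This is a minimal illustration, not Claude Code's actual mechanism; the `approve` hook is a hypothetical stand-in for whatever UI collects the human's yes/no.

```python
import subprocess

def run_with_approval(command, approve=None):
    # Level 3 in miniature: every shell command pauses for a human
    # (or an injected approver, which makes the gate testable).
    if approve is None:
        approve = lambda cmd: input(f"Run {cmd!r}? [y/N] ").strip().lower() == "y"
    if not approve(command):
        return "denied by operator"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout
```

Injecting the approver keeps the policy separate from the execution path, which matters later when you loosen it.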

Level 4. Agent with auto mode

Pre-authorized to do most things within a defined scope. Stops and asks only for actions outside that scope. This is where most real productivity lives today.

Example: Claude Code with auto mode enabled and a deny list for destructive commands.
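A Level 4 scope check can be sketched as a deny list consulted before each action. The patterns below are illustrative examples of "destructive", not a complete or authoritative list.

```python
import re

# Hypothetical deny list: commands matching these run only with
# human approval; everything else inside the scope is pre-authorized.
DENY_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bdrop\s+table\b",
    r"\bgit\s+push\s+--force\b",
]

def needs_human(command: str) -> bool:
    """Level 4 policy: auto-approve unless the command looks destructive."""
    return any(re.search(p, command, re.IGNORECASE) for p in DENY_PATTERNS)
```

The important property is the default: Level 3 denies by default and asks; Level 4 allows by default inside the scope and asks only at the edges.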

Level 5. Fully autonomous

Runs headless. Scheduled or event-triggered. Self-monitors. Can escalate when it gets stuck, but doesn't require a human to run. Has durable state.

Example: a scheduled agent that scrapes a data source, writes a daily report, emails stakeholders, and flags anomalies, all without a person starting it.
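One headless run of such an agent can be sketched as below. `fetch`, `report`, and `alert` are hypothetical callables standing in for the scraper, the emailer, and the escalation channel; the JSON file is a stand-in for durable state.

```python
import json
import pathlib

STATE = pathlib.Path("agent_state.json")  # durable state across runs

def run_once(fetch, report, alert):
    """One scheduled, unattended run: fetch, report, escalate on failure."""
    state = json.loads(STATE.read_text()) if STATE.exists() else {"runs": 0}
    try:
        data = fetch()
        report(data)
        state["runs"] += 1
    except Exception as exc:
        # Self-monitoring: escalate to a human rather than failing silently.
        alert(f"run failed, needs a human: {exc}")
    STATE.write_text(json.dumps(state))
```

A scheduler (cron, a cloud trigger) calls `run_once` on its own; no person starts it, but a person hears about it when something breaks.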

How to know where you are

Ask these questions:

  1. Who starts each run? Human → Level 1–3. Schedule or event → Level 4–5.
  2. How many tool calls between human approvals? 0 → Level 1–2. 1–5 → Level 3. Dozens → Level 4. Unlimited within scope → Level 5.
  3. Can it recover from errors on its own? No → Level 1–3. Yes, for some → Level 4. Yes, for most → Level 5.
  4. Does it know when to stop? Only if a human tells it → Level 1–4. Via built-in self-checks → Level 5.
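The four questions above can be collapsed into a rough rule of thumb. The thresholds below mirror the answers in the list but are illustrative, not a formal rubric.

```python
def autonomy_level(human_starts, calls_between_approvals, recovers, self_stops):
    # recovers is "none", "some", or "most", matching question 3.
    if not human_starts and self_stops and recovers == "most":
        return 5  # scheduled, self-checking, recovers on its own
    if calls_between_approvals > 5 or recovers == "some":
        return 4  # dozens of calls inside a pre-authorized scope
    if calls_between_approvals >= 1:
        return 3  # multi-step, but a human approves risky actions
    return 2      # or 1, if there is no memory and no tools at all
```

Answer honestly; systems usually score lower than their owners expect.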

Picking the right level for your system

Insight: Most people aim too high too soon. Start at Level 3. Prove it works. Then reduce human approvals one at a time as trust builds.

Level 5 is glamorous but brittle. The failure modes are unforgiving: a fully autonomous agent that silently goes off the rails can cause real damage. Build up in stages.

Trust curves, not trust switches

Autonomy should follow observed reliability. If your agent has handled 1,000 tasks at Level 3 without a major error, promote it to Level 4 for that class of task, not for all classes at once. The mistake is flipping a system from Level 3 to Level 5 because "it's working." Working on 1,000 tasks ≠ working on 10,000.
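A trust curve can be made concrete as a per-class ledger. The 1,000-run threshold comes from the example above; the reset-on-error policy is one possible choice, labeled as such.

```python
from collections import defaultdict

class TrustLedger:
    """Per-task-class trust curve: a class earns auto mode (Level 4)
    after enough clean runs at Level 3. Thresholds are illustrative."""
    PROMOTE_AFTER = 1000

    def __init__(self):
        self.clean_runs = defaultdict(int)

    def record(self, task_class, major_error=False):
        if major_error:
            self.clean_runs[task_class] = 0  # trust is earned back from zero
        else:
            self.clean_runs[task_class] += 1

    def level(self, task_class):
        # Promotion is per class: "refactor" can be at Level 4
        # while "deploy" still has a human in the loop.
        return 4 if self.clean_runs[task_class] >= self.PROMOTE_AFTER else 3
```

The point of the ledger is that promotion is scoped and reversible, the opposite of a one-way switch.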