Not every AI system is an agent. Not every agent is autonomous. Autonomy is a spectrum from passive chatbot to fully self-running system. Knowing where you are on the spectrum tells you what you can and can't do.
Level 1: Chatbot
Generates text in response to a user's message. No memory, no tools, no action. A very fast autocomplete.
Example: early ChatGPT, before tool use.
Level 2: Chatbot with tools
Chatbot plus memory and maybe limited tools (search, file access). Still initiated by the human; still waits for prompts.
Example: modern Claude/ChatGPT with the "search the web" tool enabled.
Level 3: Supervised agent
Can chain tool calls, take multi-step actions, and use external systems, but every risky action requires human approval. Good for code writing, research, and customer support.
Example: Claude Code in default mode, which asks before running a shell command.
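The Level 3 contract can be sketched in a few lines: classify each action, and gate the risky ones on a human answer. This is a minimal illustrative sketch, not Claude Code's actual implementation; the prefix list and `approve` helper are hypothetical.

```python
# Hypothetical Level-3 approval gate: safe commands run; risky ones wait for a human.
RISKY_PREFIXES = ("rm ", "git push", "curl ")  # illustrative, not exhaustive

def is_risky(command: str) -> bool:
    """Crude classifier: anything matching a risky prefix needs approval."""
    return command.startswith(RISKY_PREFIXES)

def approve(command: str, ask=input) -> bool:
    """Return True if the command may run. `ask` is injectable for testing."""
    if not is_risky(command):
        return True
    return ask(f"Run `{command}`? [y/N] ").strip().lower() == "y"
```

Note the design choice: classification is separated from prompting, so the risky-prefix policy can be tested and tightened without touching the approval flow.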
Level 4: Scoped autonomy
Pre-authorized to do most things within a defined scope; stops and asks only for actions outside that scope. This is where most real productivity lives today.
Example: Claude Code with auto mode enabled and a deny list for destructive commands.
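A deny list inverts the Level 3 default: run everything unless it matches a destructive pattern, then escalate. A minimal sketch, assuming regex-based patterns; the patterns and the `decide` function are illustrative, not any real tool's configuration.

```python
import re

# Hypothetical deny list: patterns that always escalate to a human,
# no matter how routine the rest of the scope is.
DENY_PATTERNS = [
    r"\brm\s+-rf\b",           # recursive force-delete
    r"\bdrop\s+table\b",       # destructive SQL
    r"\bgit\s+push\s+--force\b",  # history rewrite on a shared remote
]

def decide(command: str) -> str:
    """Return 'escalate' if the command matches the deny list, else 'run'."""
    for pattern in DENY_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return "escalate"
    return "run"
```

The asymmetry is deliberate: at Level 4 the burden of proof flips, so the deny list must be curated as carefully as a firewall rule set.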
Level 5: Full autonomy
Runs headless. Scheduled or event-triggered. Self-monitors. Can escalate when it gets stuck, but doesn't require a human to run. Has durable state.
Example: a scheduled agent that scrapes a data source, writes a daily report, emails stakeholders, and flags anomalies, all without a person starting it.
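The three Level 5 properties above (durable state, self-monitoring, escalation) can be sketched in one loop body. Everything here is a hypothetical stand-in: the state file layout, the failure threshold, and the `escalate` callback are assumptions for illustration.

```python
import json
import pathlib

def load_state(path: pathlib.Path) -> dict:
    """Durable state survives between unattended runs."""
    if path.exists():
        return json.loads(path.read_text())
    return {"runs": 0, "failures": 0}

def run_once(task, escalate, state_path=pathlib.Path("agent_state.json")):
    """One headless cycle: do the task, persist state, escalate if stuck."""
    state = load_state(state_path)
    state["runs"] += 1
    try:
        task()
    except Exception as exc:
        state["failures"] += 1
        if state["failures"] == 3:  # self-monitoring: escalate once, when stuck
            escalate(f"stuck after 3 consecutive failures: {exc}")
    else:
        state["failures"] = 0  # a success resets the streak
    state_path.write_text(json.dumps(state))
```

A scheduler (cron, an event queue) calls `run_once`; no human starts it, but a human hears about it when the failure streak crosses the threshold.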
To place a system on the spectrum, ask: Who initiates each run, a human or a trigger? Which actions require approval, and which are pre-authorized? What state persists between tasks?
Level 5 is glamorous but brittle. The failure modes are unforgiving: a fully autonomous agent that silently goes off the rails can cause real damage. Build up in stages.
Autonomy should follow observed reliability. If your agent has handled 1,000 tasks at Level 3 without a major error, promote it to Level 4 for that class of task. Not all classes at once; promote specific ones. The mistake is flipping a system from Level 3 straight to Level 5 because "it's working." Working on 1,000 tasks ≠ working on 10,000.
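Per-class promotion can be tracked with a small ledger. A minimal sketch: the 1,000-run threshold follows the text, but the class names, the reset-on-error rule, and the `AutonomyLedger` structure are illustrative assumptions.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 1000  # consecutive clean runs before promotion (from the text)

class AutonomyLedger:
    """Hypothetical tracker: promote one task class at a time, on evidence."""

    def __init__(self):
        self.clean_runs = defaultdict(int)   # consecutive error-free runs per class
        self.level = defaultdict(lambda: 3)  # every class starts at Level 3

    def record(self, task_class: str, ok: bool):
        if ok:
            self.clean_runs[task_class] += 1
            if (self.clean_runs[task_class] >= PROMOTION_THRESHOLD
                    and self.level[task_class] == 3):
                self.level[task_class] = 4   # promote this class only
        else:
            self.clean_runs[task_class] = 0  # a major error resets the streak
```

The key property: `"daily_report"` can earn Level 4 while `"deploy"` stays at Level 3, which is exactly the class-by-class promotion the text argues for.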