"Going autonomous" means the agent doesn't need you in the room anymore. It starts itself on a schedule or an event. It does its work. It recovers from ordinary errors on its own. It only bothers you when something genuinely requires human judgment. This section is about the ladder from "works when I run it" to "runs while I'm asleep", and the specific things you need in place at each rung.
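As a rough sketch of the "recover on its own, escalate only when needed" behavior: the loop below retries errors it recognizes as ordinary and hands everything else to a human. The names `run_task`, `notify_human`, and `TransientError` are hypothetical placeholders for illustration, not part of any particular framework.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for ordinary, retryable errors (timeouts, rate limits)."""


def run_unattended(
    run_task: Callable[[], str],
    notify_human: Callable[[str], None],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Optional[str]:
    """Run one task; retry ordinary errors, escalate everything else."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_task()
        except TransientError as exc:
            # Ordinary error: retry until the attempt budget is spent.
            if attempt == max_attempts:
                notify_human(f"gave up after {attempt} attempts: {exc}")
                return None
            time.sleep(backoff_seconds * attempt)
        except Exception as exc:
            # Anything unrecognized is treated as needing human judgment.
            notify_human(f"needs human judgment: {exc!r}")
            return None
    return None
```

The design choice here is that the agent never retries an error it does not recognize; unknown failures go straight to a person.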
You should already have a working Level 3 agent (see the autonomy spectrum): an agent that does real work when a human runs it, with human-in-the-loop approvals for risky steps. If you don't have that yet, the rest of this section is premature. Get Level 3 working on a task you care about, then come back.
Going autonomous isn't one change. It's a stack of six layers working together. Miss any one of them and the whole thing falls over at 3am.
Each layer assumes the one below it works. If the model is unreliable, nothing above matters. If the harness is flaky, scheduling it won't help. Build bottom-up; skip nothing.
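One way to make "build bottom-up" concrete is a readiness check that walks the layers from the bottom and stops at the first one that fails. This is an illustrative sketch only: the layer names and check functions are hypothetical placeholders, not the actual six layers, and only the ordering idea comes from the text.

```python
from typing import Callable, Optional, Sequence, Tuple

# A layer is a name plus a zero-argument health check (both hypothetical).
Layer = Tuple[str, Callable[[], bool]]


def first_failing_layer(layers: Sequence[Layer]) -> Optional[str]:
    """Return the lowest layer whose check fails, or None if all pass.

    Layers are listed bottom-first; checks above a failing layer are not
    run, since their results would not be meaningful.
    """
    for name, check in layers:
        if not check():
            return name
    return None


layers = [
    ("model", lambda: True),
    ("harness", lambda: False),
    ("scheduling", lambda: True),
]
print(first_failing_layer(layers))  # prints "harness"
```

Fix whatever the function reports before looking at anything above it.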
These four pages each cover one layer (or one cluster). Read them in order; they build on each other.
Most tasks shouldn't be autonomous. Autonomy has real costs (infrastructure, monitoring, risk), and for many workflows a Level 3 agent with a human in the loop is a better solution. Here's the test.
Zoomed out, the journey from "first session" to "fully autonomous" looks like this:
Skip any step and you end up with the classic autonomy failure: the agent runs confidently for a day, then does something unexpected, and nobody was watching. The steps aren't optional. They're how you earn trust with an agent, the same way you earn trust with a new employee.