The agent architecture map

In a diagram an agent is three boxes: LLM, tools, loop. In production it's a dozen components you have to build, operate, and debug. This page is the map. Once you can see the whole stack, it becomes obvious why your first agent broke and what you need to add to make the next one survive.

The full stack

Every layer is a separate design decision. Skip one and you get a specific failure mode: skip memory and you lose context between turns, skip guardrails and you ship an injection hole, skip orchestration limits and you get runaway bills.

The flow from click to answer

  1. User input arrives. API call, chat message, cron fire.
  2. Auth + session resolved. Who is this, what can they see, is there a prior conversation?
  3. Prompt is assembled. System prompt + user input + pulled memory + tool definitions. This is the most underrated step.
  4. First LLM call. Model sees everything, returns either an answer or a tool call.
  5. If tool call: validate arguments, execute, capture result, append to context.
  6. Loop: back to the model with updated context. Repeat until done or cap hit.
  7. Final answer. Format, stream, return to user.
  8. Persist. Save session state, update long-term memory if needed.
  9. Log the trace. Every LLM call, every tool call, every input and output, for debugging and eval.
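The flow above can be sketched as a single loop. This is a minimal illustration, not a real client: `call_model` is a stub standing in for the LLM API, and the tool registry holds one made-up tool. The names and shapes are assumptions for the sketch.

```python
# Hypothetical tool registry — the tool name and its behavior are illustrative.
TOOLS = {
    "get_time": lambda args: "2024-01-01T00:00:00Z",
}

MAX_STEPS = 10  # hard cap so the loop can't run away

def call_model(context):
    """Stub standing in for the real LLM call. Returns a tool call on the
    first turn and a final answer once a tool result is in the context."""
    if any(m["role"] == "tool" for m in context):
        return {"type": "answer", "text": "It is 2024-01-01T00:00:00Z."}
    return {"type": "tool_call", "name": "get_time", "arguments": {}}

def run_agent(user_input):
    # Step 3: assemble the prompt — system prompt + user input (+ memory, tools).
    context = [
        {"role": "system", "content": "Answer briefly. Stop when done."},
        {"role": "user", "content": user_input},
    ]
    trace = []  # Step 9: record every model and tool interaction.
    for _ in range(MAX_STEPS):  # Step 6: loop with a hard step cap.
        reply = call_model(context)  # Steps 4/6: model sees the full context.
        trace.append(reply)
        if reply["type"] == "answer":
            return reply["text"], trace  # Step 7: final answer.
        # Step 5: validate the tool call, execute, append the result to context.
        tool = TOOLS.get(reply["name"])
        if tool is None:
            context.append({"role": "tool", "content": f"unknown tool: {reply['name']}"})
            continue
        result = tool(reply["arguments"])
        context.append({"role": "tool", "content": result})
    return "Step cap reached without an answer.", trace
```

In a real system, auth/session resolution (step 2) happens before `run_agent` is called, and the returned `trace` is what you persist for debugging and evals (steps 8-9).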

Where production agents actually break

Minimum viable production agent

The short list of things that have to exist before you ship. Miss any of these and the agent will fail silently in ways that are expensive to fix after the fact:

  1. A system prompt that states the goal, constraints, tone, and when to stop.
  2. Three to eight tools, each with a precise description and a typed schema.
  3. A loop with hard caps on steps (~10-20), time (~60 seconds), and cost (~$0.50).
  4. Structured output so the caller doesn't have to parse free text.
  5. Tracing that captures every LLM request, every tool call, every latency, every token count.
  6. Tool-level error handling so one flaky API doesn't kill the whole run.
  7. An eval set of at least 30 real cases you can re-run whenever you change anything.
  8. A kill switch (a flag or env var) that lets you disable the agent without redeploying.
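Items 3 and 8 can share one guard function checked before every model call. A minimal sketch, assuming an env-var kill switch named `AGENT_DISABLED` (the name and the `RunAborted` exception are illustrative, not a standard):

```python
import os
import time

# Caps from the checklist above; the exact numbers are tunable.
MAX_STEPS = 15
MAX_SECONDS = 60
MAX_COST_USD = 0.50

class RunAborted(Exception):
    """Raised when a run trips a cap or the kill switch."""

def check_limits(step, started_at, cost_usd):
    """Enforce the kill switch plus the three hard caps before each LLM call."""
    if os.environ.get("AGENT_DISABLED") == "1":  # flip without redeploying
        raise RunAborted("agent disabled by operator")
    if step >= MAX_STEPS:
        raise RunAborted(f"step cap hit ({MAX_STEPS})")
    if time.monotonic() - started_at > MAX_SECONDS:
        raise RunAborted(f"time cap hit ({MAX_SECONDS}s)")
    if cost_usd > MAX_COST_USD:
        raise RunAborted(f"cost cap hit (${MAX_COST_USD:.2f})")
```

Calling `check_limits(step, started_at, cost_so_far)` at the top of each loop iteration turns "runaway bill" into a clean, logged abort.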

A worked example: customer support agent

Say you're building an agent to handle Tier 1 support tickets. Walk the stack:
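As one concrete piece of that stack, here is what a small, typed tool set (checklist item 2) might look like for this agent. The tool names, fields, and schemas are illustrative assumptions, not a prescribed design:

```python
# Hypothetical tools for a Tier 1 support agent, as JSON-Schema-style definitions.
SUPPORT_TOOLS = [
    {
        "name": "lookup_ticket",
        "description": "Fetch a ticket by ID, including status and history.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
    {
        "name": "search_kb",
        "description": "Search the knowledge base for relevant help articles.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "escalate",
        "description": "Hand the ticket to a human with a summary. Use when confidence is low.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
                "summary": {"type": "string"},
            },
            "required": ["ticket_id", "summary"],
        },
    },
]
```

Note the `escalate` tool: giving the model an explicit, cheap way to hand off is what keeps a Tier 1 agent from guessing on cases it can't handle.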

That's a production agent. Not complicated, but every piece is deliberate.

The layers you'll underbuild the first time

What to do with this

Further reading

Watch

Andrej Karpathy - Intro to Large Language Models (1 hour)