The agent architecture map
📖 5 min read · Updated 2026-04-19
In a diagram an agent is three boxes: LLM, tools, loop. In production it's a dozen components you have to build, operate, and debug. This page is the map. Once you can see the whole stack, it becomes obvious why your first agent broke and what you need to add to make the next one survive.
The full stack
Every layer is a separate design decision. Skip one and you get a specific failure mode: skip memory and you lose context between turns, skip guardrails and you ship an injection hole, skip orchestration limits and you get runaway bills.
The flow from click to answer
- User input arrives. API call, chat message, cron fire.
- Auth + session resolved. Who is this, what can they see, is there a prior conversation?
- Prompt is assembled. System prompt + user input + pulled memory + tool definitions. This is the most underrated step.
- First LLM call. Model sees everything, returns either an answer or a tool call.
- If tool call: validate arguments, execute, capture result, append to context.
- Loop: back to the model with updated context. Repeat until done or cap hit.
- Final answer. Format, stream, return to user.
- Persist. Save session state, update long-term memory if needed.
- Log the trace. Every LLM call, every tool call, every input and output, for debugging and eval.
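The flow above fits in a short loop. A minimal sketch, assuming an OpenAI-style chat API; the model is stubbed out (`fake_llm`) and the single tool is hypothetical so the example is self-contained:

```python
import json

MAX_STEPS = 10  # step 6: hard cap on the loop

def fake_llm(messages):
    """Stub standing in for a real chat-completion call.
    Returns a tool call first, then a final answer once a tool result is in context."""
    if any(m["role"] == "tool" for m in messages):
        return {"content": "Your order shipped yesterday.", "tool_call": None}
    return {"content": None, "tool_call": {"name": "get_order", "args": {"order_id": 42}}}

# Hypothetical tool registry: name -> callable.
TOOLS = {"get_order": lambda order_id: {"order_id": order_id, "status": "shipped"}}

def run_agent(system_prompt, user_input, memory=None):
    # Step 3: assemble the prompt — system + pulled memory + user turn.
    messages = [{"role": "system", "content": system_prompt}]
    messages += memory or []
    messages.append({"role": "user", "content": user_input})
    trace = []
    for _ in range(MAX_STEPS):
        reply = fake_llm(messages)            # step 4: model call
        trace.append(reply)                   # step 9: log every step
        call = reply["tool_call"]
        if call is None:
            return reply["content"], trace    # step 7: final answer
        result = TOOLS[call["name"]](**call["args"])  # step 5: execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Step cap hit; escalating.", trace

answer, trace = run_agent("You are a support agent.", "Where is my order?")
```

Swap `fake_llm` for a real API call and `TOOLS` for real functions and the shape stays the same; everything else on this page is about hardening this loop.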
Where production agents actually break
Minimum viable production agent
The short list of things that have to exist before you ship. Skip any of these and the agent will fail silently in ways that are expensive to fix after the fact:
- A system prompt that states the goal, constraints, tone, and when to stop.
- Three to eight tools, each with a precise description and a typed schema.
- A loop with hard caps on steps (~10-20), time (~60 seconds), and cost (~$0.50).
- Structured output so the caller doesn't have to parse free text.
- Tracing that captures every LLM request, every tool call, every latency, every token count.
- Tool-level error handling so one flaky API doesn't kill the whole run.
- An eval set of at least 30 real cases you can re-run whenever you change anything.
- A kill switch (a flag or env var) that lets you disable the agent without redeploying.
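The caps and the kill switch from the checklist fit in a few lines. A sketch using the numbers above; the `AGENT_ENABLED` variable name and the class shape are assumptions, not a prescribed API:

```python
import os
import time

class RunLimits:
    """Hard caps from the checklist: steps, wall-clock time, and spend."""
    def __init__(self, max_steps=15, max_seconds=60, max_cost_usd=0.50):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.max_cost = max_cost_usd
        self.steps, self.cost = 0, 0.0
        self.start = time.monotonic()

    def charge(self, step_cost_usd):
        """Call once per LLM/tool step with that step's estimated cost."""
        self.steps += 1
        self.cost += step_cost_usd

    def exceeded(self):
        """Return the reason to stop, or None to keep looping."""
        if os.environ.get("AGENT_ENABLED", "1") == "0":
            return "kill switch"   # disable without redeploying
        if self.steps >= self.max_steps:
            return "step cap"
        if time.monotonic() - self.start > self.max_seconds:
            return "time cap"
        if self.cost >= self.max_cost:
            return "cost cap"
        return None

limits = RunLimits()
limits.charge(0.03)
print(limits.exceeded())  # None: one cheap step is under every cap
```

Check `limits.exceeded()` at the top of every loop iteration, and return the reason string in the final answer so the trace shows why a run stopped.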
A worked example: customer support agent
Say you're building an agent to handle Tier 1 support tickets. Walk the stack:
- UI: webhook from your ticketing system. Trigger on new ticket.
- Auth: verify the webhook signature. Load the customer's account.
- Prompt: system prompt ("you are a support agent, be polite, only use these tools") + the ticket + last 3 messages from this customer + tool specs.
- Tools: get_account(), get_recent_orders(), refund(), escalate_to_human(), send_reply().
- Memory: short-term = this ticket's context. Long-term = nothing yet; one ticket is one session.
- Loop: max 8 steps, $0.20 cap.
- Safety: refund() over $100 requires human approval. No PII in logs.
- Observability: every trace saved to your eval store tagged by outcome.
That's a production agent. Not complicated, but every piece is deliberate.
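The refund() guardrail is worth enforcing in code, not just in the prompt, so the model can't talk its way past it. A sketch using the $100 threshold from the example; the queue and return shape are illustrative assumptions:

```python
REFUND_APPROVAL_THRESHOLD = 100.00  # from the safety rule above
approval_queue = []  # stand-in for whatever human-review system you use

def refund(customer_id: str, amount: float) -> dict:
    """Tool the model can call. Large refunds are queued for a human, never executed directly."""
    if amount > REFUND_APPROVAL_THRESHOLD:
        approval_queue.append({"customer_id": customer_id, "amount": amount})
        return {"status": "pending_approval",
                "message": f"Refunds over ${REFUND_APPROVAL_THRESHOLD:.0f} need human sign-off."}
    return {"status": "refunded", "customer_id": customer_id, "amount": amount}

print(refund("c_123", 45.00)["status"])   # refunded
print(refund("c_123", 250.00)["status"])  # pending_approval
```

The model still gets a useful tool result either way; it can tell the customer the refund is pending rather than failing the run.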
The layers you'll underbuild the first time
- Tool descriptions. First attempt is always too terse. The model treats the description as the source of truth about when to call a tool; skimp here and it calls the wrong one.
- Stop conditions. "Max 10 steps" feels fine until you hit a bug that makes the agent call the same tool 10 times. Add per-tool caps too.
- Tracing. "I'll add logging later" is how you end up with a production agent that's failing 12% of the time with no way to see why.
- Eval. Without a repeatable test set, every change is a coin flip.
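A per-tool cap is a counter, not a framework feature. A sketch of the stop-conditions point; the cap numbers and default are illustrative assumptions:

```python
from collections import Counter

# Illustrative per-tool limits per run; refund is deliberately tight.
PER_TOOL_CAPS = {"search": 5, "refund": 1}
calls = Counter()

def allow_call(tool_name: str) -> bool:
    """Return False once a tool hits its cap, even if the run-level step cap hasn't."""
    calls[tool_name] += 1
    return calls[tool_name] <= PER_TOOL_CAPS.get(tool_name, 3)  # default cap of 3

assert allow_call("refund") is True
assert allow_call("refund") is False  # second refund attempt in one run is blocked
```

Reset the counter at the start of each run, and log blocked calls to the trace; a tool repeatedly hitting its cap is exactly the looping bug this check exists to catch.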
What to do with this
- Draw your own agent's stack layer by layer. Circle the layers that don't exist yet. Build those before you ship.
- Read ReAct for the orchestration layer in detail.
- Read observability + tracing before you put traffic on it.
Further reading
Watch
Andrej Karpathy - Intro to Large Language Models (1 hour)