Agent costs can sneak up on you in a way single-call costs don't. A stuck session can cost $10 instead of $0.10. One bug multiplied across a thousand users can burn $10K in an afternoon. Cost control isn't a nice-to-have; it's what separates agents-in-production from agents-that-got-their-team-yelled-at. This page is the playbook.
LLM tokens dominate almost every agent's bill. Input tokens grow with context length; output tokens grow with how much reasoning + response the model produces. Optimize those first.
Anthropic's Claude, Google's Gemini, and OpenAI's models all support caching the stable parts of a prompt. Your system prompt is the same every turn; cache it. On Anthropic, a cache read costs 10% of the normal input price (cache writes carry a small surcharge), and cache hits are the norm in multi-turn sessions. For an agent with a 2K-token system prompt and 15-turn sessions, this alone can cut your bill in half.
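The arithmetic behind that claim can be sketched directly. This is a back-of-envelope model, not a price sheet: the $3-per-million-input-tokens rate is illustrative, the cache-write surcharge is ignored for simplicity, and only the 10% cache-read discount from the text is assumed.

```python
def session_prompt_cost(system_tokens, turns, price_per_mtok,
                        cached=False, cache_read_discount=0.10):
    """Cost of resending the system prompt on every turn of a session."""
    per_token = price_per_mtok / 1_000_000
    if not cached:
        return system_tokens * turns * per_token
    # First turn writes the cache at full price (write surcharge ignored
    # here); every later turn reads it at the discounted rate.
    first = system_tokens * per_token
    rest = system_tokens * (turns - 1) * per_token * cache_read_discount
    return first + rest

# The scenario from the text: 2K-token system prompt, 15-turn session.
uncached = session_prompt_cost(2_000, 15, price_per_mtok=3.0)
cached = session_prompt_cost(2_000, 15, price_per_mtok=3.0, cached=True)
```

Here `uncached` comes to $0.09 per session versus about $0.014 cached, an 84% cut on the system-prompt portion of the bill.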
Not every task needs the flagship model. A lightweight classifier (or even a rule) decides: this looks simple → cheap model; this looks complex → premium. On query-heavy agents where most requests are easy, routing can cut costs 70-80%. Build in escalation too: the cheap model tries first and escalates if it gets stuck.
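A minimal routing-plus-escalation sketch, assuming placeholder model names and a deliberately dumb `is_simple()` rule (real systems would use a trained classifier or a cheap LLM call here):

```python
CHEAP_MODEL = "small-model"      # placeholder name, not a real model ID
PREMIUM_MODEL = "flagship-model"  # placeholder name

def is_simple(query: str) -> bool:
    # Stand-in heuristic: short queries without code blocks count as simple.
    return len(query.split()) < 30 and "```" not in query

def route(query: str) -> str:
    return CHEAP_MODEL if is_simple(query) else PREMIUM_MODEL

def answer_with_escalation(query, call_model, looks_stuck):
    """Try the cheap model first; escalate to premium if the reply is weak.

    call_model(model, query) -> str and looks_stuck(reply) -> bool are
    injected so the routing logic stays independent of any provider SDK.
    """
    reply = call_model(route(query), query)
    if looks_stuck(reply):
        reply = call_model(PREMIUM_MODEL, query)
    return reply
```

Injecting `call_model` and `looks_stuck` keeps the router testable without network calls; `looks_stuck` might check for refusals, low confidence, or tool-call failures.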
A 20-turn agent session that sends 80K tokens every turn is mostly paying to resend old context. Summarize old turns into a few hundred tokens once the context passes a threshold; this can drop per-turn cost 40-70% on long sessions.
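One way to sketch threshold-triggered compaction. Everything here is an assumption for illustration: the ~4-characters-per-token estimate, the 8K threshold, and the `summarize` stub, which in a real agent would be an LLM call.

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic: ~4 chars per token

def compact_history(messages, max_tokens=8_000, keep_recent=4,
                    summarize=lambda text: "Summary: " + text[:200]):
    """Replace old turns with one summary message once context gets big.

    messages: list of {"role": str, "content": str}. The most recent
    keep_recent turns are kept verbatim; everything older is collapsed.
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total <= max_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize("\n".join(m["content"] for m in old))
    return [{"role": "system", "content": summary}] + recent
```

Keeping the last few turns verbatim matters: the model usually needs exact recent context (tool results, the user's latest ask) even when older turns can be lossy.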
A tool that returns 50K tokens of text blows up the next LLM call. Cap every tool's output at a reasonable size (e.g., 4K tokens). If the agent needs more, it can make a follow-up call with a narrower scope.
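The cap itself is a one-function wrapper around every tool. A minimal sketch, again assuming ~4 characters per token; the truncation notice text is illustrative:

```python
def cap_tool_output(text: str, max_tokens: int = 4_000) -> str:
    """Truncate oversized tool output and tell the model how to get more."""
    max_chars = max_tokens * 4  # rough chars-per-token heuristic
    if len(text) <= max_chars:
        return text
    return (text[:max_chars]
            + "\n[output truncated; call the tool again with a narrower scope]")
```

The appended notice is the important part: it turns a silent truncation into a signal the agent can act on.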
Set a hard per-session cap at $0.50, or whatever fits your product, and terminate gracefully when it's hit. This doesn't reduce average cost, but it prevents the $10 runaway session from ever happening. It's the one control you absolutely cannot ship without.
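A budget guard can be a small object the agent loop charges after every LLM and tool call. A minimal sketch; the $0.50 default comes from the text, and raising an exception is one way to let the loop catch it and wind down gracefully:

```python
class BudgetExceeded(Exception):
    pass

class SessionBudget:
    def __init__(self, limit_usd: float = 0.50):
        self.limit = limit_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        """Record a cost; raise so the agent loop can terminate gracefully."""
        self.spent += usd
        if self.spent > self.limit:
            raise BudgetExceeded(
                f"session spent ${self.spent:.2f} (limit ${self.limit:.2f})")
```

The loop catches `BudgetExceeded` at the top level, returns a partial answer or a handoff message, and logs the session for review.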
Same search query within the session → cached result. Same LLM prompt with deterministic settings → cached response. Caching is especially valuable when your agent naturally repeats queries (research, debugging, customer lookup).
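A session-scoped cache can be as simple as a dict keyed on the call's inputs. A sketch under one assumption worth stating in code: only deterministic calls (temperature 0, fixed tool arguments) are safe to reuse.

```python
class SessionCache:
    """Per-session memo for repeated tool calls and deterministic LLM calls."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, key, compute):
        # key must capture everything that affects the result (query text,
        # model, temperature); only cache calls with deterministic settings.
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = compute()
        return self._store[key]
```

Scoping the cache to the session sidesteps staleness: it lives exactly as long as the conversation that produced it.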
Layer these defenses so a single failure at one layer doesn't cascade into a runaway bill.
A task that takes 3 sessions to complete costs 3× what a single session shows. Track cost-per-resolved-task (or cost-per-successful-outcome). That's the number that matters for unit economics, not cost-per-request.
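Computing the metric is straightforward once sessions are tagged with the task they resolved. A sketch with illustrative field names (not from any real analytics schema):

```python
def cost_per_resolved_task(sessions):
    """Unit-economics metric: total spend divided by distinct resolved tasks.

    sessions: list of {"cost": float, "resolved_task_id": str | None},
    where resolved_task_id is None for sessions that didn't finish a task.
    """
    total_cost = sum(s["cost"] for s in sessions)
    resolved = {s["resolved_task_id"] for s in sessions
                if s["resolved_task_id"] is not None}
    if not resolved:
        return float("inf")
    return total_cost / len(resolved)
```

Note how the three-session example from the text falls out naturally: two unresolved sessions and one resolved one all bill against the single task they served.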
A team's support agent averages $0.28 per ticket; at 50K tickets, monthly spend is $14K. Actions:
Result: $10K/month back for a week of engineering, with no quality drop (verified against an eval set).
Set alerts on: