Agents can rack up bills fast. A single runaway session can cost $10. A thousand users hitting the same problem can cost $10,000 in a day. Cost control is a first-class production concern, not an afterthought.
Cheap model for simple tasks, expensive model for complex ones. Classify upfront or escalate mid-session.
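A minimal sketch of upfront routing. The model names, keyword list, and length threshold are all illustrative assumptions, not a real provider's API:

```python
# Hypothetical router: model names and thresholds are placeholders to tune.
CHEAP_MODEL = "small-model"
EXPENSIVE_MODEL = "large-model"

def classify_complexity(task: str) -> str:
    """Crude upfront classifier: multi-step or long requests count as complex."""
    multi_step = any(kw in task.lower() for kw in ("plan", "refactor", "debug", "analyze"))
    return "complex" if multi_step or len(task) > 500 else "simple"

def route(task: str) -> str:
    """Pick the cheapest model that can plausibly handle the task."""
    return EXPENSIVE_MODEL if classify_complexity(task) == "complex" else CHEAP_MODEL
```

In production the classifier is often itself a cheap LLM call; mid-session escalation means retrying the same step on the expensive model when the cheap one fails.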
Claude and Gemini support caching the system prompt so it's not reprocessed each turn. Massive savings in multi-turn sessions.
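A sketch of what marking a cacheable system prompt looks like with Anthropic-style `cache_control` blocks; the model name is a placeholder, and the exact field names should be checked against current provider docs:

```python
def build_request(system_prompt: str, history: list) -> dict:
    """Build a request body whose system prompt is flagged as a cacheable prefix."""
    return {
        "model": "claude-sonnet",  # placeholder model name
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks this prefix for caching so later turns reuse it
                # instead of reprocessing it (verify against provider docs).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": history,
        "max_tokens": 1024,
    }
```

The savings compound: every turn after the first reads the system prompt from cache at a steep discount instead of paying full input-token price.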
Don't send 50K tokens of history when the model needs 5K. Summarize aggressively.
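One common shape for this: keep recent turns verbatim and collapse everything older into a summary. In this sketch the summary is a stub; in production an LLM call (ideally to the cheap model) would produce it:

```python
def compact_history(messages: list, keep_recent: int = 6) -> list:
    """Keep the last `keep_recent` messages verbatim; collapse the rest.

    The placeholder summary stands in for a real LLM-generated digest.
    """
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "user",
        "content": f"[summary of {len(older)} earlier messages]",
    }
    return [summary] + recent
```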
Cap each tool's response size. Huge tool outputs balloon the next LLM call.
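The cap itself is a one-liner; the limit here is an assumed default to tune per tool. Telling the model how much was cut lets it ask for a narrower query instead of silently working from partial data:

```python
MAX_TOOL_CHARS = 4000  # assumption: tune per tool

def cap_tool_output(output: str, limit: int = MAX_TOOL_CHARS) -> str:
    """Truncate oversized tool output and note how much was dropped."""
    if len(output) <= limit:
        return output
    return output[:limit] + f"\n[truncated {len(output) - limit} chars]"
```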
Hard cap: "This session can't exceed $0.50." When reached, gracefully terminate.
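A minimal budget guard, assuming per-call cost estimates are available from provider token pricing. Charging before the call, not after, is what makes the termination graceful: the agent stops cleanly instead of discovering overspend mid-response:

```python
class BudgetExceeded(Exception):
    """Raised when a session would exceed its hard spend cap."""

class SessionBudget:
    """Hard per-session cap; call charge() before each paid model call."""

    def __init__(self, limit_usd: float = 0.50):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"would spend ${self.spent_usd + cost_usd:.2f}, "
                f"limit ${self.limit_usd:.2f}"
            )
        self.spent_usd += cost_usd
```

The caller catches `BudgetExceeded`, summarizes progress so far, and returns that to the user rather than dying mid-task.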
Same search query? Serve the cached result. Same model, same prompt? Serve the cached completion.
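A generic cache keyed on a hash of the inputs covers both cases; the key parts (query string, or model name plus prompt) are whatever uniquely determines the output. Everything here is stdlib:

```python
import hashlib
import json

_cache: dict[str, object] = {}

def cached(key_parts: list, compute) -> object:
    """Return a cached result for key_parts, computing it only on a miss.

    key_parts might be ["search", query] or ["infer", model, prompt].
    """
    key = hashlib.sha256(
        json.dumps(key_parts, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]
```

In production the dict becomes Redis or similar with a TTL, since search results and model outputs both go stale.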
Layer these controls, so no single failure breaks the bank.
Track cost per resolved task, not cost per request. A task that takes 3 sessions to resolve costs 3x what any single request suggests.
Alert when cost per task rises 20%+. Usually signals a regression (bad prompt, model downgrade, agent stuck in loops).
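The metric and the alert fit in a few lines. The data shape here (a map from task ID to the per-session costs that task consumed) is an assumption about how spend is logged:

```python
def cost_per_task(sessions_by_task: dict[str, list[float]]) -> dict[str, float]:
    """Total spend across ALL sessions for each task, not per request."""
    return {task: sum(costs) for task, costs in sessions_by_task.items()}

def regression_alert(
    baseline_avg: float, current_avg: float, threshold: float = 0.20
) -> bool:
    """Fire when average cost per resolved task rises 20%+ over baseline."""
    return current_avg > baseline_avg * (1 + threshold)
```

Comparing a rolling window against a trailing baseline catches the usual culprits listed above (a bad prompt deploy, a silent model downgrade, agents stuck in retry loops) before the bill does.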