Cost control

Agents can rack up bills fast. A single runaway session can cost $10. A thousand users hitting the same problem can cost $10,000 in a day. Cost control is a first-class production concern, not an afterthought.

Where the cost goes

Cost reduction strategies

Model routing

Cheap model for simple tasks, expensive model for complex ones. Classify upfront or escalate mid-session.

Prompt caching

Claude and Gemini support caching the system prompt so it's not reprocessed each turn. Massive savings in multi-turn sessions.

Context trimming

Don't send 50K of history when the model needs 5K. Summarize aggressively.

Tool output limits

Cap each tool's response size. Huge tool outputs balloon the next LLM call.

Budget per session

Hard cap: "This session can't exceed $0.50." When reached, gracefully terminate.

Caching expensive tool results

Same search query = cached result. Same model inference on same prompt = cached.

The budget stack

Layered, so no single failure breaks the bank.

Measuring cost per outcome

Cost per resolved task, not cost per request. A task that takes 3 sessions to complete costs more than it looks.

Cost alerts

Alert when cost per task rises 20%+. Usually signals a regression (bad prompt, model downgrade, agent stuck in loops).