Without budgets, a single stuck agent can make 200 tool calls in three minutes, exhaust your API quota, run up $50 in LLM spend, and produce nothing useful. Budgets are the circuit breakers that stop this. They're not optional for production: they're the line between "tried an agent once" and "can actually run agents at scale."
Each layer catches a different failure mode. Per-session catches a stuck agent. Per-tool catches overuse of an expensive operation. Per-user catches one runaway user. Global is the last line of defense when multiple of the above go wrong at once.
The minimum agent should enforce all four caps (per-session, per-tool, per-user, and global) from day one.
Whichever one trips first wins. The session ends gracefully (see below) instead of continuing into expensive territory.
Some tools are disproportionately expensive. A call to a heavy ML model might cost $0.20; a web search costs a cent. You don't want an agent that happily calls the expensive one 20 times in a session.
```python
tool_limits = {
    "expensive_ml_model": 3,  # max 3 per session
    "search_web": 10,
    "read_file": 50,
}
```
When the limit is hit, the tool returns an error the model can reason about: "You've called expensive_ml_model 3 times this session, the limit. Use a different approach."
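A sketch of the enforcement wrapper, assuming a `call_tool` dispatcher that every tool call goes through (the function name and signature are illustrative):

```python
from collections import Counter

tool_limits = {"expensive_ml_model": 3, "search_web": 10, "read_file": 50}
call_counts: Counter = Counter()

def call_tool(name, run, *args, **kwargs):
    """Run a tool unless its per-session limit is hit; on a hit, return an
    error string the model can reason about instead of raising."""
    limit = tool_limits.get(name)
    if limit is not None and call_counts[name] >= limit:
        return (f"Error: you've called {name} {limit} times this session, "
                f"the limit. Use a different approach.")
    call_counts[name] += 1
    return run(*args, **kwargs)
```

Returning the limit as a tool result rather than raising keeps the agent loop alive, so the model can route around the cap on its next turn.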
The most common runaway pattern: the model calls the same tool with the same arguments repeatedly, hoping for a different result. Detect it by keeping a per-session count for each (tool_name, args) pair and intervening once the same pair repeats past a small threshold. This catches most runaways before they rack up meaningful cost.
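A minimal detector sketch; the threshold of 3 and the JSON-based argument fingerprint are assumptions, not prescribed values:

```python
import json
from collections import Counter

REPEAT_LIMIT = 3  # identical calls tolerated before intervening (illustrative)
_seen: Counter = Counter()

def is_runaway(tool_name: str, args: dict) -> bool:
    """Return True once the same (tool_name, args) pair has been
    attempted more than REPEAT_LIMIT times this session."""
    key = (tool_name, json.dumps(args, sort_keys=True))
    _seen[key] += 1
    return _seen[key] > REPEAT_LIMIT
```

Serializing the arguments with `sort_keys=True` makes the fingerprint stable across dicts that differ only in key order.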
When a budget limit is hit, don't just kill the session. Give the model one final turn with a message like: "Your budget is exhausted. Summarize what you've found and return the best answer you can with the current information." This produces a useful "here's what I learned" response instead of a raw cutoff. User-facing agents should always do this.
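The graceful wind-down fits naturally into the agent loop. In this sketch, `agent_step` and `budget_tripped` are hypothetical callables standing in for your model call and budget check:

```python
WIND_DOWN = ("Your budget is exhausted. Summarize what you've found and return "
             "the best answer you can with the current information.")

def run_session(agent_step, budget_tripped, max_turns=20):
    """Drive the agent loop; on any budget trip, grant one final
    tool-free turn with the wind-down message instead of hard-killing."""
    for _ in range(max_turns):
        if budget_tripped():
            return agent_step(WIND_DOWN)  # final turn: summarize, no tools
        reply = agent_step(None)          # normal turn, tools allowed
        if reply is not None:
            return reply                  # agent finished on its own
    return agent_step(WIND_DOWN)
```

Note the turn-limit fallback at the bottom: even if no explicit cap trips, the loop itself ends with a wind-down rather than a silent truncation.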
Surface the remaining budget in the agent's context every turn:
```
[Budget: 6 tool calls remaining, $0.18 left, 45 seconds to go]
```
A well-prompted model will self-ration: pick cheaper tools, parallelize, wrap up early. Hidden budgets produce surprise cutoffs; surfaced budgets produce smart agents.
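Rendering that status line is a one-liner; a sketch, assuming you track the three remaining quantities:

```python
def budget_line(calls_left: int, usd_left: float, seconds_left: int) -> str:
    """Render the remaining budget for injection into the agent's context each turn."""
    return (f"[Budget: {calls_left} tool calls remaining, "
            f"${usd_left:.2f} left, {seconds_left} seconds to go]")
```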
Example caps for a research agent: web_search ≤ 10, fetch_page ≤ 8, llm_summarize ≤ 5. With these caps, a single research session costs at most $0.30 and takes at most 15 turns. If the task needs more, the agent returns what it has and the user can ask for more. No runaway, no surprise bill, and 99% of real tasks complete well inside these caps.
Log every session's final budget consumption. Metrics worth tracking: tool calls made, dollars spent, wall-clock time, and which cap (if any) ended the session.
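One structured record per session is enough to answer most budget questions later. A minimal sketch; the field names and JSON-lines-to-stdout transport are assumptions, not a prescribed schema:

```python
import json
import time

def log_session(session_id, tool_calls, spend_usd, duration_s, cap_tripped):
    """Emit one structured record per completed session."""
    record = {
        "session_id": session_id,
        "tool_calls": tool_calls,
        "spend_usd": round(spend_usd, 4),
        "duration_s": duration_s,
        "cap_tripped": cap_tripped,  # None when the session finished under budget
        "ts": int(time.time()),
    }
    print(json.dumps(record))
    return record
```

With these records in place, "what fraction of sessions hit a cap" and "what does p95 spend look like" become simple aggregations.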