Tool budgets + limits

Without budgets, a single stuck agent can make 200 tool calls in three minutes, exhaust your API quota, run up $50 in LLM spend, and produce nothing useful. Budgets are the circuit breakers that stop this. They're not optional for production: they're the line between "tried an agent once" and "can actually run agents at scale."

The four layers of budget

Each layer catches a different failure mode: per-session catches a stuck agent, per-tool catches overuse of an expensive operation, per-user catches one runaway user, and global limits are the last line of defense when several of the others fail at once.

Per-session, the first thing you ship

The minimum agent should have all four of these caps from day one: a tool-call cap, a turn cap, a dollar-spend cap, and a wall-clock cap.

Whichever one trips first wins. The session ends gracefully (see below) instead of continuing into expensive territory.

Per-tool limits

Some tools are disproportionately expensive. A call to a heavy ML model might cost $0.20; a web search, a cent. You don't want an agent that happily calls the expensive one 20 times in a session.

tool_limits = {
    "expensive_ml_model": 3,   # max 3 per session
    "search_web": 10,
    "read_file": 50,
}

When the limit is hit, the tool returns an error the model can reason about: "You've called expensive_ml_model 3 times this session, the limit. Use a different approach."

Loop detection

The most common runaway pattern: the model calls the same tool with the same arguments repeatedly, hoping for a different result. Detect it:
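A minimal detection sketch. The window size and the threshold of three identical consecutive calls are assumptions; tune them for your workload:

```python
import json
from collections import deque

class LoopDetector:
    """Flag when the same (tool, args) call repeats N times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)

    def is_looping(self, tool: str, args: dict) -> bool:
        # Canonicalize args so {"a": 1, "b": 2} and {"b": 2, "a": 1} match.
        key = (tool, json.dumps(args, sort_keys=True))
        self.recent.append(key)
        return (len(self.recent) == self.threshold
                and len(set(self.recent)) == 1)
```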

This catches most runaways before they rack up meaningful cost.

Graceful termination

When a budget limit is hit, don't just kill the session. Give the model one final turn with the message: "Your budget is reached. Summarize what you've found and return the best answer you can with the current information." This produces a useful "here's what I learned" response instead of a raw timeout. User-facing agents should always do this.
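A sketch of the final-turn handoff; `call_model` is a hypothetical stand-in for your LLM client:

```python
def terminate_gracefully(call_model, messages: list[dict]) -> str:
    """Give the model one final turn to summarize instead of a hard kill."""
    messages = messages + [{
        "role": "user",
        "content": ("Your budget is reached. Summarize what you've found and "
                    "return the best answer you can with the current "
                    "information."),
    }]
    # Run the final call with tools disabled so the model must answer in text.
    return call_model(messages, tools=None)
```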

Telling the model its budget

Surface the remaining budget in the agent's context every turn:

[Budget: 6 tool calls remaining, $0.18 left, 45 seconds to go]

A well-prompted model will self-ration: pick cheaper tools, parallelize, wrap up early. Hidden budgets produce surprise cutoffs; surfaced budgets produce smart agents.
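Rendering that line is trivial but worth standardizing so every turn sees the same format; the helper name is an assumption:

```python
def budget_line(calls_left: int, usd_left: float, seconds_left: float) -> str:
    """Render the remaining budget for injection into the agent's context."""
    return (f"[Budget: {calls_left} tool calls remaining, "
            f"${usd_left:.2f} left, {int(seconds_left)} seconds to go]")
```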

A worked example: a research agent with budgets

Cap the agent at 15 turns and $0.30 of spend per session, whichever trips first. With those caps, a single research session can never cost more than $0.30 or run longer than 15 turns. If the task needs more, the agent returns what it has and the user can ask for more. No runaway, no surprise bill, and 99% of real tasks complete well inside these caps.
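As a config, the worked example might look like this. The 15-turn and $0.30 caps come from the text; the wall-clock cap, per-tool limits, and field names are illustrative assumptions:

```python
# Budget config for the research agent.
RESEARCH_AGENT_BUDGET = {
    "max_turns": 15,
    "max_spend_usd": 0.30,
    "max_wall_seconds": 120,   # assumption, not from the text
    "tool_limits": {           # assumed per-tool caps
        "search_web": 10,
        "read_file": 50,
    },
}
```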

Observability

Log every session's final budget consumption. Track, at minimum: how often each cap trips (tool calls, turns, spend, wall clock), the distribution of per-session spend, and how often loop detection fires.

Pitfalls

What to do with this