Tool budgets + limits

A runaway agent can make hundreds of tool calls in minutes. Without budgets, this means exhausted API quotas, massive bills, and sometimes infinite loops. Budget enforcement is a non-negotiable production practice.

The layers of budget

Per-session

Max tool calls, max LLM tokens, max total time, max dollar cost per agent session. If any limit hits, the agent is stopped.

Per-tool

Rate limits per tool. "This tool can be called 10 times per session max." Prevents overuse of specific expensive operations.

Per-user

Cost limit per user per day. Prevents one user's runaway session from consuming the whole budget.

Global

Hard cap on total concurrent agent sessions or total spend per hour. Last line of defense.

Hard stops

Loop detection

Track a hash of (tool_name, args) per session. If the same hash appears 3 times in a row, the agent is probably stuck. Break and return what you have.

Graceful termination

When budget is reached, don't just stop. Give the LLM one final turn: "Your budget is reached. Summarize what you've found and give the best answer you can with current info."

Surface budget status to the LLM

Include remaining budget in the agent's context. "You have 5 tool calls and $0.20 remaining." The agent can self-ration.

Per-tool rate limiting

Some tools are expensive (GPU inference, third-party API with per-call fees). Rate-limit them specifically:

tool_limits = {
    "expensive_ml_model": 3,  # max 3 calls per session
    "search_web": 10,
    "read_file": 50,
}

Observability of budget usage

Log every session's final budget consumption. Alert on sessions that hit budget ceilings frequently, signal of bad prompts, buggy tools, or adversarial users.