Home›Expertise›AI Agents›Tool budgets + limits

Tool budgets + limits

📖 3 min readUpdated 2026-04-19

Without budgets, a single stuck agent can make 200 tool calls in three minutes, exhaust your API quota, run up $50 in LLM spend, and produce nothing useful. Budgets are the circuit breakers that stop this. They're not optional for production: they're the line between "tried an agent once" and "can actually run agents at scale."

The four layers of budget

Each layer catches a different failure mode. Per-session catches a stuck agent. Per-tool catches overuse of an expensive operation. Per-user catches one runaway user. Global is the last line of defense when multiple of the above go wrong at once.

Per-session, the first thing you ship

The minimum agent should have all four of these caps from day one:

Max steps: 10-30. Stops runaway loops.
Max tokens: a ceiling on total LLM tokens across all turns in a session.
Max time: 60 seconds for user-facing, minutes for background work.
Max cost: $0.50 per session for most agents; adjust by value.

Whichever one trips first wins. The session ends gracefully (see below) instead of continuing into expensive territory.

Per-tool limits

Some tools are disproportionately expensive. A call to a heavy ML model might cost $0.20 each. A web search is a cent. You don't want an agent that happily calls the expensive one 20 times in a session.

tool_limits = {
    "expensive_ml_model": 3,   # max 3 per session
    "search_web": 10,
    "read_file": 50,
}

When the limit is hit, the tool returns an error the model can reason about: "You've called expensive_ml_model 3 times this session, the limit. Use a different approach."

Loop detection

The most common runaway pattern: model calls the same tool with the same arguments repeatedly, hoping for a different result. Detect it:

Hash every (tool_name, args) pair.
If the same hash appears 3 times consecutively, halt the session.
Return whatever partial results you have and log the trace for review.

This catches most runaways before they rack up meaningful cost.

Graceful termination

When a budget limit is hit, don't just kill the session. Give the model one final turn with the message: "Your budget is reached. Summarize what you've found and return the best answer you can with the current information." This produces a useful "here's what I learned" response instead of a raw timeout. User-facing agents should always do this.

Telling the model its budget

Surface the remaining budget in the agent's context every turn:

[Budget: 6 tool calls remaining, $0.18 left, 45 seconds to go]

A well-prompted model will self-ration: pick cheaper tools, parallelize, wrap up early. Hidden budgets produce surprise cutoffs; surfaced budgets produce smart agents.

A worked example: a research agent with budgets

Max steps: 15
Max cost: $0.30
Per-tool: web_search ≤ 10, fetch_page ≤ 8, llm_summarize ≤ 5
Loop detection: 3 duplicates halts
Graceful termination: "You have 1 step left, summarize now"

With these caps, a single research session costs at most $0.30 and takes at most 15 turns. If the task needs more, the agent returns what it has and the user can ask for more. No runaway, no surprise bill, and 99% of real tasks complete well inside these caps.

Observability

Log every session's final budget consumption. Metrics to track:

Distribution of sessions by cost, latency, and step count.
Percentage of sessions that hit a cap (should be small; if ~20% are hitting max steps you need bigger caps or better tools).
Which tools are most frequently hitting per-tool limits.
Per-user cost over time (catch abusers or misuse).

Pitfalls

No budget at all. One bug costs you a four-figure bill.
Too-loose budgets. "Max 100 steps" is basically unlimited. Start tight, widen if real tasks hit the cap.
Too-tight budgets. Max 5 steps makes every real task fail. Tune with production data.
Silent budget hits. The user sees "session ended" with no summary. Add graceful termination.
No global cap. Per-session is fine until something triggers 10,000 sessions in a minute. Global caps catch this.

What to do with this

Add all four budget layers to your agent today if they don't exist. Code is short; savings are enormous.
Read cost control for the bigger picture on agent spend.
Read observability + tracing so you can actually see when budgets are firing.