A runaway agent can make hundreds of tool calls in minutes. Without budgets, this means exhausted API quotas, massive bills, and sometimes infinite loops. Budget enforcement is a non-negotiable production practice.
Max tool calls, max LLM tokens, max total time, max dollar cost per agent session. If any limit hits, the agent is stopped.
Rate limits per tool. "This tool can be called 10 times per session max." Prevents overuse of specific expensive operations.
Cost limit per user per day. Prevents one user's runaway session from consuming the whole budget.
Hard cap on total concurrent agent sessions or total spend per hour. Last line of defense.
Track a hash of (tool_name, args) per session. If the same hash appears 3 times in a row, the agent is probably stuck. Break and return what you have.
When budget is reached, don't just stop. Give the LLM one final turn: "Your budget is reached. Summarize what you've found and give the best answer you can with current info."
Include remaining budget in the agent's context. "You have 5 tool calls and $0.20 remaining." The agent can self-ration.
Some tools are expensive (GPU inference, third-party API with per-call fees). Rate-limit them specifically:
tool_limits = {
"expensive_ml_model": 3, # max 3 calls per session
"search_web": 10,
"read_file": 50,
}
Log every session's final budget consumption. Alert on sessions that hit budget ceilings frequently, signal of bad prompts, buggy tools, or adversarial users.