Tool design is where most agents win or lose. You can use the best model and the smartest prompt, and if your tools are badly shaped the agent will still stumble. Good tools are named for what they do, do one thing each, take a few strongly typed parameters, and return results the model can reason about. This page is the field guide.
Think of your tools as the UI the model sees. Just like a human UI, clarity matters more than cleverness. The model has to pick the right tool and pass the right arguments based on a short description. If the UI is messy, the model makes bad choices. If the UI is clear, the model rarely picks wrong.
A tool with 15 optional parameters is really 15 tools pretending to be one. The model has to figure out which combination of parameters corresponds to the job at hand. That's harder than picking from 4 specific tools. Split them.
A new hire reading your tool list should understand what each one does from the name alone. search_internal_docs passes. query_kb_v2 fails. The model reads the name first and pattern-matches; help it.
Two to four parameters is the sweet spot. At six or more, the model starts missing required ones or hallucinating values. If you truly need more inputs, the tool should probably be split, or should take a structured config object with defaults.
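Here is one way the config-object pattern can look, as a minimal sketch. The tool, its fields, and their defaults are all hypothetical; the point is that rarely-used knobs live in one object with sensible defaults, so a bare first call already works.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReportOptions:
    # Hypothetical knobs; every field has a default a first call would want.
    format: str = "summary"      # "summary" or "full"
    include_charts: bool = False
    max_rows: int = 100

def generate_report(account_id: str, options: Optional[ReportOptions] = None) -> dict:
    """One required argument; everything else defaults."""
    opts = options or ReportOptions()
    return {
        "account_id": account_id,
        "format": opts.format,
        "rows_limit": opts.max_rows,
        "charts": opts.include_charts,
    }

# A reasonable first call needs only the required argument:
report = generate_report("acct_42")
```

The model sees one required parameter instead of seven optional ones, and the defaults encode what a reasonable first call looks like.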
A recency parameter typed as enum: ["day", "week", "month"] can't be wrong. Typed as string, the model might pass "recent" or "1d" or "last week" and surprise you. Constrain the shape.
Every optional parameter should default to what a reasonable first call would use. The model gets the tool right on turn 1 instead of turn 3.
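Both ideas fit naturally in the tool's schema. A sketch, using a hypothetical `search_docs` tool written as a Python dict: the enum makes invalid recency values impossible, and the defaults make the bare first call sensible.

```python
# Hypothetical JSON Schema for a search tool, expressed as a Python dict.
# "recency" is an enum so the model can't invent values like "recent" or "1d",
# and defaults mean a query-only first call already does the right thing.
search_docs_schema = {
    "name": "search_docs",
    "description": "Search internal documentation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "recency": {
                "type": "string",
                "enum": ["day", "week", "month"],
                "default": "month",
            },
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}
```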
A tool that can return 50KB of JSON is a tool that blows up the context window. Truncate, paginate, or return a summary. If the model needs more, give it a follow-up tool.
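Truncation plus a cursor is one simple way to do this. A sketch under assumed numbers (the 4,000-character budget is illustrative, not a rule): keep whole results until the budget runs out, then hand back a cursor that a hypothetical follow-up tool like `get_more_results(cursor)` could resume from.

```python
import json

MAX_CHARS = 4000  # assumed budget; tune to your model's context window

def truncate_results(results, max_chars=MAX_CHARS):
    """Return as many whole results as fit, plus a cursor for the rest."""
    kept, size = [], 2  # 2 chars for the surrounding brackets
    for i, item in enumerate(results):
        encoded = json.dumps(item)
        if size + len(encoded) > max_chars:
            # Tell the model explicitly that more exists and how to get it.
            return {"results": kept, "truncated": True, "next_cursor": i}
        kept.append(item)
        size += len(encoded) + 1  # +1 for the comma separator
    return {"results": kept, "truncated": False, "next_cursor": None}
```

The explicit `truncated` flag matters: the model should never have to guess whether it saw everything.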
The right granularity is intent-level: each tool maps to one thing a user would actually want to do. Not "manipulate the database" and not the individual SQL primitives. Intents.
Before:
{
  "name": "crm",
  "description": "Do stuff with the CRM.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action": {"type": "string"},
      "entity": {"type": "string"},
      "filter": {"type": "string"},
      "data": {"type": "object"},
      "limit": {"type": "integer"},
      "include_archived": {"type": "boolean"}
    }
  }
}
The model has to invent the right combination of action + entity + filter. It guesses. It gets it wrong. Fix: split into intent-level tools.
After:
find_contact(name) → list of contacts
get_contact(id) → full contact
create_contact(name, email) → new contact id
update_contact(id, fields) → ok
find_deals_for(contact_id) → list of deals
Each tool has one intent. The model doesn't have to invent a schema; it picks the tool that matches what it wants to do. Accuracy jumps, bugs drop.
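In code, the intent-level tools are thin wrappers over one shared client. A minimal sketch, with an in-memory `FakeCRM` standing in for the real backend (all names here are illustrative, not a real CRM API):

```python
class FakeCRM:
    """Stand-in for a real CRM backend."""
    def __init__(self):
        self._contacts = {}
        self._next_id = 1

    def insert(self, record):
        record["id"] = self._next_id
        self._contacts[self._next_id] = record
        self._next_id += 1
        return record["id"]

crm = FakeCRM()

def create_contact(name, email):
    """One intent: make a contact, return its id."""
    return crm.insert({"name": name, "email": email})

def get_contact(contact_id):
    """One intent: fetch the full record for one contact."""
    return crm._contacts.get(contact_id)

def find_contact(name):
    """One intent: name search, returning a compact list."""
    return [
        {"id": c["id"], "name": c["name"]}
        for c in crm._contacts.values()
        if name.lower() in c["name"].lower()
    ]
```

Each function body is trivial; the value is in the boundaries, not the implementation.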
Your tool return value is what the model reads on the next turn. Shape it for reasoning:
[{"id":1},{"id":2},{"id":3}].{"error": "not_found", "message": "...", "suggestion": "..."} not a thrown exception.run_function, process_data, do_it. The model has no way to pick.[] when the API was actually down. Model thinks "no results" and answers the user with wrong info.search_docs and find_docs. The model flips between them randomly.