Tool design

Tool design is where most agents win or lose. You can use the best model and the smartest prompt; if your tools are badly shaped, the agent will still stumble. Good tools are named for what they do, do one thing each, take few parameters with strong types, and return results the model can reason about. This page is the field guide.

The mental model

Think of your tools as the UI the model sees. Just like a human UI, clarity matters more than cleverness. The model has to pick the right tool and pass the right arguments based on a short description. If the UI is messy, the model makes bad choices. If the UI is clear, the model rarely picks wrong.

The six principles

One tool, one job

A tool with 15 optional parameters is really 15 tools pretending to be one. The model has to figure out which combination of parameters corresponds to the job at hand. That's harder than picking from 4 specific tools. Split them.

Name tools like a beginner would

A new hire reading your tool list should understand what each one does from the name alone. search_internal_docs passes. query_kb_v2 fails. The model reads the name first and pattern-matches; help it.

Keep parameter count low

2-4 parameters is the sweet spot. At 6+ parameters, the model starts missing required ones or hallucinating values. If you truly need more inputs, the tool should probably be split or should take a structured config object with defaults.
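
If you do reach for a config object, it might look like this — search_internal_docs is borrowed from above, and the option names are illustrative. The model fills in query and touches options only when it actually wants something non-default:

{
  "name": "search_internal_docs",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "What to search for."},
      "options": {
        "type": "object",
        "description": "Advanced knobs; omit for sensible defaults.",
        "properties": {
          "max_results": {"type": "integer", "default": 10},
          "include_archived": {"type": "boolean", "default": false}
        }
      }
    },
    "required": ["query"]
  }
}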

Strong types beat free-form strings

A recency parameter typed as enum: ["day", "week", "month"] can't be wrong. Typed as string, the model might pass "recent" or "1d" or "last week" and surprise you. Constrain the shape.
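
In schema form, that constraint is a one-liner — this fragment assumes the recency parameter from the example above:

{
  "recency": {
    "type": "string",
    "enum": ["day", "week", "month"],
    "description": "How far back to search."
  }
}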

Sensible defaults for optional params

Every optional parameter should default to what a reasonable first call would use. The model gets the tool right on turn 1 instead of turn 3.
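
Concretely, a schema fragment for a hypothetical search tool — with defaults like these, a call that passes only query already behaves like a reasonable first search:

{
  "properties": {
    "query": {"type": "string"},
    "limit": {"type": "integer", "default": 10},
    "recency": {"type": "string", "enum": ["day", "week", "month"], "default": "month"}
  },
  "required": ["query"]
}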

Bounded output

A tool that can return 50KB of JSON is a tool that blows up the context window. Truncate, paginate, or return a summary. If the model needs more, give it a follow-up tool.
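
One shape that works is a page plus an opaque cursor: the model can see how much it's missing and decide whether to fetch more. Field names here are illustrative, and the cursor would feed a follow-up call:

{
  "results": [
    {"id": "doc_871", "title": "Q3 onboarding checklist"},
    {"id": "doc_904", "title": "Onboarding FAQ"}
  ],
  "returned": 2,
  "total_matches": 312,
  "next_cursor": "eyJvZmZzZXQiOjJ9"
}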

Granularity: the Goldilocks problem

The right granularity is intent-level: each tool maps to one thing a user would actually want to do. Not "manipulate the database" and not the individual SQL primitives. Intents.

A worked example: redesigning a bad tool

Before:

{
  "name": "crm",
  "description": "Do stuff with the CRM.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action": {"type": "string"},
      "entity": {"type": "string"},
      "filter": {"type": "string"},
      "data": {"type": "object"},
      "limit": {"type": "integer"},
      "include_archived": {"type": "boolean"}
    }
  }
}

The model has to invent the right combination of action + entity + filter. It guesses. It gets it wrong. Fix: split into intent-level tools.

After:

find_contact(name)            → list of contacts
get_contact(id)               → full contact
create_contact(name, email)   → new contact id
update_contact(id, fields)    → ok
find_deals_for(contact_id)    → list of deals

Each tool has one intent. The model doesn't have to invent a schema; it picks the tool that matches what it wants to do. Accuracy jumps, bugs drop.
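
Spelled out as a full definition, the first of those might look like this — the description text and the limit parameter are illustrative additions, but notice how the principles stack: one job, a plain name, two typed parameters, a default, and a built-in bound on output size:

{
  "name": "find_contact",
  "description": "Search CRM contacts by name. Returns up to limit matches with id, name, and email.",
  "input_schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string", "description": "Full or partial contact name."},
      "limit": {"type": "integer", "default": 10}
    },
    "required": ["name"]
  }
}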

Response shape: what to return

Your tool's return value is what the model reads on the next turn. Shape it for reasoning: return the fields the next decision needs and summarize the rest, include stable IDs that follow-up tools can key on, and make error messages say what went wrong and what to try next.
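
A sketch of both the success and failure shapes for find_contact — field names and wording are illustrative:

On success:

{
  "contacts": [
    {"id": "c_1042", "name": "Dana Reyes", "email": "dana@example.com"}
  ],
  "total_matches": 1
}

On failure:

{
  "error": "no_match",
  "hint": "No contact matched 'Dana Rey'. Try a shorter fragment, or create_contact for a new person."
}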

Common mistakes

Most bad tools fail in a few predictable ways, and each is a principle above turned inside out: a kitchen-sink tool with a vague name and a dozen optional parameters; internal jargon names like query_kb_v2 that force the model to guess from the description; free-form strings where an enum would make wrong values impossible; optional parameters with no defaults, so the first call is a coin flip; and unbounded responses that flood the context window.

What to do with this

Audit your tool list the way you'd audit a UI. Read each name cold and ask whether a new hire would know what it does. Split anything with more than a handful of parameters into intent-level tools, turn constrained strings into enums, give every optional parameter a default, and bound every response. Then read transcripts: where the model keeps guessing wrong, the tool's shape is usually the bug, not the model.