Tool design

Tool design is where most agents win or lose. You can use the best model and the smartest prompt; if your tools are badly shaped, the agent will still stumble. Good tools are named for what they do, do one thing each, take few parameters with strong types, and return results the model can reason about. This page is the field guide.

The mental model

Think of your tools as the UI the model sees. Just like a human UI, clarity matters more than cleverness. The model has to pick the right tool and pass the right arguments based on a short description. If the UI is messy, the model makes bad choices. If the UI is clear, the model rarely picks wrong.

The six principles

One tool, one job

A tool with 15 optional parameters is really 15 tools pretending to be one. The model has to figure out which combination of parameters corresponds to the job at hand. That's harder than picking from 4 specific tools. Split them.

Name tools like a beginner would

A new hire reading your tool list should understand what each one does from the name alone. search_internal_docs passes. query_kb_v2 fails. The model reads the name first and pattern-matches; help it.

Keep parameter count low

2-4 parameters is the sweet spot. At 6+ parameters, the model starts missing required ones or hallucinating values. If you truly need more inputs, the tool should probably be split or should take a structured config object with defaults.
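
If you do reach for a config object, it might look like this — search_internal_docs is borrowed from above, and the option names are illustrative. The model fills in query and touches options only when it actually wants something non-default:

{
  "name": "search_internal_docs",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "What to search for."},
      "options": {
        "type": "object",
        "description": "Advanced knobs; omit for sensible defaults.",
        "properties": {
          "max_results": {"type": "integer", "default": 10},
          "include_archived": {"type": "boolean", "default": false}
        }
      }
    },
    "required": ["query"]
  }
}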

Strong types beat free-form strings

A recency parameter typed as enum: ["day", "week", "month"] can't be wrong. Typed as string, the model might pass "recent" or "1d" or "last week" and surprise you. Constrain the shape.
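
In schema form, that constraint is a one-liner — this fragment assumes the recency parameter from the example above:

{
  "recency": {
    "type": "string",
    "enum": ["day", "week", "month"],
    "description": "How far back to search."
  }
}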

Sensible defaults for optional params

Every optional parameter should default to what a reasonable first call would use. The model gets the tool right on turn 1 instead of turn 3.
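
Concretely, a schema fragment for a hypothetical search tool — with defaults like these, a call that passes only query already behaves like a reasonable first search:

{
  "properties": {
    "query": {"type": "string"},
    "limit": {"type": "integer", "default": 10},
    "recency": {"type": "string", "enum": ["day", "week", "month"], "default": "month"}
  },
  "required": ["query"]
}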

Bounded output

A tool that can return 50KB of JSON is a tool that blows up the context window. Truncate, paginate, or return a summary. If the model needs more, give it a follow-up tool.
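
One shape that works is a page plus an opaque cursor: the model can see how much it's missing and decide whether to fetch more. Field names here are illustrative, and the cursor would feed a follow-up call:

{
  "results": [
    {"id": "doc_871", "title": "Q3 onboarding checklist"},
    {"id": "doc_904", "title": "Onboarding FAQ"}
  ],
  "returned": 2,
  "total_matches": 312,
  "next_cursor": "eyJvZmZzZXQiOjJ9"
}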

Granularity: the Goldilocks problem

The right granularity is intent-level: each tool maps to one thing a user would actually want to do. Not "manipulate the database" and not the individual SQL primitives. Intents.

A worked example: redesigning a bad tool

Before:

{
  "name": "crm",
  "description": "Do stuff with the CRM.",
  "input_schema": {
    "type": "object",
    "properties": {
      "action": {"type": "string"},
      "entity": {"type": "string"},
      "filter": {"type": "string"},
      "data": {"type": "object"},
      "limit": {"type": "integer"},
      "include_archived": {"type": "boolean"}
    }
  }
}

The model has to invent the right combination of action + entity + filter. It guesses. It gets it wrong. Fix: split into intent-level tools.

After:

find_contact(name)            → list of contacts
get_contact(id)               → full contact
create_contact(name, email)   → new contact id
update_contact(id, fields)    → ok
find_deals_for(contact_id)    → list of deals

Each tool has one intent. The model doesn't have to invent a schema; it picks the tool that matches what it wants to do. Accuracy jumps, bugs drop.
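
Spelled out as a full definition, the first of those might look like this — the description text and the limit parameter are illustrative additions, but notice how the principles stack: one job, a plain name, two typed parameters, a default, and a built-in bound on output size:

{
  "name": "find_contact",
  "description": "Search CRM contacts by name. Returns up to limit matches with id, name, and email.",
  "input_schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string", "description": "Full or partial contact name."},
      "limit": {"type": "integer", "default": 10}
    },
    "required": ["name"]
  }
}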

Response shape: what to return

Your tool's return value is what the model reads on the next turn. Shape it for reasoning: return the fields the next decision needs and summarize the rest, include stable IDs that follow-up tools can key on, and make error messages say what went wrong and what to try next.
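
A sketch of both the success and failure shapes for find_contact — field names and wording are illustrative:

On success:

{
  "contacts": [
    {"id": "c_1042", "name": "Dana Reyes", "email": "dana@example.com"}
  ],
  "total_matches": 1
}

On failure:

{
  "error": "no_match",
  "hint": "No contact matched 'Dana Rey'. Try a shorter fragment, or create_contact for a new person."
}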

Common mistakes

Most bad tools fail in a few predictable ways, and each is a principle above turned inside out: a kitchen-sink tool with a vague name and a dozen optional parameters; internal jargon names like query_kb_v2 that force the model to guess from the description; free-form strings where an enum would make wrong values impossible; optional parameters with no defaults, so the first call is a coin flip; and unbounded responses that flood the context window.

What to do with this

Audit your tool list the way you'd audit a UI. Read each name cold and ask whether a new hire would know what it does. Split anything with more than a handful of parameters into intent-level tools, turn constrained strings into enums, give every optional parameter a default, and bound every response. Then read transcripts: where the model keeps guessing wrong, the tool's shape is usually the bug, not the model.