Tool descriptions matter

Your tool description is read by the LLM at every step of the loop. It is often the most important prompt in your agent, and the one most developers treat as an afterthought. A weak description produces inconsistent tool use; a precise description produces reliable behavior.

What a good description includes

  1. What the tool does: one sentence, active voice
  2. When to use it: the scenarios where this is the right choice
  3. When NOT to use it: common misapplications
  4. What it returns: the shape and meaning of the result
  5. Constraints and limits: rate limits, size limits, latency

Example: weak description

"Search the web."

Example: strong description

"Search the public web for recent news and information. Use this
when the user asks about current events, recent developments, or
facts that may have changed since your training cutoff. Returns
top 5 results with title, URL, and snippet. Do NOT use for
searching internal documents (use search_internal_docs instead)
or for math (use calculator). Rate limited to 30 calls per minute."

Parameter descriptions matter too

Each parameter gets its own description. Tell the LLM the expected format, the valid values, and any constraints.

{
  "query": {
    "type": "string",
    "description": "Natural language search query. 3-10 words works best. Include specific entities and timeframes."
  },
  "recency": {
    "type": "string",
    "enum": ["day", "week", "month", "year"],
    "description": "How recent results should be. Default is 'month'."
  }
}
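Putting the description and parameter schema together, here is a sketch of a complete tool definition in the OpenAI-style function-calling format. The envelope (`type`, `function`, `parameters`) follows that provider's convention; other APIs use a different wrapper, but the description and per-parameter fields carry over:

```python
# Sketch: a full tool definition combining the strong description
# and the parameter schema above (OpenAI-style envelope assumed).
search_web_tool = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": (
            "Search the public web for recent news and information. "
            "Use this when the user asks about current events, recent "
            "developments, or facts that may have changed since your "
            "training cutoff. Returns top 5 results with title, URL, "
            "and snippet. Do NOT use for searching internal documents "
            "(use search_internal_docs instead) or for math (use "
            "calculator). Rate limited to 30 calls per minute."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": (
                        "Natural language search query. 3-10 words works "
                        "best. Include specific entities and timeframes."
                    ),
                },
                "recency": {
                    "type": "string",
                    "enum": ["day", "week", "month", "year"],
                    "description": (
                        "How recent results should be. Default is 'month'."
                    ),
                },
            },
            "required": ["query"],
        },
    },
}
```

Note that `required` does the same work for parameters that "when NOT to use" does for the tool: it removes a decision the model would otherwise have to guess at.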

The "when not to use" is critical

When you have multiple similar tools, each description must explicitly route the LLM to the right one. Without negative examples, the LLM picks more or less randomly when tools overlap.
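One way to keep this honest as the tool set grows is a lint check: every description of an overlapping tool should name its siblings by identifier. A minimal sketch, assuming the tool names from the examples above (the `check_cross_routing` helper is hypothetical, not a library function):

```python
# Sketch: flag overlapping tools whose descriptions never route
# the model to a sibling tool by name.
TOOLS = {
    "search_web": (
        "Search the public web. Use for current events and public "
        "facts. Do NOT use for company files, wikis, or tickets: "
        "use search_internal_docs for those."
    ),
    "search_internal_docs": (
        "Search the company's internal document store. Use for "
        "internal specs, wikis, and tickets. Do NOT use for public "
        "news or general facts: use search_web for those."
    ),
}

def check_cross_routing(tools: dict) -> list:
    """Return tools whose description mentions no sibling tool."""
    problems = []
    for name, desc in tools.items():
        siblings = [n for n in tools if n != name]
        if not any(s in desc for s in siblings):
            problems.append(name)
    return problems
```

Running `check_cross_routing(TOOLS)` on the pair above returns an empty list; a tool that forgets to name its sibling shows up in the output.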

Test your descriptions

Run your agent with intentionally ambiguous tasks and watch which tools it calls. If it's calling the wrong tool consistently, the descriptions need work, not the model.
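A harness for this can be small. The sketch below assumes a `call_agent` function that runs your agent on a prompt and returns the name of the first tool it calls; the stub here is hypothetical and stands in for your real agent loop, and the prompts and expected tools are illustrative:

```python
# Sketch: run ambiguous prompts several times each and tally
# cases where the model picked a different tool than expected.
from collections import Counter

def call_agent(prompt: str) -> str:
    # Hypothetical stub: a real implementation would run the agent
    # loop and return the first tool name the model calls.
    routing = {
        "latest news on chip exports": "search_web",
        "find our Q3 planning doc": "search_internal_docs",
        "what is 37 * 41": "calculator",
    }
    return routing[prompt]

CASES = {  # prompt -> the tool you expect the model to choose
    "latest news on chip exports": "search_web",
    "find our Q3 planning doc": "search_internal_docs",
    "what is 37 * 41": "calculator",
}

def score_descriptions(trials: int = 5) -> dict:
    """Count (prompt, wrong_tool) mismatches across repeated runs."""
    misses = Counter()
    for prompt, expected in CASES.items():
        for _ in range(trials):
            chosen = call_agent(prompt)
            if chosen != expected:
                misses[(prompt, chosen)] += 1
    return dict(misses)
```

A consistent miss on the same prompt (the same wrong tool every run) points at the descriptions; scattered misses point at genuine ambiguity in the task.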