Your tool description is read by the LLM at every step of the agent loop. It is the most important prompt in your agent, and most developers treat it as an afterthought. A weak description produces inconsistent tool use; a precise description produces reliable behavior.
A weak description: "Search the web."
A strong description: "Search the public web for recent news and information. Use this when the user asks about current events, recent developments, or facts that may have changed since your training cutoff. Returns top 5 results with title, URL, and snippet. Do NOT use for searching internal documents (use search_internal_docs instead) or for math (use calculator). Rate limited to 30 calls per minute."
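Dropped into a full tool definition, the strong description might look like this. This is a sketch in an OpenAI-style function-calling schema; the wrapper keys (`name`, `description`, `parameters`) are an assumption about your framework, but the description text carries over to any of them:

```python
# Hypothetical tool definition. The wrapper keys follow an OpenAI-style
# function-calling schema and may differ in your framework; the
# description string is the part that matters.
search_web_tool = {
    "name": "search_web",
    "description": (
        "Search the public web for recent news and information. "
        "Use this when the user asks about current events, recent developments, "
        "or facts that may have changed since your training cutoff. "
        "Returns top 5 results with title, URL, and snippet. "
        "Do NOT use for searching internal documents (use search_internal_docs "
        "instead) or for math (use calculator). Rate limited to 30 calls per minute."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
        },
        "required": ["query"],
    },
}
```

Note that the description states when to use the tool, what it returns, when *not* to use it, and its limits, in that order.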
Each parameter gets its own description as well: tell the LLM the expected format, the valid values, and any constraints.
```json
{
  "query": {
    "type": "string",
    "description": "Natural language search query. 3-10 words works best. Include specific entities and timeframes."
  },
  "recency": {
    "type": "string",
    "enum": ["day", "week", "month", "year"],
    "description": "How recent results should be. Default is 'month'."
  }
}
```
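One cheap guardrail, assuming your tools are plain JSON-schema dicts like the one above, is a lint pass that fails when a parameter is missing its description or an enum parameter never states its default. The `lint_parameters` helper below is hypothetical, not part of any SDK:

```python
def lint_parameters(properties: dict) -> list[str]:
    """Return a list of problems with a tool's parameter schema.

    Flags parameters with no description, and enum parameters whose
    description never mentions a default.
    """
    problems = []
    for name, spec in properties.items():
        desc = spec.get("description", "").strip()
        if not desc:
            problems.append(f"{name}: missing description")
        elif "enum" in spec and "default" not in desc.lower():
            problems.append(f"{name}: enum parameter should state its default")
    return problems


# The schema from the text passes the lint; a bare parameter does not.
params = {
    "query": {
        "type": "string",
        "description": "Natural language search query. 3-10 words works best. "
                       "Include specific entities and timeframes.",
    },
    "recency": {
        "type": "string",
        "enum": ["day", "week", "month", "year"],
        "description": "How recent results should be. Default is 'month'.",
    },
}

assert lint_parameters(params) == []
assert lint_parameters({"q": {"type": "string"}}) == ["q: missing description"]
```

Running this in CI keeps parameter descriptions from silently regressing as tools are added.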
When you have multiple similar tools, each description must explicitly route the LLM to the right one. Without negative examples ("Do NOT use for X; use Y instead"), the LLM picks more or less arbitrarily when tools overlap.
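A sketch of two overlapping tools whose descriptions route to each other; the names and wording here are illustrative, not from any particular product:

```python
# Illustrative sibling tools. Each description names the other tool
# explicitly, so the LLM has a rule to follow rather than a guess to
# make when the two could both plausibly apply.
tools = [
    {
        "name": "search_web",
        "description": (
            "Search the public web. Use for current events and public facts. "
            "Do NOT use for company-internal content (use search_internal_docs)."
        ),
    },
    {
        "name": "search_internal_docs",
        "description": (
            "Search the company's internal knowledge base. Use for policies, "
            "runbooks, and internal specs. Do NOT use for public news "
            "(use search_web)."
        ),
    },
]
```

The routing lives in the descriptions themselves, so it works with any model and survives model upgrades.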
Run your agent with intentionally ambiguous tasks and watch which tools it calls. If it consistently calls the wrong tool, the descriptions need work, not the model.