Tool use is how an agent acts. The model doesn't execute anything itself; it requests tool calls, the harness runs them, and the results become new input. Getting the mechanics right is half the battle.
You define a tool with three things:
- A name (short, verb-like: web_search, read_file)
- A description: what the tool does, when to use it, when not to
- An input schema: the parameters the model must supply

The model decides when to call a tool based on the description. Description quality matters more than the name. A mediocre description means a tool the model won't use, or will misuse.
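As a sketch, here is what such a definition can look like in the Anthropic tool-use shape (name, description, and a JSON Schema `input_schema`); the web_search tool and its wording are illustrative, not a real implementation:

```python
# Hypothetical web_search tool definition in the Anthropic tool-use shape.
web_search_tool = {
    "name": "web_search",
    "description": (
        "Search the web for current information. "
        "Use for questions about recent events or facts you may not know. "
        "Do not use when the user supplies a specific URL; fetch that URL directly."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
        },
        "required": ["query"],
    },
}
```

Note that the description carries the routing logic ("use for…", "do not use when…"), not the name.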
The loop itself:
1. Send messages + tool definitions to the model
2. Model responds with either:
- A final answer (done)
- A tool_use block (model wants to call a tool)
3. Harness executes the tool, captures output
4. Harness sends back a tool_result referencing the tool_use ID
5. Model produces the next step (more tool calls or final answer)
6. Repeat until final answer or max_turns reached
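The steps above can be sketched as a minimal loop. This assumes an Anthropic-style client; `run_tool` is a placeholder dispatcher you'd supply, and the model name is illustrative:

```python
# Minimal agent loop sketch. `client` is an Anthropic-style SDK client;
# `run_tool(name, input) -> str` is your own dispatcher.
def agent_loop(client, messages, tools, run_tool, max_turns=10):
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative model name
            max_tokens=4096,
            messages=messages,
            tools=tools,
        )
        messages.append({"role": "assistant", "content": response.content})
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            return response  # no tool calls: final answer
        # Each tool_result must reference the tool_use ID it answers.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in tool_uses
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max_turns reached without a final answer")
```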
Modern Claude can request multiple tools in one turn. Your harness should execute them in parallel when possible:
// Model emits:
[tool_use: search("MCP spec")]
[tool_use: search("MCP vs API")]
[tool_use: fetch_url("https://modelcontextprotocol.io")]
// Harness runs all three concurrently,
// returns all three results in the next turn.
Parallel execution cuts latency dramatically on multi-search or multi-read tasks. Don't serialize by default.
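A sketch of the concurrent case with a thread pool; the shape of `tool_uses` and the `run_tool` dispatcher are assumptions, not a fixed API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tools_parallel(tool_uses, run_tool):
    """Execute a batch of tool_use requests concurrently.

    tool_uses: list of dicts like {"id": ..., "name": ..., "input": ...}
    run_tool:  callable (name, input) -> string result
    Returns tool_result blocks in the order the model requested them.
    """
    with ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
        futures = [pool.submit(run_tool, t["name"], t["input"]) for t in tool_uses]
        return [
            {"type": "tool_result", "tool_use_id": t["id"], "content": f.result()}
            for t, f in zip(tool_uses, futures)
        ]
```

Threads are enough here because tool calls are I/O-bound (network, disk); results come back in request order, which keeps the tool_use/tool_result pairing simple.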
Tools fail. The network times out. The file doesn't exist. Credentials expire. Your tool result should include error info in a shape the model can reason about:
{
"status": "error",
"error_type": "not_found",
"message": "File /path/to/file.txt does not exist",
"suggestion": "Check the path; list the directory first."
}
The model can read this and take a smarter next step (list the directory). If you return a raw stack trace, the model will often just retry the same bad call.
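One way to guarantee that shape is a wrapper around every tool handler; the function name and the choice of which exceptions to special-case are illustrative:

```python
import json

def safe_tool_result(fn, *args, **kwargs):
    """Run a tool handler; always return model-readable JSON, never a raw traceback."""
    try:
        return json.dumps({"status": "ok", "result": fn(*args, **kwargs)})
    except FileNotFoundError as e:
        return json.dumps({
            "status": "error",
            "error_type": "not_found",
            "message": str(e),
            "suggestion": "Check the path; list the directory first.",
        })
    except Exception as e:  # catch-all so the loop never crashes mid-turn
        return json.dumps({
            "status": "error",
            "error_type": type(e).__name__,
            "message": str(e),
        })
```

The catch-all branch matters: an unhandled exception in a tool kills the whole turn, whereas a structured error gives the model a chance to recover.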
A common failure mode: the model calls fetch_url for a query that needs search. Fix: sharpen descriptions, especially "when to use" and "when not to use."
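To make that concrete, compare a mediocre description with a sharpened one (both strings are illustrative wording, not from any real tool registry):

```python
# Mediocre: says what the tool does, not when to pick it.
before = "Fetches a URL."

# Sharpened: adds "when to use" and "when not to use" routing cues.
after = (
    "Fetch the contents of a specific URL the user has provided or that a "
    "prior search returned. Do not use this to answer open-ended questions; "
    "use web_search first to find relevant URLs."
)
```

The second version gives the model enough signal to route a query like "what is MCP?" to search instead of guessing a URL to fetch.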