Modern tool-use APIs (Claude, GPT-4, Gemini) let the LLM request multiple tool calls in a single response. The orchestrator runs them in parallel and feeds back all results. This dramatically cuts latency for independent tasks.
Parallelize when tool calls are independent: no call needs another's result (e.g., fetching the weather for three different cities).
When each call depends on the prior result (look up a user's ID, then fetch that user's orders), the calls must run sequentially.
Three tool calls at 500ms each: sequential execution takes ~1,500ms, while parallel execution takes ~500ms — a 3x latency reduction.
At agent scale (many tool calls), the savings compound.
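The latency math above can be demonstrated directly with asyncio. This is a minimal sketch: `fake_tool_call` is a hypothetical stand-in that sleeps for 500ms in place of real tool I/O.

```python
import asyncio
import time

async def fake_tool_call(name: str) -> str:
    # Hypothetical tool call: ~500ms of simulated I/O latency.
    await asyncio.sleep(0.5)
    return f"{name}: done"

async def run_sequential() -> float:
    # Await each call before starting the next.
    start = time.perf_counter()
    for name in ("weather", "stocks", "news"):
        await fake_tool_call(name)
    return time.perf_counter() - start

async def run_parallel() -> float:
    # Start all three calls at once; total time is roughly the slowest call.
    start = time.perf_counter()
    await asyncio.gather(*(fake_tool_call(n) for n in ("weather", "stocks", "news")))
    return time.perf_counter() - start

seq = asyncio.run(run_sequential())  # ~1.5s
par = asyncio.run(run_parallel())    # ~0.5s
```

The gap widens with every additional independent call: sequential time grows linearly, parallel time stays near the slowest single call.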
The LLM returns multiple tool_use blocks in one response. Your orchestrator must detect this and dispatch them concurrently:
import asyncio

async def execute_tool_calls(calls):
    # Dispatch every requested call concurrently; results come back in request order.
    return await asyncio.gather(*[execute(c) for c in calls])
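End to end, the detection step looks roughly like this. A sketch assuming a Claude-style response whose content is a list of blocks, some with `type: "tool_use"`; the tool registry and tool names are hypothetical.

```python
import asyncio

# Hypothetical tool registry; real dispatch depends on your tools.
TOOLS = {
    "get_weather": lambda args: f"sunny in {args['city']}",
    "get_time": lambda args: "12:00",
}

async def execute(call: dict) -> dict:
    # Run one tool call; a real implementation would await network/disk I/O here.
    result = TOOLS[call["name"]](call["input"])
    return {"type": "tool_result", "tool_use_id": call["id"], "content": result}

async def execute_tool_calls(calls: list[dict]) -> list[dict]:
    # Dispatch all requested calls concurrently, preserving order.
    return await asyncio.gather(*[execute(c) for c in calls])

# Assumed response shape: a list of content blocks, as the text describes.
response_content = [
    {"type": "text", "text": "I'll look those up."},
    {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"city": "Oslo"}},
    {"type": "tool_use", "id": "t2", "name": "get_time", "input": {}},
]
tool_calls = [b for b in response_content if b["type"] == "tool_use"]
results = asyncio.run(execute_tool_calls(tool_calls))
```

Each `tool_result` carries the originating `tool_use_id`, so the model can match batched results back to its requests regardless of completion order.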
Parallel calls cost no more than sequential ones: the number of tool invocations is identical. Cost can even be slightly lower, because the orchestrator sends a single follow-up LLM prompt with the batched results instead of one prompt per result.
If one of three parallel calls fails, do you wait for the others? Fail fast? Return partial results?
Typically: wait for all calls to complete, then return every result (including any errors) to the LLM and let it decide how to proceed. This keeps the programming model simple.
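The wait-for-all pattern maps directly onto `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as values instead of raising on the first failure. A sketch with a hypothetical `flaky_call`:

```python
import asyncio

async def flaky_call(name: str, fail: bool) -> str:
    # Hypothetical tool call that may fail.
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name}: ok"

async def execute_all(calls: list[tuple[str, bool]]) -> list[dict]:
    # Wait for every call; failures come back as Exception objects, not raises.
    outcomes = await asyncio.gather(
        *(flaky_call(name, fail) for name, fail in calls),
        return_exceptions=True,
    )
    # Wrap each outcome so the LLM sees both successes and errors.
    return [
        {"name": name, "is_error": isinstance(out, Exception), "content": str(out)}
        for (name, _), out in zip(calls, outcomes)
    ]

results = asyncio.run(
    execute_all([("search", False), ("fetch", True), ("lookup", False)])
)
```

Flagging errors with an `is_error` field (rather than raising) lets the model retry, work around, or abandon the failed call on its own.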
Some models default to sequential tool use even when parallel would be faster. Prompt engineering can nudge them: "When you need multiple pieces of independent information, request them all in one response."