Modern tool-use APIs let the model request several tool calls in a single turn. Your orchestrator runs them in parallel. You get back all the results at once. It's the cheapest, easiest latency win in an agent: zero extra cost, often 3-5× faster on turns that fetch independent data. If your agent does more than one tool call per turn, you probably want this on.
Instead of emitting one tool_use block, the model emits three. Your orchestrator sees the three calls, dispatches them concurrently (asyncio.gather, Promise.all, a thread pool, whatever you like), waits for all of them to finish, then packages all three results into a single response back to the model. The model reads them together and continues.
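A minimal sketch of that loop, assuming a hypothetical `execute_tool(name, args)` dispatcher and tool_use blocks shaped as dicts with `id`, `name`, and `input` keys (your API's actual shapes may differ):

```python
import asyncio

async def execute_tool(name, args):
    # Stand-in dispatcher; in practice this routes to your real tools.
    return f"result of {name}"

async def handle_tool_turn(tool_use_blocks):
    # One task per tool_use block, all dispatched before any finishes.
    tasks = [execute_tool(b["name"], b["input"]) for b in tool_use_blocks]
    results = await asyncio.gather(*tasks)
    # Package every result into the single follow-up message to the model.
    return [
        {"type": "tool_result", "tool_use_id": b["id"], "content": r}
        for b, r in zip(tool_use_blocks, results)
    ]
```

`asyncio.gather` preserves input order, so zipping blocks with results keeps each tool_result paired with the call that produced it.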
At three calls the gap is ~1 second. At ten calls it's multiple seconds. Over an entire agent run with dozens of parallelizable moments, you routinely shave half the wall-clock. Users notice.
User: "Give me the full picture on customer 1234."
Sequential version (slow):
get_customer(1234) → wait 400ms
get_orders(1234) → wait 500ms
get_tickets(1234) → wait 600ms
Total: 1500ms.
Parallel version: model emits all three tool_use blocks in one response. Orchestrator runs them concurrently. Wall time: max(400, 500, 600) = 600ms.
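A toy demonstration of the wall-clock math, with `asyncio.sleep` standing in for the three lookups (delays scaled down so it runs fast):

```python
import asyncio
import time

async def fake_tool(delay):
    # Simulates a tool call that takes `delay` seconds of I/O wait.
    await asyncio.sleep(delay)
    return delay

async def main():
    delays = [0.04, 0.05, 0.06]  # stand-ins for 400/500/600 ms
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_tool(d) for d in delays))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
# Wall time is roughly max(delays), not sum(delays).
```

The elapsed time lands near the slowest call (0.06s here), not the 0.15s total of running them back to back.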
Same answer, less than half the wait, zero additional cost, same number of tokens sent to the model.
import asyncio

async def execute_tool_calls(calls):
    return await asyncio.gather(*[execute(c) for c in calls])
A few lines. That's most of what you need. The model already knows how to emit multiple tool calls when it makes sense; your orchestrator just has to dispatch them concurrently instead of sequentially.
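Wired up end to end, with a stub `execute` in place of real tool implementations:

```python
import asyncio

async def execute(call):
    # Stub executor for illustration; a real one would hit your backends.
    return {"call": call["name"], "ok": True}

async def execute_tool_calls(calls):
    # All calls run concurrently; results come back in input order.
    return await asyncio.gather(*[execute(c) for c in calls])

calls = [{"name": "get_customer"}, {"name": "get_orders"}, {"name": "get_tickets"}]
results = asyncio.run(execute_tool_calls(calls))
```

Order preservation matters: it lets the orchestrator match each result to its originating tool_use block without extra bookkeeping.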
Parallel calls are the same number of tool invocations. Model tokens can even drop slightly because the orchestrator batches results into one response instead of sending three separate follow-up prompts. So: faster, same cost, occasionally cheaper. There's basically no downside.
Three calls, one fails. What do you do?
Start with the simplest policy: return every result, failures included, in the single follow-up response and let the model decide what to do next. It's predictable, and models handle mixed results well.
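One way to sketch that policy is to wrap each call so a failure becomes data rather than an exception; the shapes below (`tool`, `is_error`, `content`) are illustrative, not any particular API's schema:

```python
import asyncio

async def run_one(call):
    # Failures become structured results the model can read,
    # instead of exceptions that abort the whole batch.
    try:
        return {"tool": call["name"], "is_error": False,
                "content": await call["fn"]()}
    except Exception as e:
        return {"tool": call["name"], "is_error": True,
                "content": f"{type(e).__name__}: {e}"}

async def run_all(calls):
    return await asyncio.gather(*(run_one(c) for c in calls))

async def ok():
    return "fine"

async def boom():
    raise TimeoutError("backend timed out")

mixed = asyncio.run(run_all([{"name": "a", "fn": ok}, {"name": "b", "fn": boom}]))
```

The model then sees two successes and one readable error in the same turn, and can retry, work around, or report the failure itself.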
Some models default to sequential tool use even when parallel would be cheaper. A prompt nudge usually fixes it, along the lines of: "When you need several independent pieces of data, issue all the tool calls in a single turn rather than one at a time."
By default, asyncio.gather raises the first exception and discards the other results, so the model never sees the full picture. Pass return_exceptions=True, or wrap each call so failures come back as structured errors.
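A minimal sketch of the return_exceptions route; the `gather_safe` name and result shape are made up for illustration:

```python
import asyncio

async def ok():
    return 42

async def boom():
    raise ValueError("bad id")

async def gather_safe(coros):
    # return_exceptions=True delivers failures as values in the result
    # list; without it, the first exception propagates and the other
    # results are lost.
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [
        {"is_error": True, "content": repr(r)}
        if isinstance(r, BaseException)
        else {"is_error": False, "content": r}
        for r in results
    ]

out = asyncio.run(gather_safe([ok(), boom()]))
```

Every slot in the output is filled, so the model always gets one entry per call it made, error or not.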