Parallel tool calls

Modern tool-use APIs let the model request several tool calls in a single turn. Your orchestrator runs them in parallel. You get back all the results at once. It's the cheapest, easiest latency win in an agent: zero extra cost, often 3-5× faster on turns that fetch independent data. If your agent does more than one tool call per turn, you probably want this on.

The mechanic

Instead of emitting one tool_use block, the model emits three. Your orchestrator sees the three calls, dispatches them concurrently (asyncio.gather, Promise.all, a thread pool, whatever you like), waits for all of them to finish, then packages all three results into a single response back to the model. The model reads them together and continues.
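Concretely, a minimal sketch of that dispatch loop, assuming an Anthropic-style response object whose content is a list of typed blocks; run_tool is a hypothetical dispatcher standing in for your real tool registry:

```python
import asyncio

async def run_tool(name, args):
    # Hypothetical dispatcher: look up and await the real tool here.
    ...

async def handle_turn(response):
    # Collect every tool_use block the model emitted this turn.
    calls = [b for b in response.content if b.type == "tool_use"]
    # Dispatch all of them concurrently; gather waits for the slowest.
    results = await asyncio.gather(*(run_tool(c.name, c.input) for c in calls))
    # Package all results into one message for the model's next turn.
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": c.id, "content": r}
            for c, r in zip(calls, results)
        ],
    }
```

The key move is the single gather over all the calls, and the single combined message of tool_result blocks going back.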

The latency math

Sequential calls cost the sum of their latencies; parallel calls cost only the slowest one. With typical tool latencies of a few hundred milliseconds, at three calls the gap is roughly a second; at ten calls it's multiple seconds. Over an entire agent run with dozens of parallelizable moments, you routinely shave half the wall-clock time. Users notice.

When parallel is the right shape

Parallel is the right shape when the calls are independent: none of them needs another's output as input. Fetching a customer record, their orders, and their tickets are three reads keyed on the same ID; nothing forces an order. When one call's arguments depend on another's result (look up the ID first, then fetch by it), the calls have to stay sequential.

A worked example: customer lookup

User: "Give me the full picture on customer 1234."

Sequential version (slow):

  1. Call get_customer(1234) → wait 400ms
  2. Call get_orders(1234) → wait 500ms
  3. Call get_tickets(1234) → wait 600ms

Total: 1500ms.

Parallel version: model emits all three tool_use blocks in one response. Orchestrator runs them concurrently. Wall time: max(400, 500, 600) = 600ms.

Same answer, less than half the wait, zero additional cost, same number of tokens sent to the model.
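You can check the sum-versus-max arithmetic with a quick simulation, using asyncio.sleep as a stand-in for the three tool calls:

```python
import asyncio
import time

async def fake_tool(ms):
    await asyncio.sleep(ms / 1000)   # stand-in for a 400-600ms backend call
    return ms

async def main():
    t0 = time.perf_counter()
    for ms in (400, 500, 600):       # sequential: each call waits its turn
        await fake_tool(ms)
    sequential = time.perf_counter() - t0

    t0 = time.perf_counter()
    await asyncio.gather(*(fake_tool(ms) for ms in (400, 500, 600)))
    parallel = time.perf_counter() - t0

    print(f"sequential ~{sequential:.2f}s, parallel ~{parallel:.2f}s")
    return sequential, parallel

sequential, parallel = asyncio.run(main())
```

The sequential loop lands near the 1500ms sum; the gather lands near the 600ms max.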

Implementation sketch

import asyncio

async def execute_tool_calls(calls):
    # Fan out every call at once; gather returns results in input order.
    return await asyncio.gather(*(execute(c) for c in calls))

Five lines. That's most of what you need. The model already knows how to emit multiple tool calls when it makes sense; your orchestrator just has to dispatch them concurrently instead of sequentially.

Cost doesn't change

Parallel calls are the same number of tool invocations. Model tokens can even drop slightly because the orchestrator batches results into one response instead of sending three separate follow-up prompts. So: faster, same cost, occasionally cheaper. Beyond the pitfalls below, there's no real downside.

Handling partial failures

Three calls, one fails. What do you do?

The simplest default: return the successful results as-is, return the failure as an error result on that one call, and let the model decide whether to retry, route around it, or surface it to the user. Start with that. It's predictable and the model handles mixed results well.
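A sketch of that approach under asyncio, taking the per-call executor as a parameter so the example is self-contained: return_exceptions=True keeps one failure from cancelling its siblings, and each exception becomes an error result instead of crashing the batch.

```python
import asyncio

async def execute_tool_calls_safely(calls, execute):
    # return_exceptions=True: failed calls yield their exception object
    # instead of propagating and cancelling the rest of the batch.
    results = await asyncio.gather(
        *(execute(c) for c in calls), return_exceptions=True
    )
    packaged = []
    for call, result in zip(calls, results):
        if isinstance(result, Exception):
            # Hand the failure to the model as an error result.
            packaged.append({"call": call, "is_error": True, "content": repr(result)})
        else:
            packaged.append({"call": call, "is_error": False, "content": result})
    return packaged
```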

Getting the model to actually parallelize

Some models default to sequential tool use even when parallel would be cheaper. The fix is usually a prompt nudge: say explicitly, in the system prompt or tool guidance, something like "If multiple tool calls don't depend on each other's results, make them all in a single response."

Pitfalls

A few things to watch. Parallelize only independent calls: tools with side effects or ordering requirements (write, then read what you wrote) must stay sequential. A wide turn fans out several simultaneous requests, so backend rate limits that never fired under sequential calls can start firing. And a plain gather waits for the slowest call, so one hung tool stalls the whole batch; put a timeout on each call.

What to do with this