Modern tool-use APIs (Claude, GPT-4, Gemini) let the LLM request multiple tool calls in a single response. The orchestrator runs them in parallel and feeds back all results. This dramatically cuts latency for independent tasks.
Parallelize when tool calls are independent: no call needs another's result (e.g., fetching the weather for three different cities).
When each call depends on the prior result (look up a user's ID, then fetch that user's orders), the calls must run sequentially.
Three tool calls at 500ms each: sequential execution takes ~1,500ms, while parallel execution takes ~500ms — a 3x latency reduction.
At agent scale (many tool calls), the savings compound.
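The latency math above can be demonstrated directly with asyncio. This is a minimal sketch: `fake_tool_call` is a hypothetical stand-in that sleeps for 500ms in place of real tool I/O.

```python
import asyncio
import time

async def fake_tool_call(name: str) -> str:
    # Hypothetical tool call: ~500ms of simulated I/O latency.
    await asyncio.sleep(0.5)
    return f"{name}: done"

async def run_sequential() -> float:
    # Await each call before starting the next.
    start = time.perf_counter()
    for name in ("weather", "stocks", "news"):
        await fake_tool_call(name)
    return time.perf_counter() - start

async def run_parallel() -> float:
    # Start all three calls at once; total time is roughly the slowest call.
    start = time.perf_counter()
    await asyncio.gather(*(fake_tool_call(n) for n in ("weather", "stocks", "news")))
    return time.perf_counter() - start

seq = asyncio.run(run_sequential())  # ~1.5s
par = asyncio.run(run_parallel())    # ~0.5s
```

The gap widens with every additional independent call: sequential time grows linearly, parallel time stays near the slowest single call.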
The LLM returns multiple tool_use blocks in one response. Your orchestrator must detect this and dispatch them concurrently:
import asyncio

async def execute_tool_calls(calls):
    # Dispatch every requested call concurrently; results come back in request order.
    return await asyncio.gather(*[execute(c) for c in calls])
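End to end, the detection step looks roughly like this. A sketch assuming a Claude-style response whose content is a list of blocks, some with `type: "tool_use"`; the tool registry and tool names are hypothetical.

```python
import asyncio

# Hypothetical tool registry; real dispatch depends on your tools.
TOOLS = {
    "get_weather": lambda args: f"sunny in {args['city']}",
    "get_time": lambda args: "12:00",
}

async def execute(call: dict) -> dict:
    # Run one tool call; a real implementation would await network/disk I/O here.
    result = TOOLS[call["name"]](call["input"])
    return {"type": "tool_result", "tool_use_id": call["id"], "content": result}

async def execute_tool_calls(calls: list[dict]) -> list[dict]:
    # Dispatch all requested calls concurrently, preserving order.
    return await asyncio.gather(*[execute(c) for c in calls])

# Assumed response shape: a list of content blocks, as the text describes.
response_content = [
    {"type": "text", "text": "I'll look those up."},
    {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"city": "Oslo"}},
    {"type": "tool_use", "id": "t2", "name": "get_time", "input": {}},
]
tool_calls = [b for b in response_content if b["type"] == "tool_use"]
results = asyncio.run(execute_tool_calls(tool_calls))
```

Each `tool_result` carries the originating `tool_use_id`, so the model can match batched results back to its requests regardless of completion order.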
Parallel calls cost no more than sequential ones: the number of tool invocations is identical. Cost can even be slightly lower, because the orchestrator sends a single follow-up LLM prompt with the batched results instead of one prompt per result.
If one of three parallel calls fails, do you wait for the others? Fail fast? Return partial results?
Typically: wait for all calls to complete, then return every result (including any errors) to the LLM and let it decide how to proceed. This keeps the programming model simple.
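The wait-for-all pattern maps directly onto `asyncio.gather(..., return_exceptions=True)`, which returns exceptions as values instead of raising on the first failure. A sketch with a hypothetical `flaky_call`:

```python
import asyncio

async def flaky_call(name: str, fail: bool) -> str:
    # Hypothetical tool call that may fail.
    await asyncio.sleep(0.01)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name}: ok"

async def execute_all(calls: list[tuple[str, bool]]) -> list[dict]:
    # Wait for every call; failures come back as Exception objects, not raises.
    outcomes = await asyncio.gather(
        *(flaky_call(name, fail) for name, fail in calls),
        return_exceptions=True,
    )
    # Wrap each outcome so the LLM sees both successes and errors.
    return [
        {"name": name, "is_error": isinstance(out, Exception), "content": str(out)}
        for (name, _), out in zip(calls, outcomes)
    ]

results = asyncio.run(
    execute_all([("search", False), ("fetch", True), ("lookup", False)])
)
```

Flagging errors with an `is_error` field (rather than raising) lets the model retry, work around, or abandon the failed call on its own.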
Some models default to sequential tool use even when parallel would be faster. Prompt engineering can nudge them: "When you need multiple pieces of independent information, request them all in one response."