Parallel tool calls

Modern tool-use APIs (Claude, GPT-4, Gemini) let the LLM request multiple tool calls in a single response. The orchestrator runs them in parallel and feeds all the results back in one turn. For independent tasks, this dramatically cuts latency.

When parallel makes sense

When the tool calls are independent, meaning no call needs another call's result. Fetching the weather for one city and a stock quote for an unrelated ticker is a classic case: either can start without waiting for the other.
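A minimal sketch of two independent calls running concurrently. The tool functions here are hypothetical stand-ins, not real APIs; `asyncio.sleep` simulates network latency.

```python
import asyncio

async def get_weather(city: str) -> str:
    # Hypothetical tool: simulated latency, canned result.
    await asyncio.sleep(0.1)
    return f"Sunny in {city}"

async def get_stock_price(ticker: str) -> str:
    # Hypothetical tool: independent of get_weather.
    await asyncio.sleep(0.1)
    return f"{ticker}: 100.0"

async def main():
    # Neither call reads the other's output, so both can run at once.
    weather, price = await asyncio.gather(
        get_weather("Tokyo"),
        get_stock_price("ACME"),
    )
    return weather, price

result = asyncio.run(main())
print(result)
```

Total wall time is roughly one call's latency, not two.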

When NOT parallel

When each call depends on the prior result, the calls must run sequentially: the second cannot start until the first returns, because its input comes from the first call's output.
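A sketch of a dependent chain, again with hypothetical tools: checking seat availability requires the flight ID that the search returns, so there is nothing to parallelize.

```python
import asyncio

async def search_flights(route: str) -> dict:
    # Hypothetical tool: returns a flight matching the route.
    await asyncio.sleep(0.1)
    return {"flight_id": "NH107", "route": route}

async def check_seats(flight_id: str) -> dict:
    # Hypothetical tool: needs a concrete flight_id as input.
    await asyncio.sleep(0.1)
    return {"flight_id": flight_id, "seats": 4}

async def main():
    # check_seats consumes search_flights' output, so the calls
    # must be awaited one after the other.
    flight = await search_flights("SFO-NRT")
    return await check_seats(flight["flight_id"])

result = asyncio.run(main())
print(result)
```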

The latency math

Three tool calls, each taking 500ms. Sequential: 3 × 500ms = 1,500ms of tool latency. Parallel: max(500, 500, 500) = 500ms, since the batch finishes when the slowest call does. Same work, one third the wait.

At agent scale (many tool calls), the savings compound.
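The arithmetic above can be demonstrated directly by timing simulated 500ms calls both ways:

```python
import asyncio
import time

async def tool_call():
    await asyncio.sleep(0.5)  # stand-in for a 500ms tool call

async def sequential():
    # Each call waits for the previous one: ~1.5s total.
    for _ in range(3):
        await tool_call()

async def parallel():
    # All three run concurrently: ~0.5s total.
    await asyncio.gather(*[tool_call() for _ in range(3)])

start = time.perf_counter()
asyncio.run(sequential())
seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(parallel())
par = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```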

Implementation

The LLM returns multiple tool_use blocks in one response. Your orchestrator must detect this and dispatch them concurrently:

import asyncio

async def execute_tool_calls(calls):
    # Dispatch every requested call at once and wait for all results.
    return await asyncio.gather(*[execute(c) for c in calls])
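A fuller sketch of the detection step, assuming an Anthropic-style response where content is a list of blocks and tool requests have type "tool_use". The tool registry and response payload here are illustrative, not a real API client.

```python
import asyncio

# Hypothetical registry mapping tool names to implementations.
TOOLS = {
    "get_weather": lambda args: f"Sunny in {args['city']}",
    "get_time": lambda args: f"12:00 in {args['tz']}",
}

async def execute(call: dict) -> dict:
    # Run one tool call; a real orchestrator would await I/O here.
    fn = TOOLS[call["name"]]
    return {"tool_use_id": call["id"], "content": fn(call["input"])}

async def handle_response(response: dict) -> list:
    # Collect every tool_use block the model requested this turn...
    calls = [b for b in response["content"] if b["type"] == "tool_use"]
    # ...and dispatch them concurrently.
    return await asyncio.gather(*[execute(c) for c in calls])

# Mocked model response containing two tool requests.
response = {"content": [
    {"type": "text", "text": "Let me check both."},
    {"type": "tool_use", "id": "t1", "name": "get_weather",
     "input": {"city": "Tokyo"}},
    {"type": "tool_use", "id": "t2", "name": "get_time",
     "input": {"tz": "JST"}},
]}
results = asyncio.run(handle_response(response))
print(results)
```

All the result blocks go back to the model together in the next message.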

Cost doesn't change

Parallel calls cost no more than sequential ones: the number of tool invocations is the same. Cost can even be slightly lower, because the orchestrator sends a single follow-up prompt carrying all the batched results instead of one prompt per result.

Error handling in parallel

If one of three parallel calls fails, do you wait for the others? Fail fast? Return partial results?

A common default: wait for every call to complete, then return all the results, including any errors, to the LLM and let it decide how to proceed. This keeps the programming model simple.
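One way to implement that policy with asyncio: `gather(..., return_exceptions=True)` waits for every call and captures failures as values instead of cancelling the batch. The result-block shape below is illustrative.

```python
import asyncio

async def ok_tool():
    await asyncio.sleep(0.05)
    return "ok"

async def failing_tool():
    await asyncio.sleep(0.05)
    raise RuntimeError("upstream API returned 500")

async def run_all() -> list:
    # return_exceptions=True: exceptions come back as list entries,
    # so the other calls still complete and return their results.
    raw = await asyncio.gather(ok_tool(), failing_tool(), ok_tool(),
                               return_exceptions=True)
    # Convert each outcome into a result block the LLM can read.
    return [
        {"is_error": True, "content": str(r)}
        if isinstance(r, BaseException)
        else {"is_error": False, "content": r}
        for r in raw
    ]

results = asyncio.run(run_all())
print(results)
```

The model sees one failed result alongside two successes and can retry, work around, or report the failure.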

When the LLM doesn't parallelize naturally

Some models default to sequential tool use even when parallel would be faster. Prompt engineering can nudge them: "When you need multiple pieces of independent information, request them all in one response."
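One place to put that nudge is the system prompt the orchestrator sends with every request. The wording and payload shape below are illustrative, not vendor-recommended.

```python
# Illustrative system prompt nudging the model toward batched tool use.
SYSTEM_PROMPT = (
    "You have access to tools. When you need multiple pieces of "
    "independent information, request all of the tool calls in a "
    "single response rather than one at a time."
)

def build_request(user_message: str) -> dict:
    # Minimal request payload sketch; exact field names vary by API.
    return {
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Compare the weather in Tokyo and Paris")
print(req["system"])
```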