Prompt caching

Prompt caching lets the API reuse pre-computed attention state across requests that share a prompt prefix. For agents with stable system prompts, that can mean a 5–10× cut in input-token cost and a material latency reduction.

How it works

You mark a portion of the prompt as cacheable. The first request computes and caches it. Subsequent requests that send the exact same prefix hit the cache and pay ~10% of the normal input cost for that prefix.

Claude's prompt cache has a 5-minute TTL by default. Each request that hits the cache refreshes it. So an agent making frequent calls keeps the cache warm indefinitely.
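Concretely, marking a prefix cacheable looks roughly like this with the Anthropic Messages API's `cache_control` field (a minimal sketch: the model id is illustrative, and field names should be checked against current docs):

```python
# Build a Messages API payload whose system prompt is marked cacheable.
# The first request writes the cache; later requests with the identical
# prefix read it at the discounted rate.

def build_request(system_prompt: str, user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to and including this block is cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("You are a helpful agent.", "Hello")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The payload is plain JSON-shaped data, so the same structure works whether you call the API via an SDK or raw HTTP.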

Why it matters for agents

Agents repeat prompts. A typical agent call looks like:

[ system prompt 5k tokens ]  ← same every call
[ tool definitions 2k tokens ] ← same every call
[ conversation so far 10k tokens ] ← grows each turn
[ new message ]

Without caching, every call pays full price for the 7k stable part. With caching, you pay ~700 tokens' worth.

Structuring prompts for cache hits

Put the stable content first. Claude's cache matches by exact prefix. Anything that changes invalidates the cache from that point forward.

GOOD order:
  1. System prompt       [cache-ok, stable]
  2. Tool definitions    [cache-ok, stable]
  3. Long context docs   [cache-ok, stable-ish]
  4. Conversation history [changes each turn]
  5. New user message    [changes each turn]

BAD order:
  1. System prompt
  2. User message          ← invalidates cache
  3. Tool definitions
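The GOOD ordering can be sketched as a payload builder that keeps stable content first and places the cache breakpoint after the last stable block (field names follow the Anthropic Messages API; the model id is illustrative):

```python
# Assemble a request so the stable prefix (system prompt + tools) is
# cached as one unit, and volatile content comes after the breakpoint.

def build_payload(system_prompt, tools, history, new_message):
    # Breakpoint on the final tool definition: the cache covers
    # everything before it, so system + tools are reused together.
    cached_tools = tools[:-1] + [
        {**tools[-1], "cache_control": {"type": "ephemeral"}}
    ]
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [{"type": "text", "text": system_prompt}],  # stable
        "tools": cached_tools,                                # stable
        # Volatile content last, so it never invalidates the prefix.
        "messages": history + [{"role": "user", "content": new_message}],
    }

payload = build_payload(
    "You are a coding agent.",
    [{"name": "read_file", "description": "Read a file",
      "input_schema": {"type": "object", "properties": {}}}],
    history=[],
    new_message="List the repo files.",
)
```

Any change to `system_prompt` or `tools` rewrites the cache; changes to `history` or `new_message` leave it intact.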

Cache breakpoints

You can mark specific points in the prompt as cache breakpoints. Each breakpoint ends a separate cache block, and each block is cached independently, up to a per-request limit (typically 4 breakpoints).

Use breakpoints to cache different stability tiers:

  1. System prompt + tools  (never changes)
  2. Retrieved docs         (changes per task)
  3. Conversation history   (changes per turn)
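Tiered breakpoints can be sketched like this (a hypothetical builder; field names follow the Anthropic Messages API, the model id is illustrative):

```python
# Three cache blocks at three stability tiers. A change in one tier
# only invalidates that tier and the ones after it.

def tiered_payload(system_prompt, doc_text, history, new_msg):
    # Tier 3 (per-turn): mark the last history message so the
    # conversation-so-far is its own block; each new turn re-writes
    # only this tier while tiers 1-2 stay warm.
    if history:
        last = history[-1]
        history = history[:-1] + [{
            "role": last["role"],
            "content": [{"type": "text", "text": last["content"],
                         "cache_control": {"type": "ephemeral"}}],
        }]
    return {
        "model": "claude-sonnet-4-5",  # illustrative model id
        "max_tokens": 1024,
        "system": [
            # Tier 1 (fully stable): breakpoint 1
            {"type": "text", "text": system_prompt,
             "cache_control": {"type": "ephemeral"}},
            # Tier 2 (stable-ish docs): breakpoint 2
            {"type": "text", "text": doc_text,
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": history + [{"role": "user", "content": new_msg}],
    }
```

If the retrieved docs change, tier 1 still hits the cache; only tiers 2 and 3 are re-written.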

Cost math

At Claude Sonnet pricing (directional): base input is on the order of $3 per million tokens, cache writes run ~25% above base, and cache reads cost ~10% of base.

For an agent making 100 calls with a 10k-token stable prefix:

  Uncached: 100 × 10k = 1,000k input tokens at full price
  Cached:   1 write of 10k at 1.25× + 99 reads of 10k at 0.1× ≈ 111.5k tokens' worth

~90% savings on the stable prefix. On agents with long system prompts or large retrieved context, it's substantial.
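The arithmetic above as a back-of-envelope check, using the ~10% read rate stated earlier and an assumed ~25% write premium (verify against current pricing):

```python
# Compare total input-token cost for 100 calls with a 10k stable prefix,
# measured in full-price-token equivalents.

calls, prefix = 100, 10_000
uncached = calls * prefix                             # every call at 1.0x
cached = prefix * 1.25 + (calls - 1) * prefix * 0.10  # 1 write + 99 reads
savings = 1 - cached / uncached
print(f"uncached={uncached} cached={cached:.0f} savings={savings:.0%}")
```

The one-time write premium barely dents the result; the read discount dominates as call count grows.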

Latency wins

Cached reads are faster than fresh computation. Typical TTFT (time to first token) on a long cached prompt is ~40% lower.

When caching doesn't help

  - One-off prompts: the first request pays the write premium and nothing ever reads it back.
  - Infrequent calls: if requests are spaced further apart than the TTL, the cache expires and every call is a fresh write.
  - Short prefixes: there's a minimum cacheable length (on the order of 1k tokens), so small prompts don't qualify.
  - Fully dynamic prompts: if the prefix changes every call, exact-prefix matching leaves nothing to reuse.

Gotchas

  - Writes cost extra (~25% over base input), so a prefix that's written often but rarely read can cost more than not caching at all.
  - Matching is exact. A timestamp, request ID, or reordered tool definition anywhere in the prefix silently invalidates everything after it.
  - The TTL only refreshes on hits. An agent that pauses longer than the TTL pays a fresh write on its next call.

Bottom line: if you have a stable system prompt and an agent making frequent calls, prompt caching isn't optional. It's operational hygiene.