Extended thinking

Extended thinking lets Claude reason internally before answering. More reasoning often means better answers on hard tasks, but not always, and not cheaply.

What it is

An API feature where the model spends a configurable number of tokens "thinking" (producing internal reasoning) before generating its final response. The reasoning isn't shown to the user by default but counts toward the cost.

You set a budget, e.g. 2,000 thinking tokens, and the model uses up to that amount, depending on task complexity.

When it helps

Hard, multi-step tasks: complex planning, difficult puzzles, and agentic flows where the model must choose among several tools are where extended thinking shows the clearest accuracy gains.

When it doesn't help

Simple requests: if the model already answers correctly without thinking, the extra reasoning tokens just add cost and latency.

Cost tradeoffs

Thinking tokens are typically priced at the output rate (the expensive one). A 2k-token reasoning budget plus a 500-token response bills as 2,500 output tokens, on the order of a few cents per request at Sonnet-class output pricing. Multiply by millions of requests and it adds up fast.
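The arithmetic is simple enough to put in a helper. The per-million-token price below is a placeholder, not a quoted rate; substitute your model's actual output price.

```python
def request_cost(thinking_tokens: int, response_tokens: int,
                 output_price_per_mtok: float = 15.0) -> float:
    """Dollar cost of one request: thinking tokens bill at the output rate.

    output_price_per_mtok is an assumed placeholder rate ($/million tokens).
    """
    return (thinking_tokens + response_tokens) * output_price_per_mtok / 1_000_000
```

At the placeholder rate, the 2k + 500 example works out to a few cents, and a million such requests to tens of thousands of dollars.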

Insight: Use extended thinking adaptively. Simple user requests → no thinking. Complex planning steps → 1k–2k thinking. Super-hard puzzles → 4k+.
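That tiered heuristic can be sketched as a small lookup. The complexity labels and budget values are the rule-of-thumb tiers from above, not anything the API defines; in practice you would classify requests yourself and tune the numbers on your own evals.

```python
# Hypothetical adaptive policy: map a rough complexity label to a budget.
BUDGETS = {
    "simple": 0,          # no thinking for routine requests
    "complex": 2_000,     # 1k-2k tier for planning steps
    "very_hard": 4_096,   # 4k+ tier for the hardest problems
}

def thinking_budget(complexity: str) -> int:
    """Return a thinking budget for a complexity label; default to a modest one."""
    return BUDGETS.get(complexity, 1_024)
```

The default of 1,024 for unrecognized labels matches the starting point suggested in the budgeting section below.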

Budgeting in practice

Start at 1,024 tokens. Measure accuracy with and without thinking on your eval. If the lift is under 5%, turn thinking off. If it's 20% or more, try doubling the budget.
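That decision rule is easy to encode. The 5% and 20% thresholds are the rule-of-thumb values from the text, exposed as parameters so you can tune them per workload.

```python
def next_budget(current: int, lift: float,
                off_below: float = 0.05, double_above: float = 0.20) -> int:
    """Pick the next thinking budget from the measured accuracy lift.

    lift = accuracy_with_thinking - accuracy_without, as a fraction (0.07 = 7 points).
    """
    if lift < off_below:
        return 0            # lift too small to justify the cost: turn it off
    if lift >= double_above:
        return current * 2  # large lift: see whether more budget helps further
    return current          # middling lift: keep the current budget
```

Re-run the eval after each doubling; the lift usually plateaus, and that plateau is your budget.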

Observable patterns

Combining with tool use

Extended thinking pairs well with tool use: the model reasons about which tool to call, calls it, then reasons about the result. On complex tool-use flows (an agent picking between 5 tools), enabling thinking noticeably improves tool-selection accuracy.
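A combined request is just the thinking config alongside a tools array. The payload shape below reflects my understanding of the Messages API's tool definitions (`name`, `description`, `input_schema`); the model id and the `get_weather` tool are made-up examples, so check field names against the current docs.

```python
def build_tool_request(prompt: str) -> dict:
    """Sketch of a request combining extended thinking with a tool definition."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 4096,
        # Budget sized for a planning-heavy agent step (the 1k-2k tier above).
        "thinking": {"type": "enabled", "budget_tokens": 2_048},
        "tools": [{
            "name": "get_weather",  # hypothetical example tool
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }],
        "messages": [{"role": "user", "content": prompt}],
    }
```

In a real agent loop you would append the tool result as a new message and let the model think again about what it got back.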