Claude is the model at the center of everything else on this site. Before you read about MCP or Claude Code or agent patterns, it helps to understand what Claude actually is, what makes it a good choice for agent work, and how to pick between the different Claude variants. This page is the short answer, with enough detail to make the rest of the framework make sense.
Claude is a family of large language models made by Anthropic. "Large language model" is a long phrase for what it is: software that, given a prompt, continues the text in a way that looks like what a smart human would write. That's the whole mechanism. Everything clever Claude does, answering questions, writing code, calling tools, reasoning through problems, is built on top of "predict the next likely piece of text."
What makes Claude specifically useful for autonomous AI is that Anthropic built it with agents in mind. Other LLMs were mostly designed for chat and bolted on tool use later. Claude has tool use, long context, and careful reasoning as first-class design goals from the start.
At any given generation, Claude ships as three tiers. They're the same model family with different sizes and speeds. Think of them like Large / Medium / Small.
The practical rule: default to Sonnet. It's a great general-purpose model for agent work. Reach for Opus when you're hitting the ceiling on hard reasoning. Drop to Haiku when you need to run many calls in parallel for worker tasks and the complexity is low.
A common pattern in multi-agent setups is "Opus plans, Haiku executes." The smart model makes a plan, then spawns Haiku sub-agents to carry out the easy steps in parallel. You get the quality of Opus and the speed/cost of Haiku.
Tool use is the big one. Agents live or die by how reliably they can call a sequence of tools across many turns. Claude is consistently the most stable in long agent loops; fewer dropped calls, fewer weird tool-argument mistakes.
System-prompt following matters because autonomous agents depend on you being able to put rules in the prompt and have them actually persist. Claude is noticeably better at keeping a complex set of rules in mind across a long conversation.
Long context means the window can hold a lot at once. At 1M tokens, you can fit an entire medium-sized codebase, a long session's worth of tool results, or a big document set. This changes what's practical.
Extended thinking lets the model reason privately before it answers. You give it a budget ("think for up to 10k tokens before responding") and it uses the space to work through the problem. See Extended thinking for when to use it.
Prompt caching makes long-running agents economically viable. When your system prompt is big (lots of tool descriptions, rules, context), each request normally pays for all of that. With caching, you pay once and get ~90% off for subsequent requests that share the prefix. See Prompt caching.
Safety tuning means Claude is trained to refuse clearly harmful tool calls even when the user asks for them, to flag prompt-injection attempts in tool output, and to be careful with irreversible actions. It's not bulletproof, but it's one more layer.
Honest take, without the vendor pitch:
Vendor lock-in is a real thing, though. If you're building for the long term, prefer abstractions (OpenRouter, LiteLLM, or your own thin wrapper) that let you swap providers without rewriting your agent. You gain resilience and negotiation leverage without giving up the ability to use the best model per task.
You'll encounter Claude through three different surfaces, and it's worth knowing when to use which:
Claude is priced per token, input and output counted separately. A few things to know:
For a well-designed agent with caching + the right model per task, the economics usually work out much better than you'd expect. The raw per-token prices look high; once you account for caching and tier selection, a typical daily-running agent costs single-digit dollars a month.
Andrej Karpathy - State of GPT (Microsoft Build)