Prompting for agents

Prompting an agent is different from prompting a chatbot. A chatbot just has to answer; an agent has to decide, call tools, read the results, recover from errors, and stop when done. Your prompt isn't just a set of instructions; it's the agent's entire operating manual. Get the shape right and the agent mostly runs itself. Get it wrong and it will either loop forever or give up at step 2. This page describes that shape.

The shift in mindset.

When you prompt a chatbot, you're asking one question and getting one answer. The prompt can be loose, even lazy, because the human is in the loop to correct course.

When you prompt an agent, the human isn't in the loop anymore. The prompt has to anticipate: what tools will the model reach for? What if one of them fails? When does the task count as done? What format does the final answer need to be in? Every ambiguity becomes a bug at 3am. The prompt is the contract.

The four sections every agent prompt needs.

1. Role and goal.

Open with what the agent is and what it's trying to accomplish. Specific beats generic.

You are a research assistant. Your goal is to answer the user's question
using web search and return a concise, sourced summary.

That's better than "help the user find information." Specific is kind to the model; it narrows the space of plausible behaviors.

2. Rules and constraints.

What the agent must always do, what it must never do, and any non-obvious edge cases.

Rules:
- Always cite sources with URLs
- Never speculate beyond what sources say
- If sources conflict, surface the disagreement, don't hide it
- Stop after 3 search rounds even if you want more data

Rules should resolve the small ambiguities that come up at runtime. "What if sources disagree?" is exactly the kind of thing you want answered in advance.

3. Tool-use guidance.

For each tool, say when to use it and when not to, and describe your error-recovery policy.

You have two tools:
- web_search: use for finding new information; don't use it when you already have the URL you need
- fetch_url: use to read a specific page the user references; don't use it to discover new pages

If a tool returns an error:
- Retry once with a reformulated input
- If it errors twice, surface the failure to the user; don't hide it

Without this, the model invents its own error-recovery behavior. Sometimes that's fine. More often it hammers a failing endpoint fifty times.
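The same policy is worth enforcing in the code that runs the tools, so it holds even when the model ignores the prompt. Here is a minimal sketch in Python, assuming a hypothetical `ToolError` exception and a caller-supplied `reformulate` function; neither comes from any particular SDK.

```python
from typing import Callable


class ToolError(Exception):
    """Hypothetical error type raised by a tool wrapper when a call fails."""


def call_with_one_retry(
    tool: Callable[[str], str],
    tool_input: str,
    reformulate: Callable[[str, Exception], str],
) -> dict:
    """Try a tool, retry once with a reformulated input, then surface the failure."""
    try:
        return {"ok": True, "result": tool(tool_input), "attempts": 1}
    except ToolError as first_error:
        # First failure: change the input instead of repeating the same call.
        retry_input = reformulate(tool_input, first_error)

    try:
        return {"ok": True, "result": tool(retry_input), "attempts": 2}
    except ToolError as second_error:
        # Second failure: stop and report it so it can be shown to the user.
        return {"ok": False, "error": str(second_error), "attempts": 2}
```

The return value carries the error text rather than raising, so the caller can pass the failure back to the agent or the user instead of hiding it.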

4. Output format.

Describe exactly what the final answer should look like. If any code downstream is going to parse the output, be obsessive about this.

Return your final answer as a JSON object:
{
  "summary": "(200 words max)",
  "key_claims": [ { "claim": "...", "source_url": "..." }, ... ],
  "confidence": "high" | "medium" | "low"
}
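If downstream code parses this output, it helps to check it against the spec before trusting it. The sketch below is one way to do that with only the standard library; the function name `validate_final_answer` is made up for illustration, and the 200-word limit is read from the example spec above.

```python
import json

# A tuple rather than a set, so unhashable values (e.g. a list) don't raise.
ALLOWED_CONFIDENCE = ("high", "medium", "low")


def validate_final_answer(raw: str, max_words: int = 200) -> list[str]:
    """Return a list of problems; an empty list means the answer matches the spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []

    summary = data.get("summary")
    if not isinstance(summary, str):
        problems.append("summary must be a string")
    elif len(summary.split()) > max_words:
        problems.append(f"summary exceeds {max_words} words")

    claims = data.get("key_claims")
    if not isinstance(claims, list):
        problems.append("key_claims must be a list")
    else:
        for i, item in enumerate(claims):
            well_formed = (
                isinstance(item, dict)
                and isinstance(item.get("claim"), str)
                and isinstance(item.get("source_url"), str)
            )
            if not well_formed:
                problems.append(f"key_claims[{i}] needs string 'claim' and 'source_url'")

    if data.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be high, medium, or low")

    return problems
```

Returning the full list of problems, rather than failing on the first one, makes it easy to send them back to the agent in a single correction message.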

Structure the prompt with XML tags.

Claude responds well to XML-tagged sections. The tags help the model find specific parts of the prompt and produce output that matches the structure you asked for. This isn't just stylistic; in practice it tends to improve consistency.

<task>Summarize the article at the URL below.</task>

<url>https://example.com/article</url>

<constraints>
- 200 words max
- Include the 3 key claims with source quotes
- Flag any statements that seem unverifiable
</constraints>

<output_format>
Markdown. H2 for the main summary, bullet list for claims.
</output_format>

The tags give the model landmarks. It knows "the task is in <task>" and "the constraints are in <constraints>." Output quality goes up; drift goes down.
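If you build prompts in code, a small helper keeps the tags consistent and properly closed. This is a minimal sketch; `tag` and `build_prompt` are illustrative names, not part of any library, and the tag names are simply the ones used in the example above.

```python
def tag(name: str, body: str) -> str:
    """Wrap body in <name>...</name>, with the tags on their own lines."""
    return f"<{name}>\n{body.strip()}\n</{name}>"


def build_prompt(sections: dict[str, str]) -> str:
    """Join tagged sections in insertion order, separated by blank lines."""
    return "\n\n".join(tag(name, body) for name, body in sections.items())


prompt = build_prompt({
    "task": "Summarize the article at the URL below.",
    "url": "https://example.com/article",
    "constraints": "- 200 words max\n- Include the 3 key claims with source quotes",
    "output_format": "Markdown. H2 for the main summary, bullet list for claims.",
})
print(prompt)
```

Python dicts preserve insertion order, so the sections appear in the prompt in the order you list them.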

Reasoning scaffolds. Force the model to think, not just react.

For anything complicated, explicitly tell the model to reason before it calls a tool and again after it sees the result. This works even without extended thinking mode turned on; the reasoning just shows up in the main output instead.

Before calling any tool:
1. Identify what's needed to complete the task
2. Decide which tool to call and with what arguments
3. Predict what the result will tell you

After each tool call:
1. Compare the result to what you expected
2. Decide whether to continue or change approach

This simple scaffold prevents a lot of the "agent got confused and called the wrong tool" failures. Thirty tokens spent on "what do I actually need?" can save you from ten bad tool calls.

Error recovery, made explicit.

Agents fail. Tools error. Networks are flaky. Without a plan for this, your agent either gives up too early or burns budget hammering a failing endpoint.

~ four error-recovery patterns ~

The #1 way agents burn money: unbounded retry loops. The agent calls a failing tool, gets an error, calls the same tool, gets the same error, and repeats. In 30 minutes you've spent $20 and made no progress. Every tool-call chain needs a hard cap.
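A hard cap is easiest to trust when it lives in the loop that drives the agent, not only in the prompt. The sketch below assumes a hypothetical `step` callable that runs one model-plus-tool turn and reports what happened; the default limits are placeholders, not recommendations.

```python
from typing import Callable


def run_agent(
    step: Callable[[list], dict],
    max_turns: int = 10,
    max_repeat_errors: int = 2,
) -> dict:
    """Drive an agent loop with a turn cap and a cap on repeated identical errors.

    `step` is a hypothetical callable: it receives the history so far and returns
    either {"done": True, "answer": ...} or {"done": False, "error": <str or None>}.
    """
    history: list = []
    last_error = None
    repeat_count = 0

    for turn in range(1, max_turns + 1):
        outcome = step(history)
        history.append(outcome)

        if outcome.get("done"):
            return {"status": "done", "turns": turn, "answer": outcome.get("answer")}

        # Count how many times in a row the same error has come back.
        error = outcome.get("error")
        if error is not None and error == last_error:
            repeat_count += 1
        else:
            repeat_count = 1 if error is not None else 0
        last_error = error

        if repeat_count >= max_repeat_errors:
            return {"status": "stopped_repeated_error", "turns": turn, "error": error}

    # The turn budget ran out before the agent declared itself done.
    return {"status": "stopped_max_turns", "turns": max_turns}
```

Two separate limits are used on purpose: the repeated-error cap catches the "same call, same failure" loop early, and the turn cap bounds everything else.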

Format spec goes near the end.

Something odd but useful: Claude tends to weight the last part of the prompt more heavily. If you care about output format, put the format spec after the rules and the tool guidance, not at the beginning. Instructions near the end are the ones the model is most likely to follow closely, and format drift often comes from the format instructions sitting far from where the output begins.
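One way to keep this ordering from drifting is to fix it in code rather than by convention. Here is a minimal sketch, assuming the four sections are stored under the hypothetical keys shown in `SECTION_ORDER`.

```python
# Hypothetical section keys; the output format is deliberately last.
SECTION_ORDER = ("role_and_goal", "rules", "tool_guidance", "output_format")


def assemble_prompt(sections: dict[str, str]) -> str:
    """Join the four sections in a fixed order, regardless of how they were supplied."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n\n".join(sections[name].strip() for name in SECTION_ORDER)
```

Raising on a missing section is a design choice: a prompt that silently ships without its format spec is harder to debug than one that fails at build time.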

The five anti-patterns that will bite you.

A quick checklist before you ship any agent prompt.

  1. Role and goal stated in one sentence at the top?
  2. Rules numbered, non-conflicting, concrete?
  3. Each tool has a description, a "when to use" condition, and an error-handling rule?
  4. Explicit stop conditions present, with a max-turn cap?
  5. Output format specified near the end of the prompt?
  6. At least one worked example of good output?

Every yes is one less 3am debugging session.