A research agent answers questions by searching the web, reading pages, and synthesizing a sourced summary. It's one of the most useful agent archetypes, and one of the easiest to build badly. The typical failure isn't missing output; it's confident-sounding output that cites made-up sources. This guide shows how to build one whose answers you can trust.
The target: an agent that takes a research question and produces a 200-500 word summary in which every claim is backed by a citation.
For fetching pages, curl works; a service that handles rendering (Firecrawl, Jina Reader) is better.
You are a research assistant.
Goal: produce a sourced, accurate summary of the user's question.
Process:
1. Decompose the question into 2-4 search queries (different angles)
2. Run the searches (parallel)
3. From results, pick the 3-5 most authoritative-looking sources
4. Fetch and read those pages
5. Extract key claims with source URLs
6. Write a 200-500 word summary that:
- States what's known clearly
- Surfaces disagreements between sources
- Cites every claim with [1], [2], etc.
- Lists full source URLs at the end
Rules:
- Never state a claim without a source
- If sources conflict, surface the conflict, don't hide it
- If you can't answer confidently, say so and explain what's missing
Stop after 5 search rounds even if you want more data.
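The 5-round cap in the prompt is only a request; the harness can also enforce it. The following is a minimal sketch, assuming a hypothetical `search` callable that takes a query string and returns result text. `SearchBudget` and `BUDGET_EXHAUSTED` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Message handed back to the model once the cap is hit (wording is an assumption).
BUDGET_EXHAUSTED = (
    "Search budget exhausted. Write the summary from what you have "
    "and state what is missing."
)


@dataclass
class SearchBudget:
    """Counts search rounds and refuses to run more than max_rounds."""

    max_rounds: int = 5  # mirrors "Stop after 5 search rounds" in the prompt
    rounds_used: int = 0

    def run_round(
        self, queries: List[str], search: Callable[[str], str]
    ) -> Dict[str, str]:
        # Past the cap: do not search, tell the model to wrap up instead.
        if self.rounds_used >= self.max_rounds:
            return {q: BUDGET_EXHAUSTED for q in queries}
        self.rounds_used += 1
        # One round may contain several queries; they count as a single round.
        return {q: search(q) for q in queries}
```

Returning a message rather than raising keeps the conversation going, so the model can still write up what it has.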
Parallel searches save latency. Emit multiple tool calls in one turn:
[tool_use: search("MCP security model")]
[tool_use: search("MCP vs OAuth scopes")]
[tool_use: search("prompt injection tool output")]
The harness executes them concurrently, and all three results come back in the next turn.
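On the harness side, concurrent execution could look roughly like this. It is a sketch under assumptions: tool calls arrive as hypothetical `(tool_name, argument)` pairs, and `tools` maps each name to a plain blocking Python function. A real harness would also need to map results back to tool-call IDs and handle errors per call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_tool_calls(
    calls: List[Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
) -> List[str]:
    """Run all tool calls from one model turn concurrently.

    Results are returned in the same order as `calls`, so they can be
    matched back to the calls that produced them.
    """
    # One worker per call; max(1, ...) because max_workers must be positive.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(tools[name], arg) for name, arg in calls]
        # .result() blocks until that call finishes; total wall time is
        # roughly that of the slowest call rather than the sum of all calls.
        return [f.result() for f in futures]
```

Threads suit this case because search and fetch calls spend most of their time waiting on the network.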
For source diversity, instruct the agent: "Your sources must come from at least 3 different domains."
If the topic is time-sensitive, add: "Prefer sources from the last 12 months. If you use an older source, flag it."
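Both instructions can also be checked mechanically after the agent picks its sources. The sketch below assumes the harness has the source URLs and, for the recency check, a publication date per URL (how those dates are obtained is out of scope here). The function names are illustrative.

```python
from datetime import date, timedelta
from typing import Dict, List, Set
from urllib.parse import urlparse


def distinct_domains(urls: List[str]) -> Set[str]:
    """Return the set of hostnames, treating 'www.x.com' and 'x.com' as one."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def stale_sources(
    published: Dict[str, date], today: date, max_age_days: int = 365
) -> List[str]:
    """Return URLs whose publication date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [url for url, when in published.items() if when < cutoff]
```

If `len(distinct_domains(urls))` is below 3, the harness can ask the agent for another search round; URLs returned by `stale_sources` should carry a flag in the summary. Comparing hostnames is a rough proxy: subdomains of one site count as different domains here.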
The biggest failure mode: the model produces output that looks cited, but the citations don't support the claims. Add a validation step:
After writing the summary:
- For each claim, quote the sentence from the source that supports it
- If you can't find a direct quote, remove the claim
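Part of this validation can be done in code instead of being left to the model. The sketch below assumes each claim is a dict with hypothetical keys `text`, `url`, and `quote`, and that `pages` maps each URL to the text the agent fetched. It only verifies that the quote appears in the page; whether the quote supports the claim still needs a judge or a human.

```python
import re
from typing import Dict, List, Tuple


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps don't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_supported(
    claims: List[dict], pages: Dict[str, str]
) -> Tuple[List[dict], List[dict]]:
    """Split claims into (kept, dropped) by whether the quote is on the page."""
    kept, dropped = [], []
    for claim in claims:
        page = _normalize(pages.get(claim.get("url", ""), ""))
        quote = _normalize(claim.get("quote", ""))
        # An empty quote, an unknown URL, or a quote not found on the
        # fetched page all mean the claim is removed.
        if quote and quote in page:
            kept.append(claim)
        else:
            dropped.append(claim)
    return kept, dropped
```

Exact substring matching is deliberately strict: a paraphrased "quote" fails, which is the point of the rule.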
Build an eval set: 20 questions with known good answers. For each:
Use LLM-as-judge with a clear rubric. Score over time.
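A minimal eval loop might look like the following. Everything here is a sketch: `agent` and `judge` are hypothetical callables you supply (the judge would wrap an LLM call with your rubric and return a numeric score), and the `question` and `reference` keys are assumed names for the eval set fields.

```python
from statistics import mean
from typing import Callable, Dict, List


def run_eval(
    eval_set: List[Dict[str, str]],
    agent: Callable[[str], str],
    judge: Callable[[str, str, str], float],
) -> Dict[str, float]:
    """Run the agent on every question and summarize the judge's scores."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    scores = []
    for item in eval_set:
        answer = agent(item["question"])
        # judge(question, reference_answer, agent_answer) -> score
        scores.append(judge(item["question"], item["reference"], answer))
    return {"n": len(scores), "mean": mean(scores), "min": min(scores)}
```

Storing each run's summary with a date and the prompt version makes it possible to see whether a change helped or hurt.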
For serious research tasks:
A good research agent's output is indistinguishable from a careful human analyst's. A bad one produces plausible-sounding prose with citations that either don't exist or don't say what they're quoted as saying. The difference is in the controls above - don't skip any of them.