A research agent answers questions by searching the web, reading pages, and synthesizing a sourced summary. It's one of the most useful agent archetypes, and one of the easiest to build badly.
The goal: an agent that, given a research question, produces a 200–500 word summary with sources cited.
Question
↓
Plan search queries (model)
↓
Run searches in parallel (search tool)
↓
Rank + fetch top pages (fetch tool)
↓
Extract key claims (model)
↓
Synthesize summary (model)
↓
Output with citations
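The pipeline above can be sketched in a few lines. This is a minimal skeleton, not a real implementation: `plan_queries`, `search`, and the ranking step are stand-ins for model and tool calls.

```python
# Skeleton of the research pipeline. plan_queries and search are placeholder
# stubs for the model and the search tool; swap in real calls.
from concurrent.futures import ThreadPoolExecutor

def plan_queries(question):
    # Stand-in for a model call that decomposes the question into angles.
    return [f"{question} overview", f"{question} criticism"]

def search(query):
    # Stand-in for the search tool; returns (title, url) pairs.
    return [(f"Result for {query}", f"https://example.com/{hash(query) % 1000}")]

def research(question):
    queries = plan_queries(question)
    with ThreadPoolExecutor() as pool:  # searches run in parallel
        batches = list(pool.map(search, queries))
    # Flatten; ranking, fetching, extraction, and synthesis would follow here.
    return [r for batch in batches for r in batch]

pages = research("MCP security model")
```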
For fetching pages, curl works; better: a service that handles JS rendering (Firecrawl, Jina Reader).

A system prompt that works:

You are a research assistant.
Goal: produce a sourced, accurate summary of the user's question.
Process:
1. Decompose the question into 2–4 search queries (different angles)
2. Run the searches (parallel)
3. From results, pick the 3–5 most authoritative-looking sources
4. Fetch and read those pages
5. Extract key claims with source URLs
6. Write a 200–500 word summary that:
- States what's known clearly
- Surfaces disagreements between sources
- Cites every claim with [1], [2], etc.
- Lists full source URLs at the end
Rules:
- Never state a claim without a source
- If sources conflict, surface the conflict, don't hide it
- If you can't answer confidently, say so and explain what's missing
Stop after 5 search rounds even if you want more data.
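Wiring the prompt above into an agent request might look like this. The model name and the minimal tool schema are assumptions (an OpenAI-style chat API is assumed; real tool definitions also need descriptions and parameter schemas); the prompt constant is abbreviated.

```python
# Hedged sketch: embedding the system prompt in an OpenAI-style request.
# Model name and tool schema shape are assumptions, not a specific SDK's API.
SYSTEM_PROMPT = """You are a research assistant.
Goal: produce a sourced, accurate summary of the user's question.
(... process and rules as above ...)
Stop after 5 search rounds even if you want more data."""

def build_request(question):
    return {
        "model": "gpt-4o",  # assumption: any tool-calling model works here
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        # Minimal tool stubs; real definitions need parameter schemas.
        "tools": [
            {"type": "function", "function": {"name": "search"}},
            {"type": "function", "function": {"name": "fetch"}},
        ],
    }

req = build_request("How does MCP handle auth?")
```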
Parallel searches save latency. Emit multiple tool calls in one turn:
[tool_use: search("MCP security model")]
[tool_use: search("MCP vs OAuth scopes")]
[tool_use: search("prompt injection tool output")]
The harness executes them concurrently; all three results come back in the next turn.
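On the harness side, concurrent execution is an `asyncio.gather` over the tool calls. The `tool_calls` dict shape here is illustrative, not a real SDK type.

```python
# Sketch of the harness executing several tool calls from one model turn
# concurrently: all calls launch at once, so latency ~= the slowest call.
import asyncio

async def run_search(query):
    await asyncio.sleep(0)  # stand-in for the real network call
    return {"query": query, "results": [f"hit for {query}"]}

async def execute_tool_calls(tool_calls):
    # gather preserves order, so results line up with the model's calls.
    return await asyncio.gather(*(run_search(c["query"]) for c in tool_calls))

calls = [{"query": "MCP security model"},
         {"query": "MCP vs OAuth scopes"},
         {"query": "prompt injection tool output"}]
results = asyncio.run(execute_tool_calls(calls))
```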
Instruct: "Your sources must come from at least 3 different domains."
If topic is time-sensitive: "Prefer sources from the last 12 months. If using older, flag it."
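The domain-diversity rule is worth enforcing in the harness too, not just in the prompt. A sketch, assuming the harness has the final list of source URLs:

```python
# Mechanical check for the "at least 3 different domains" rule.
from urllib.parse import urlparse

def check_source_diversity(urls, minimum=3):
    # Normalize away a leading "www." so www.x.com and x.com count once.
    domains = {urlparse(u).netloc.removeprefix("www.") for u in urls}
    return len(domains) >= minimum

ok = check_source_diversity([
    "https://modelcontextprotocol.io/docs",
    "https://www.example.org/post",
    "https://blog.example.net/mcp",
])
```

If the check fails, loop back to the model with "find sources from additional domains" rather than silently accepting the answer.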
The biggest failure mode: the model produces output that looks well-cited, but the citations don't actually support the claims. Add a validation step:
After writing the summary:
- For each claim, quote the sentence from the source that supports it
- If you can't find a direct quote, remove the claim
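A mechanical backstop for that validation step: drop any claim whose supporting quote doesn't literally appear in the fetched source text. The claim/quote/source dict shape is an assumed schema, not a standard one.

```python
# Keep only claims whose quoted evidence is found verbatim in the source page.
def validate_claims(claims, source_texts):
    kept = []
    for claim in claims:
        text = source_texts.get(claim["source"], "")
        if claim["quote"] and claim["quote"] in text:
            kept.append(claim)
    return kept

sources = {"https://example.com/a": "MCP servers declare tools via a manifest."}
claims = [
    {"text": "MCP servers declare tools.",
     "quote": "declare tools via a manifest",
     "source": "https://example.com/a"},
    {"text": "Unsupported claim.",
     "quote": "quote not in source",
     "source": "https://example.com/a"},
]
kept = validate_claims(claims, sources)
```

Exact substring matching is strict; in practice you'd want light normalization (whitespace, quotes) before comparing.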
Build an eval set: 20 questions with known good answers. For each, score the agent's answer with an LLM-as-judge and a clear rubric, and track scores over time.
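The eval loop can be sketched as follows. Here `judge` is a toy word-overlap heuristic standing in for a real LLM-as-judge call with the rubric; the 0–5 scale and the `eval_set` shape are assumptions.

```python
# Eval loop sketch: run the agent on each question, score against a reference.
def judge(question, answer, reference):
    # Toy stand-in for an LLM judge: word overlap with the reference, 0-5.
    ref_words = set(reference.lower().split())
    ans_words = set(answer.lower().split())
    return round(5 * len(ref_words & ans_words) / max(len(ref_words), 1))

def run_eval(eval_set, agent):
    scores = [judge(q, agent(q), ref) for q, ref in eval_set]
    return sum(scores) / len(scores)

eval_set = [("What is MCP?", "MCP is a protocol for connecting tools to models")]
mean = run_eval(eval_set, lambda q: "MCP is a protocol for connecting tools to models")
```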
For serious research tasks: