Home›Framework›Build›Build a research agent

Build a research agent

📖 7 min readUpdated 2026-04-18

A research agent answers questions by searching the web, reading pages, and synthesizing a sourced summary. It's one of the most useful agent archetypes - and one of the easiest to build badly. The failure mode isn't not producing output; it's producing confident-sounding output that cites made-up sources. This guide is how to build one that actually gives you answers you can trust.

What you're building.

An agent that takes a research question and produces a 200-500 word summary with every claim citation-backed.

The architecture.

~ research agent pipeline ~

Tools you'll need

Web search. Brave Search, Tavily, or Exa via MCP. Tavily is tuned for AI; Brave is cheap.
Fetch URL. Pulls page content. Plain curl works; better: a service that handles rendering (Firecrawl, Jina Reader).
Optional: academic search. If your domain needs papers (Semantic Scholar MCP, arXiv).

The prompt structure

You are a research assistant.

Goal: produce a sourced, accurate summary of the user's question.

Process:
1. Decompose the question into 2, 4 search queries (different angles)
2. Run the searches (parallel)
3. From results, pick the 3, 5 most authoritative-looking sources
4. Fetch and read those pages
5. Extract key claims with source URLs
6. Write a 200, 500 word summary that:
   - States what's known clearly
   - Surfaces disagreements between sources
   - Cites every claim with [1], [2], etc.
   - Lists full source URLs at the end

Rules:
- Never state a claim without a source
- If sources conflict, surface the conflict, don't hide it
- If you can't answer confidently, say so and explain what's missing

Stop after 5 search rounds even if you want more data.

Why the prompt is structured this way

Decompose first. Parallel searches with diverse queries beats one big search.
Authoritative filter. Without this instruction, the model reads whatever ranks first. Prefer .edu, .gov, reputable publications.
Explicit stop. Without it, research agents keep searching. "More data = better answer" is wrong past a point.
Conflict surfacing. The biggest failure mode of research agents is averaging contradictory sources into a flat-sounding consensus that didn't exist.

Tool-use patterns

Parallel searches save latency. Emit multiple tool calls in one turn:

[tool_use: search("MCP security model")]
[tool_use: search("MCP vs OAuth scopes")]
[tool_use: search("prompt injection tool output")]

Harness executes concurrently; all three results returned in the next turn.

Quality controls

Source diversity

Instruct: "Your sources must come from at least 3 different domains."

Recency check

If topic is time-sensitive: "Prefer sources from the last 12 months. If using older, flag it."

Hallucination guard

The biggest failure mode: the model produces a citation-looking output but the citations don't support the claim. Add a validation step:

After writing the summary:
- For each claim, quote the sentence from the source that supports it
- If you can't find a direct quote, remove the claim

Evaluation

Build an eval set: 20 questions with known good answers. For each:

Does the summary cover the key points?
Are citations real (not hallucinated URLs)?
Do citations actually support the claims?
Is any major viewpoint missing?

Use LLM-as-judge with a clear rubric. Score over time.

Scaling up

For serious research tasks:

Add a planner that breaks big topics into sub-research tasks, each a sub-agent
Cache source content, repeat research on similar topics shouldn't re-fetch
Keep a "visited" list per session to avoid re-reading the same URL
For domains you research often (e.g., competitors), build a knowledge base you refresh weekly

What not to do.

~ the three ways research agents fail ~

A good research agent's output is indistinguishable from a careful human analyst's. A bad one produces plausible-sounding prose with citations that either don't exist or don't say what they're quoted as saying. The difference is in the controls above - don't skip any of them.