Browser agent

A browser agent drives a real browser: it navigates to URLs, clicks, types, scrolls, and reads rendered content. Useful for tasks that sites don't expose via APIs. Powerful. Slow. Fragile.
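
To make the loop concrete, here is a minimal sketch of the observe-act cycle. It does not drive a real browser: FakeBrowser, Action, run_agent and scripted_policy are hypothetical names invented for illustration, and a real agent would swap in a browser driver and a model-backed policy.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "navigate", "click", "type", "scroll", "read", or "done"
    target: str = ""   # URL or element reference, depending on kind
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser driver; it only records what it was asked to do."""
    log: list = field(default_factory=list)
    url: str = ""

    def apply(self, action: Action) -> str:
        self.log.append(action.kind)
        if action.kind == "navigate":
            self.url = action.target
        # A real driver would return rendered page content here.
        return f"page at {self.url} after {action.kind}"


def run_agent(browser, policy, max_steps=10):
    """Ask the policy for an action, apply it, feed the result back, repeat."""
    observation = ""
    for step in range(max_steps):
        action = policy(observation, step)
        if action.kind == "done":
            break
        observation = browser.apply(action)
    return observation


def scripted_policy(observation, step):
    """Hypothetical fixed script standing in for an LLM that picks the next action."""
    script = [
        Action("navigate", "https://example.com"),
        Action("click", "#search"),
        Action("type", "#search", "hello"),
        Action("read"),
    ]
    return script[step] if step < len(script) else Action("done")
```

The max_steps cap is a design choice in this sketch: it bounds a session even if the policy never returns "done".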

Use cases

Tooling

Vision vs DOM

Two control paradigms: vision, where the agent acts on screenshots, and DOM, where it acts on the page's element tree.

Hybrid systems use both.
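
A minimal sketch of how the two paradigms differ at the action level, and one possible way a hybrid could choose between them. DomClick, VisionClick and choose_click are hypothetical names, and the "prefer DOM, fall back to pixels" rule is an assumption for illustration; real hybrid systems may combine the two differently.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DomClick:
    selector: str   # e.g. a CSS selector resolved against the element tree


@dataclass
class VisionClick:
    x: int          # pixel coordinates on a screenshot
    y: int


def choose_click(selector: Optional[str], dom_selectors: set,
                 fallback_xy: tuple) -> Union[DomClick, VisionClick]:
    """Assumed hybrid rule: use a DOM click when the selector exists, else click by pixels."""
    if selector is not None and selector in dom_selectors:
        return DomClick(selector)
    x, y = fallback_xy
    return VisionClick(x, y)
```

For example, choose_click("#buy", {"#buy"}, (10, 20)) yields a DomClick, while a selector that is missing from the page yields VisionClick(10, 20).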

Challenges

Page reliability

Many modern sites are SPAs with async loading, anti-bot protections, and captchas. Agents break often.
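
For the async-loading part of the problem only, a common mitigation is to poll for a condition instead of acting immediately. The sketch below uses only the standard library; wait_until and element_ready are hypothetical names, and the counter stands in for a real check such as "is the element present yet". It does nothing for anti-bot protections or captchas.

```python
import time


def wait_until(condition, timeout_s=5.0, interval_s=0.1):
    """Poll `condition` until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)


# Simulated page: the "element" only becomes ready on the third check.
calls = {"n": 0}


def element_ready():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(element_ready, timeout_s=1.0, interval_s=0.01)
```

Raising on timeout, rather than returning False, forces the caller to decide whether to retry, re-plan, or give up.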

Latency

Each click takes seconds. Agent sessions are measured in minutes, not seconds.
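
The arithmetic behind that is simple. The numbers below (40 steps, 3 seconds per step) are illustrative placeholders rather than measurements, and session_seconds is a hypothetical helper.

```python
def session_seconds(steps: int, seconds_per_step: float) -> float:
    """Total wall-clock time if steps run one after another."""
    return steps * seconds_per_step


total = session_seconds(steps=40, seconds_per_step=3.0)
print(f"{total / 60:.1f} minutes")  # prints "2.0 minutes"
```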

Cost

Vision LLM calls on screenshots are expensive. Each screenshot adds heavy context.
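
A rough sketch of why context gets heavy. It assumes, for illustration only, that every screenshot costs a fixed number of input tokens and that the agent either resends all earlier screenshots on each step or keeps only the latest one. The figure of 1000 tokens per screenshot is a placeholder, not a real model's pricing, and total_input_tokens is a hypothetical helper.

```python
def total_input_tokens(steps, tokens_per_screenshot, keep_history):
    """Sum screenshot tokens sent across all steps of one session."""
    total = 0
    for step in range(1, steps + 1):
        # With history, step k resends k screenshots; without, only the newest.
        screenshots_in_context = step if keep_history else 1
        total += screenshots_in_context * tokens_per_screenshot
    return total


print(total_input_tokens(20, 1000, keep_history=True))   # 210000
print(total_input_tokens(20, 1000, keep_history=False))  # 20000
```

Under these assumptions, keeping full history grows quadratically with the number of steps, while keeping only the latest screenshot grows linearly.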

Authentication

Logged-in scenarios need credential handling. Big security surface.
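
One small piece of credential handling can be sketched as follows: read secrets from the environment rather than from prompts or source code, and redact them before any text is logged or sent to a model. The variable names AGENT_USERNAME and AGENT_PASSWORD and both helpers are hypothetical; this covers only a narrow slice of the security surface.

```python
import os


def load_credentials(env=os.environ):
    """Read credentials from environment variables (hypothetical names)."""
    try:
        return env["AGENT_USERNAME"], env["AGENT_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing credential variable: {missing}") from None


def redact(text, secrets):
    """Replace each secret with a placeholder before the text is logged or sent to a model."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "[REDACTED]")
    return text
```

For example, redact("typed s3cret into #password", ["s3cret"]) returns "typed [REDACTED] into #password".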

When to use

When to skip