Log analysis tools
5 min read · Updated 2026-04-18
Log file analysis shows what Googlebot actually crawls: how often, where it goes, and where it fails. For large sites, it's the only way to see crawl budget in action, and the tools below do the heavy lifting.
Why logs matter
Google Search Console (GSC) shows you what Google reports it crawls. Server logs show what actually happened. The two don't always match. Log analysis catches:
- Pages Googlebot never crawls (orphan, deep, or blocked)
- Pages Googlebot crawls too often (wasting budget)
- Response codes Googlebot hits (404s, 500s)
- Redirect chains
- Slow response times specifically to Googlebot
- User-agent spoofing (bots pretending to be Googlebot)
Major tools
Screaming Frog Log File Analyser
Desktop app (separate from the Screaming Frog crawler). Imports log files, joins with crawl data, visualizes.
- Pros: flat annual license (cheaper long-run than cloud platforms), full control, no data sent externally
- Cons: requires your own logs, manual import
- Price: ~$150/year license
OnCrawl
Cloud platform. Integrates log analysis with crawl data + Analytics + GSC.
- Pros: rich segmentation, cross-joins crawl + log + traffic data
- Cons: expensive, requires log upload
- Price: $$$, custom pricing
Botify
Enterprise platform. Similar to OnCrawl but aimed at very large sites.
- Pros: scale, deep analytics, real-time log ingestion options
- Cons: most expensive option
- Price: enterprise-only, $$$$
Semrush Log File Analyzer
Bundled with Semrush subscriptions. Basic log analysis.
- Pros: if you already have Semrush, no extra cost
- Cons: less deep than dedicated tools
Custom (Python / ELK stack)
For teams with engineering resources, custom pipelines using Python, Elasticsearch + Kibana, or BigQuery provide ultimate flexibility.
- Pros: unlimited customization, scales to any size
- Cons: engineering overhead, no out-of-box reports
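As a starting point for a custom pipeline, here is a minimal parser for the combined log format (the Nginx/Apache default). The regex assumes the stock format; custom log_format directives will need their own pattern:

```python
import re

# Combined log format: ip - - [time] "METHOD path proto" status bytes "referer" "ua"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_hits(lines):
    """Yield parsed request dicts for lines whose user-agent claims to be Googlebot."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Googlebot" in m.group("ua"):
            yield m.groupdict()
```

This filters on the user-agent string only, which spoofed bots can fake — pair it with the DNS verification described below before trusting the numbers.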
What to look for
Crawl frequency per URL type
For each section of the site (blog, category, product pages, legal), how often does Googlebot visit? High-value pages should be crawled often; low-value pages less.
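Given parsed log entries (dicts with at least a "path" key — an assumed shape from your own parsing pipeline, not a standard), per-section frequency is a simple count over the first path segment:

```python
from collections import Counter
from urllib.parse import urlsplit

def crawl_freq_by_section(hits):
    """Count hits per top-level path segment, e.g. '/blog/post' -> 'blog'."""
    counts = Counter()
    for hit in hits:
        path = urlsplit(hit["path"]).path  # drop any query string
        section = path.strip("/").split("/")[0] or "(root)"
        counts[section] += 1
    return counts
```

Segmenting on the first path segment is a rough heuristic; sites with flat URL structures will need their own mapping from URL to section.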
Crawl-to-index ratio
Of the URLs Googlebot crawls, how many end up indexed? A ratio below 50% points to quality issues.
404 rate from Googlebot
Percentage of Googlebot requests returning 404. Should be below 1%; anything higher means broken links to fix.
5xx rate
Should be near 0. Any consistent 5xx traffic indicates server reliability problems.
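Both rates fall out of the same parsed entries. A minimal sketch, assuming each entry carries the HTTP status as a string under a "status" key (a hypothetical shape for your pipeline):

```python
def status_rates(hits):
    """Fraction of Googlebot requests returning 404 and 5xx, as 0..1 floats."""
    total = len(hits)
    if total == 0:
        return {"404_rate": 0.0, "5xx_rate": 0.0}
    n404 = sum(1 for h in hits if h["status"] == "404")
    n5xx = sum(1 for h in hits if h["status"].startswith("5"))
    return {"404_rate": n404 / total, "5xx_rate": n5xx / total}
```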
Average response time to Googlebot
Should be under 500ms. Slow responses waste crawl budget, and rankings can suffer.
Redirect chains
Any URL where Googlebot follows 3+ redirects is a chain. Clean these up by pointing links and redirects straight at the final URL.
Orphan URLs in logs
URLs Googlebot crawls that aren't internally linked anymore. Often old URLs from a prior version of the site.
URLs never crawled
In your sitemap but Googlebot never visited. Crawl budget or discoverability issue.
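One way to surface these is to diff sitemap paths against the paths seen in your logs. A sketch using the standard sitemap XML namespace, with `crawled_paths` standing in for whatever set of paths your log parsing produced:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_paths(xml_text):
    """Extract URL paths from sitemap XML."""
    root = ET.fromstring(xml_text)
    return {urlsplit(loc.text.strip()).path
            for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)}

def never_crawled(xml_text, crawled_paths):
    """Sitemap paths that never appear in the Googlebot log entries."""
    return sorted(sitemap_paths(xml_text) - set(crawled_paths))
```

For sitemap index files (sitemaps of sitemaps) you'd recurse one level first; the snippet assumes a plain urlset.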
Getting the logs
Depends on hosting:
- Direct hosting (Nginx/Apache): access logs at /var/log/nginx/access.log or similar
- Cloudflare: dashboard → Logs (Enterprise plan)
- AWS CloudFront: S3 logs or CloudWatch
- Vercel / Netlify: platform-specific log access
- Managed WordPress / shared hosting: check cPanel for "Raw Access Logs"
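However you obtain them, production logs are usually rotated and gzipped. A small helper can stream lines across both forms — the glob pattern is a placeholder for your own log path:

```python
import glob
import gzip

def iter_log_lines(pattern):
    """Stream lines across plain and gzip-rotated log files matching a glob."""
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as fh:
            yield from fh
```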
Verifying Googlebot
Not all "Googlebot" in logs is real. Verify via reverse DNS:
- Take IP from log
- Reverse DNS lookup → hostname should end with googlebot.com or google.com
- Forward DNS on that hostname → should match original IP
All of the tools above automate this check.
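The steps above can be sketched with the standard library alone: a pure suffix check on the hostname, plus a reverse-then-forward DNS round trip:

```python
import socket

def is_google_hostname(hostname):
    """True if a reverse-DNS hostname belongs to Google's crawler domains."""
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip):
    """Reverse-DNS the IP, check the domain, then forward-confirm the hostname."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)          # steps 1-2: reverse lookup
        if not is_google_hostname(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]  # step 3: forward-confirm
    except OSError:  # NXDOMAIN, timeouts, unreachable resolver
        return False
```

Note the suffix check requires a leading dot, so a spoofed hostname like fake-googlebot.com fails even though it contains "googlebot.com".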
Cadence
- Small sites: quarterly review sufficient
- Medium: monthly
- Enterprise: weekly or continuous dashboard
After major changes (migrations, redesigns), daily for 2-4 weeks.
When logs save you
Real scenarios where log analysis surfaced issues invisible elsewhere:
- Googlebot crawling the staging site in production after a deploy misconfiguration
- Bot traps (infinite parameter URLs) absorbing all crawl budget
- Parameter URLs being crawled 1000x more than canonical URLs
- Key pages not being crawled because of buried nav after a redesign
- 5xx spikes during specific hours (server capacity issues)