Log analysis tools

Log file analysis shows what Googlebot actually crawls: how often it visits, where it goes, and where it fails. For large sites, it's the only way to see crawl budget in action. The tools do the heavy lifting.

Why logs matter

Google Search Console (GSC) shows you what Google reports it crawls. Server logs show what actually happened. The two don't always match. Log analysis catches:

Major tools

Screaming Frog Log File Analyser

Desktop app (separate from the Screaming Frog crawler). Imports log files, joins them with crawl data, and visualizes the results.

OnCrawl

Cloud platform. Integrates log analysis with crawl data + Analytics + GSC.

Botify

Enterprise platform. Similar to OnCrawl but aimed at very large sites.

Semrush Log File Analyzer

Bundled with Semrush subscriptions. Basic log analysis.

Custom (Python / ELK stack)

For teams with engineering resources, custom pipelines using Python, Elasticsearch + Kibana, or BigQuery provide ultimate flexibility.
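As a starting point for a custom Python pipeline, here is a minimal sketch of parsing raw access-log lines. It assumes the server writes the Apache/Nginx "combined" log format; the names LOG_PATTERN, parse_line, and googlebot_hits are illustrative, and the sample line is made up. The user-agent filter only finds requests that claim to be Googlebot and does not verify them.

```python
import re
from datetime import datetime

# Regex for the "combined" access-log format (assumed; adjust to your server config).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def googlebot_hits(lines):
    """Yield parsed records whose user agent claims to be Googlebot (unverified)."""
    for line in lines:
        record = parse_line(line)
        if record and "Googlebot" in record["agent"]:
            yield record


# Made-up sample line for illustration.
sample = (
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /blog/post-1 HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
hits = list(googlebot_hits([sample]))
print(hits[0]["path"], hits[0]["status"])
```

In practice the lines would come from iterating over an open log file instead of a one-element list.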

What to look for

Crawl frequency per URL type

For each section of the site (blog, category, product pages, legal), how often does Googlebot visit? High-value pages should be visited often; low-value pages should be visited less.
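One simple way to approximate this is to group Googlebot requests by the first path segment. The sketch below assumes that site sections map to top-level directories (for example /blog/ or /product/), which will not hold for every site; section_of and hits_per_section are hypothetical helper names, and the paths are invented.

```python
from collections import Counter


def section_of(path):
    """Map a URL path to its first segment, e.g. '/blog/post-1' -> 'blog'."""
    segment = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return segment or "(root)"


def hits_per_section(paths):
    """Count Googlebot requests per site section."""
    return Counter(section_of(p) for p in paths)


# Invented request paths taken from Googlebot log entries.
paths = ["/blog/a", "/blog/b?utm=1", "/product/42", "/", "/legal/terms"]
counts = hits_per_section(paths)
for section, n in counts.most_common():
    print(section, n)
```

Dividing each count by the number of days covered by the log gives visits per day per section.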

Crawl-to-index ratio

Of the URLs Googlebot crawls, how many end up indexed? A ratio below 50% usually points to quality issues.
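The ratio itself is simple arithmetic once you have two URL lists. This sketch assumes the crawled URLs come from your logs and the indexed URLs come from some export you trust (for instance an indexing report); crawl_to_index_ratio is a hypothetical helper and the URLs are invented.

```python
def crawl_to_index_ratio(crawled_urls, indexed_urls):
    """Share of crawled URLs that also appear in the indexed set (0.0 to 1.0)."""
    crawled = set(crawled_urls)
    if not crawled:
        return 0.0
    return len(crawled & set(indexed_urls)) / len(crawled)


crawled = ["/a", "/b", "/c", "/d"]
indexed = ["/a", "/b", "/z"]  # "/z" is indexed but was not crawled in this period
ratio = crawl_to_index_ratio(crawled, indexed)
print(f"{ratio:.0%}")
```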

404 rate from Googlebot

Percentage of Googlebot requests that return 404. It should be below 1%. A higher rate means there are broken links that need fixing.

5xx rate

Should be near 0%. Any consistent 5xx traffic indicates server reliability problems.
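Both error rates can be computed from the status codes of Googlebot requests. A minimal sketch, assuming the status codes have already been extracted from the logs as integers; status_rates is a hypothetical helper and the sample data is invented.

```python
from collections import Counter


def status_rates(statuses):
    """Return the 404 and 5xx shares of all requests, in percent."""
    total = len(statuses)
    if total == 0:
        return {"404": 0.0, "5xx": 0.0}
    counts = Counter(statuses)
    server_errors = sum(n for code, n in counts.items() if 500 <= code <= 599)
    return {
        "404": 100 * counts[404] / total,
        "5xx": 100 * server_errors / total,
    }


# Invented sample: 96 OK responses, three 404s, one 503.
statuses = [200] * 96 + [404] * 3 + [503]
rates = status_rates(statuses)
print(rates)
```

With this sample the 404 rate is 3%, above the 1% guideline, so the 404 URLs would be worth listing and fixing.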

Average response time to Googlebot

Should be under 500 ms. Slow responses waste crawl budget, and rankings can suffer.
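Note that the default combined log format does not record response time; it has to be added to the log format (for example Nginx's $request_time or Apache's %D). Assuming the times are available and converted to milliseconds, a summary could be sketched as follows; response_time_summary is a hypothetical helper and the numbers are invented.

```python
from statistics import mean, median


def response_time_summary(times_ms, threshold_ms=500):
    """Summarize response times and the share of requests slower than the threshold."""
    if not times_ms:
        return None
    slow = sum(1 for t in times_ms if t > threshold_ms)
    return {
        "mean_ms": mean(times_ms),
        "median_ms": median(times_ms),
        "slow_share": slow / len(times_ms),
    }


# Invented response times for five Googlebot requests.
times_ms = [120, 180, 240, 900, 310]
summary = response_time_summary(times_ms)
print(summary)
```

The median is included because a few very slow requests can pull the average up without reflecting typical performance.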

Redirect chains

Flag any URL where Googlebot has to follow 3 or more redirects in a row. Clean these up by pointing the first URL directly at the final destination.
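Standard access logs record the 3xx status but not the redirect target, so counting hops needs a source-to-target mapping from elsewhere, such as a crawl export. The sketch below assumes such a mapping is available as a dict; redirect_chain is a hypothetical helper and the URLs are invented.

```python
def redirect_chain(url, redirects, max_hops=10):
    """Follow url through the redirects mapping and return the list of URLs visited."""
    chain = [url]
    seen = {url}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:  # redirect loop: stop following
            break
        seen.add(target)
    return chain


# Invented mapping of redirecting URL -> redirect target.
redirects = {"/old": "/older", "/older": "/new", "/new": "/final", "/x": "/y"}
chain = redirect_chain("/old", redirects)
hops = len(chain) - 1
print(" -> ".join(chain), f"({hops} hops)")
```

Running this over every redirecting URL that Googlebot requested, and keeping those with 3 or more hops, gives the cleanup list.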

Orphan URLs in logs

URLs Googlebot crawls that aren't internally linked anymore. Often old URLs from a prior version of the site.

URLs never crawled

URLs that are in your sitemap but that Googlebot has never visited. This points to a crawl budget or discoverability issue.
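Orphan URLs and never-crawled URLs are both set differences between three URL lists. A minimal sketch, assuming you have the URLs Googlebot requested (from logs), the internally linked URLs (from a site crawl), and the sitemap URLs, all normalized to the same form; compare_url_sets is a hypothetical helper and the data is invented.

```python
def compare_url_sets(crawled_by_googlebot, internally_linked, in_sitemap):
    """Return (orphans, never_crawled) as sorted lists."""
    crawled = set(crawled_by_googlebot)
    orphans = crawled - set(internally_linked)    # crawled but no longer linked
    never_crawled = set(in_sitemap) - crawled     # in sitemap but never requested
    return sorted(orphans), sorted(never_crawled)


crawled = ["/blog/a", "/old-landing", "/product/1"]
linked = ["/blog/a", "/product/1", "/product/2"]
sitemap = ["/blog/a", "/product/1", "/product/2"]
orphans, never_crawled = compare_url_sets(crawled, linked, sitemap)
print("orphans:", orphans)
print("never crawled:", never_crawled)
```

The comparison is only meaningful if the lists use the same URL form (trailing slashes, protocol, query strings), so normalize before comparing.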

Getting the logs

Depends on hosting:

Verifying Googlebot

Not all "Googlebot" in logs is real. Verify via reverse DNS:

  1. Take the IP address from the log entry
  2. Run a reverse DNS lookup on it → the hostname should end with .googlebot.com or .google.com
  3. Run a forward DNS lookup on that hostname → the result should match the original IP

All the mentioned tools automate this.
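For a custom pipeline, the three steps above can be sketched with Python's standard socket module. This is a sketch under some assumptions: is_verified_googlebot is a hypothetical helper, socket.gethostbyname_ex only returns IPv4 addresses, and the resolver arguments exist so the logic can be exercised without network access. The demo uses stub resolvers with made-up values, not real DNS answers.

```python
import socket

# Include the leading dot so that e.g. "fakegooglebot.com" does not pass.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse DNS, suffix check, then forward DNS back to the same IP."""
    # Default resolvers use real DNS; pass stubs to test offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward_lookup(hostname)
    except OSError:  # covers socket.herror and socket.gaierror (lookup failures)
        return False


# Offline demo with stub resolvers (made-up hostname and IP).
def stub_reverse(ip):
    return "crawl-66-249-66-1.googlebot.com"


def stub_forward(host):
    return ["66.249.66.1"]


verified = is_verified_googlebot("66.249.66.1", stub_reverse, stub_forward)
spoofed = is_verified_googlebot("203.0.113.9", stub_reverse, stub_forward)
print(verified, spoofed)
```

Calling is_verified_googlebot(ip) without the resolver arguments performs live DNS lookups, so results are worth caching per IP when processing large logs.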

Cadence

After major changes (migrations, redesigns), review the logs daily for 2-4 weeks.

When logs save you

Real scenarios where log analysis surfaced issues invisible elsewhere: