Log file analysis

Every request to your server gets logged. Among those logs: every Googlebot visit. Log file analysis tells you what Google actually crawls (not what you think it crawls), where it spends time, what it skips, and what it's breaking on. It's the deepest diagnostic tool available to SEOs, and almost nobody uses it. This page walks through when log analysis is worth the effort, what you can learn, the tools, and the specific findings to act on.

The mindset

Search Console shows you Google's summary view of your site. Log analysis shows you the raw data. It's the difference between reading a book review and reading the book. For most small sites, the review is enough. For large or complex sites, you need the book.

When log analysis is worth it

What you can learn

Getting the logs

Depends on hosting:

Tools

Verifying Googlebot

Bad actors spoof the Googlebot user-agent. Verify by doing a reverse DNS lookup on the IP:

  1. Take the IP from the log.
  2. Run a reverse lookup (nslookup [IP]). The hostname should end in .googlebot.com or .google.com.
  3. Run a forward lookup on that hostname. It should return the original IP.

If either lookup fails to match, it's not real Googlebot. Most log analysis tools automate this.
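
If you want to script the check yourself, here is a minimal Python sketch of the reverse-then-forward lookup described above. The function name is_real_googlebot and its two optional resolver parameters are hypothetical, added only so the logic can be exercised without network access. By default it uses the standard library's socket module, and the forward lookup it uses only handles IPv4.

```python
import socket

# Hostname suffixes from the steps above. The leading dot matters:
# it stops "evilgooglebot.com" from passing the check.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip passes the reverse + forward DNS check.

    reverse_lookup and forward_lookup are hypothetical hooks for testing;
    when omitted, real DNS lookups are made through the socket module.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, ip_addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses); IPv4 only
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS record at all

    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False  # resolves, but not to a Google hostname

    try:
        # The hostname must resolve back to the IP we started with.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Call it as is_real_googlebot(ip) for each distinct IP that claims to be Googlebot, and cache the results, since the same handful of IPs will appear thousands of times in a typical log.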

Key ratios to compute

Findings that matter

  1. Googlebot crawling stuff you don't care about: filter, sort, and session parameter URLs. Handle with robots.txt or better URL hygiene.
  2. Googlebot NOT crawling stuff you do care about: low crawl frequency on money pages. Fix architecture or freshness signals.
  3. Crawl frequency drop on specific sections: this can precede ranking drops. Treat it as an early warning.
  4. High 404 rate: fix the links or delete the pages cleanly.
  5. Slow response time to the bot: slow server responses hurt crawl efficiency, and they usually drag down LCP for users too.
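
A few of these findings can be pulled straight out of a raw log with a short script. The sketch below is one rough way to do it in Python, under some loud assumptions: the log is in the common "combined" format used by Apache and Nginx, Googlebot is identified by user-agent substring only (so spoofed hits are still counted unless you filter by verified IP first), and the function name summarize_googlebot and the keys it returns are made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumes the "combined" log format:
# ip - - [time] "METHOD url PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_googlebot(lines):
    """Summarize hits whose user-agent claims to be Googlebot (unverified)."""
    total = 0
    with_params = 0
    not_found = 0
    sections = Counter()

    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue  # unparseable line or a different client

        total += 1
        parts = urlsplit(m["url"])
        if parts.query:
            with_params += 1  # filter/sort/session style URLs
        if m["status"] == "404":
            not_found += 1
        # Group by first path segment, e.g. /products/shoes -> /products
        first_segment = parts.path.strip("/").split("/")[0]
        sections["/" + first_segment] += 1

    return {
        "hits": total,
        "param_share": with_params / total if total else 0.0,
        "404_rate": not_found / total if total else 0.0,
        "sections": sections,
    }
```

Run it over one file per day or week, for example summarize_googlebot(open("access.log")), and compare the section counts over time. A section whose count falls while the others hold steady is the early warning described in finding 3.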

Cadence

What to do with this

If your site has more than 10,000 URLs, log file analysis is probably worth setting up once. Use Screaming Frog Log File Analyser for a one-off look. See what Googlebot is actually doing. You'll usually spot at least three things that were invisible in Search Console.

Next: server response codes, how to read what your site is actually telling bots.