Log file analysis
4 min read · Updated 2026-04-18
Every request to your server is logged. Among those logs: every single Googlebot visit. Log analysis tells you what Google actually crawls (not what you think it crawls), where it spends time, what it skips, and where it hits errors.
When log analysis is worth it
- Sites >10k URLs (crawl budget matters)
- E-commerce sites (parameter handling issues)
- After a major migration or site restructure
- When Search Console shows unexplained indexing issues
- For enterprise SEO teams building data-driven strategies
What you can learn
- What Googlebot crawls daily, compared against your sitemap and total URL count
- Crawl frequency per page: high-value pages should be crawled often, low-value pages rarely
- Response codes Googlebot hits: 404s, 500s, slow responses
- Redirect chains: 301→301→301 sequences waste crawl budget
- URL parameters crawled unnecessarily: filter, sort, and session parameters
- Orphan pages: URLs Google found via your sitemap but that nothing on your site links to
- Bot verification: are requests claiming to be Googlebot actually Googlebot? (Many scrapers spoof the user-agent.)
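The first and sixth items above reduce to a set diff between your sitemap and the crawled URL set. A minimal sketch, assuming both inputs are already normalized URL paths:

```python
def crawl_coverage(sitemap_urls, crawled_urls):
    """Diff the sitemap against what Googlebot actually requested."""
    sitemap, crawled = set(sitemap_urls), set(crawled_urls)
    return {
        "covered": sitemap & crawled,        # in sitemap and crawled
        "never_crawled": sitemap - crawled,  # sitemap URLs Googlebot skips
        "off_sitemap": crawled - sitemap,    # parameter URLs, orphans, stale URLs
    }
```

The "off_sitemap" bucket is where parameter bloat and orphan pages show up; "never_crawled" is your crawl-coverage gap.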
Getting the logs
Depends on hosting:
- Cloudflare → Logs in the dashboard (Enterprise plan)
- AWS CloudFront → CloudWatch or S3 logs
- Nginx/Apache → standard access log files
- Shared hosting → cPanel often has raw logs
- CDN-fronted → your origin logs may not capture cached requests; check CDN logs
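Once you have raw Nginx/Apache logs, extracting the self-declared Googlebot requests is a regex over the combined log format. A sketch (the regex assumes the default combined format; adjust it if your log_format differs):

```python
import re

# Matches the default Nginx/Apache combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_path):
    """Yield (path, status) for every line claiming a Googlebot user-agent.

    Note: this trusts the user-agent string; verify IPs separately.
    """
    with open(log_path) as f:
        for line in f:
            m = LOG_PATTERN.match(line)
            if m and "Googlebot" in m.group("agent"):
                yield m.group("path"), int(m.group("status"))
```

This only filters on the user-agent string, so it captures spoofers too; the verification step below separates real Googlebot from pretenders.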
Tools
- Screaming Frog Log File Analyser: desktop app; imports raw logs, filters by user-agent
- OnCrawl: SaaS; integrates log data with crawl data
- Botify: enterprise-grade log analysis plus crawl-data integration
- Custom with Python/ELK: for teams with engineering resources
Verifying Googlebot
Bad actors spoof the Googlebot user-agent. Verify by doing a reverse DNS lookup on the IP:
- Take the IP from the log
- Run nslookup on that IP; it should resolve to a hostname ending in *.googlebot.com or *.google.com
- Forward-lookup that hostname; it should resolve back to the original IP
If the reverse and forward lookups don't match, it's not real Googlebot. Most log analysis tools automate this check.
Key ratios to compute
- Crawl budget per URL type: for each section of the site (blog, category, products), how often does Googlebot visit?
- Crawl-to-index ratio: of all URLs crawled, how many end up indexed? Low ratio = quality issues.
- Response code distribution: >1% 404s or 5xxs on Googlebot traffic = problem.
- Average response time to Googlebot: slow = crawl budget wasted.
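Given verified Googlebot requests as (path, status, response_ms) tuples (field names are assumptions about your parsed log schema), the distribution and response-time ratios reduce to a few aggregations:

```python
from collections import Counter
from statistics import mean

def crawl_report(hits):
    """hits: iterable of (path, status, response_ms) for verified Googlebot requests."""
    hits = list(hits)
    total = len(hits)
    statuses = Counter(status for _, status, _ in hits)
    # 404s and 5xxs over 1% of bot traffic is the threshold flagged above
    error_rate = sum(n for s, n in statuses.items() if s == 404 or s >= 500) / total
    return {
        "total_requests": total,
        "unique_urls": len({path for path, _, _ in hits}),
        "status_distribution": dict(statuses),
        "error_rate": round(error_rate, 4),
        "avg_response_ms": round(mean(ms for _, _, ms in hits), 1),
    }
```

The crawl-to-index ratio needs a second data source (indexed URLs from Search Console), so it isn't computable from logs alone.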
Findings that matter
- Googlebot crawling stuff you don't care about: filter/sort/session parameter URLs. Handle with robots.txt or better URL hygiene.
- Googlebot NOT crawling stuff you do care about: low crawl frequency on money pages. Fix architecture or freshness signals.
- Crawl frequency drop on specific sections: can precede ranking drops. Treat it as an early warning.
- High 404 rate: fix the links or remove the pages cleanly.
- Slow response times to the bot: your LCP and crawl efficiency are both hurting.
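The first finding, parameter-URL waste, can be surfaced by counting Googlebot hits per query-parameter name. A sketch (parameter names in the example are illustrative):

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

def parameter_waste(paths):
    """Count Googlebot hits per query-parameter name to spot
    filter/sort/session parameters eating crawl budget."""
    counts = Counter()
    for path in paths:
        query = urlsplit(path).query
        for param in parse_qs(query, keep_blank_values=True):
            counts[param] += 1
    return counts
```

Parameters with high counts but no indexing value are the first candidates for a robots.txt Disallow rule.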
Cadence
- Small site: quarterly log analysis
- Medium site: monthly
- Enterprise: weekly, often part of a dashboard
After big changes (migrations, restructures): daily for 2-4 weeks.