Crawling + indexability
📖 4 min read · Updated 2026-04-18
The first two questions of technical SEO: can search engines find my pages? Can they index them? Until those answers are both "yes," no other SEO work matters.
Crawling
Search engine bots (Googlebot, Bingbot) discover pages by following links. A page is "crawlable" if a bot can reach it without being blocked.
Blockers to crawling
- robots.txt: the file explicitly denies access (see the check after this list)
- Authentication walls: the page requires a login
- No links to the page: orphan pages with no internal or external links pointing at them
- Broken navigation: JavaScript menus bots can't parse
- Crawl budget exhaustion: on large sites, bots may run out of budget before reaching every page
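If robots.txt is the suspect, you can test URLs against the live file before digging further. A minimal sketch using Python's standard-library robotparser; example.com and the paths are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a given crawler may fetch each path.
for path in ("/", "/category/widgets", "/admin/"):
    url = f"https://example.com{path}"
    print(f"Googlebot can fetch {path}: {rp.can_fetch('Googlebot', url)}")
```

Python's parser follows the original robots.txt spec and can differ from Googlebot's handling of wildcards, so treat a pass here as a first check and confirm edge cases in Search Console.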
Indexability
A page is "indexable" if, once crawled, the search engine adds it to its index (the database used to serve results).
Blockers to indexing
- noindex meta tag: tells search engines not to index the page (the sketch after this list shows how to spot several of these blockers in one response)
- Canonical tag pointing to another URL: signals that the other URL is the primary version
- Duplicate content without clear canonicalization
- Thin content: Google may decline to index pages it considers too shallow
- 404 or error response: the page doesn't exist or failed to render
- Soft 404: the page returns 200 OK but the content is effectively "not found"
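Several of these blockers are visible in a single HTTP response: the status code, the X-Robots-Tag header, the robots meta tag, and the canonical link. A rough stdlib sketch; the URL is a placeholder and the regex matching is deliberately crude (a real audit would use an HTML parser):

```python
import re
import urllib.error
import urllib.request

def inspect_url(url: str) -> None:
    """Flag common indexability blockers visible in one response."""
    req = urllib.request.Request(url, headers={"User-Agent": "index-check/0.1"})
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        print(f"HTTP status: {err.code}")  # 404 or 5xx means not indexable
        return
    with resp:
        print(f"HTTP status: {resp.status}")
        x_robots = resp.headers.get("X-Robots-Tag", "")
        html = resp.read().decode("utf-8", errors="replace")

    if "noindex" in x_robots.lower():
        print("Blocked by X-Robots-Tag response header")
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I)
    if meta and "noindex" in meta.group(0).lower():
        print("Blocked by a noindex meta tag")
    canon = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I
    )
    if canon and canon.group(1).rstrip("/") != url.rstrip("/"):
        print(f"Canonical points elsewhere: {canon.group(1)}")

inspect_url("https://example.com/some-page")  # placeholder URL
```

Soft 404s won't show up here, because by definition they return 200 OK; those you catch by eyeballing the content or via Search Console.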
Debugging: is my page indexed?
- Search site:yourdomain.com/specific-url in Google. If it appears, it's indexed; if not, keep debugging.
- Search Console → URL Inspection → enter the URL → check the coverage status (this can also be scripted; see the sketch after this list)
  - If "Discovered, not crawled" → Google knows about it but hasn't crawled it yet. Wait, or request indexing.
  - If "Crawled, not indexed" → Google crawled the page but chose not to index it. Usually a quality or duplicate issue.
  - If "Indexed, not submitted in sitemap" → the page is fine, but add it to your sitemap.
Crawl budget
Google allocates a finite crawl budget per site: roughly, how many pages per day it will crawl. For sites under a few thousand pages this is rarely a concern. For large sites (10k+ pages), a poorly optimized crawl budget means important pages get crawled less often.
Optimizing crawl budget:
- Don't waste crawls on low-value pages (noindex or block them; note that Google still has to crawl a page at least once to see a noindex tag)
- Fix 404s and redirect chains
- Keep sitemap clean and current
- Minimize low-value parameter URLs (?sort=, ?filter=); the sketch below helps find them
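To gauge how much of your URL space is parameter noise, bucket a URL list (a crawl export or sitemap dump) by query parameter. A minimal stdlib sketch; the URL list is illustrative:

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

urls = [  # stand-in for a crawl export or sitemap dump
    "https://example.com/widgets",
    "https://example.com/widgets?sort=price",
    "https://example.com/widgets?sort=price&filter=blue",
    "https://example.com/gadgets?filter=red",
]

param_counts = Counter()
for u in urls:
    for param in parse_qs(urlparse(u).query):
        param_counts[param] += 1

# Parameters that appear on many URLs are crawl-budget suspects.
for param, count in param_counts.most_common():
    print(f"?{param}= appears on {count} URLs")
```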
Tools
- Google Search Console: URL Inspection and the Index Coverage report
- Screaming Frog: crawls your site the way Google does and surfaces issues
- Sitebulb: similar, with better visualizations
- Log file analyzers: see what Googlebot actually crawls (a starter sketch follows this list; more on that later)
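A first pass at log file analysis doesn't need a dedicated tool: filter your access logs for Googlebot and count the paths it requests. A rough sketch assuming the common combined log format; access.log is a placeholder, and a serious audit should also verify hits by reverse DNS, since the user-agent string is trivially spoofed:

```python
import re
from collections import Counter

# Combined log format: ip - user [time] "METHOD /path HTTP/x" status size ...
REQUEST = re.compile(r'"\w+ (?P<path>\S+) HTTP/[^"]*"')

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:  # placeholder
    for line in log:
        if "Googlebot" not in line:  # crude filter; verify via reverse DNS
            continue
        match = REQUEST.search(line)
        if match:
            hits[match.group("path")] += 1

# The pages Googlebot actually spends your crawl budget on.
for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```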
Quick wins
- Submit an XML sitemap to Google Search Console (a minimal generator sketch follows this list)
- Link from homepage to every major category
- Fix or remove links that point to 404 pages
- Audit for unintended noindex / canonical conflicts
- Check robots.txt isn't blocking anything important
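If you don't have a sitemap yet, a minimal one takes a few lines to generate. A sketch using the standard library; the URL list is a placeholder, and the output belongs at your site root (e.g. /sitemap.xml) before you submit it in Search Console:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
pages = [  # placeholder: list only canonical, indexable URLs
    "https://example.com/",
    "https://example.com/widgets",
    "https://example.com/gadgets",
]

# Build <urlset><url><loc>...</loc></url>...</urlset>
urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
for page in pages:
    url_el = ET.SubElement(urlset, "url")
    ET.SubElement(url_el, "loc").text = page

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```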