An XML sitemap is a file that hands Google the list of URLs you want indexed. It's a hint, not a command. Google doesn't have to index everything in it. But a clean, accurate sitemap speeds up discovery, helps Google find new and updated pages faster, and is essential on any site over a couple hundred pages. This page walks through what belongs in a sitemap, what doesn't, and the common mistakes that turn a sitemap from a useful tool into noise.
Googlebot discovers pages by following links. A sitemap is a shortcut. Instead of Google having to crawl its way to every page through links alone, it can look at your sitemap and know immediately what exists. That's especially valuable for new sites, large sites, pages with few internal links, or pages that just got updated.
Think of a sitemap as Google's cheat sheet. It doesn't replace good internal linking. It complements it.
The file itself is simple XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page1</loc>
    <lastmod>2026-04-18</lastmod>
  </url>
  ...
</urlset>
Each <url> entry supports four tags:
- <loc>: required; the absolute URL.
- <lastmod>: optional but useful; tells Google which pages changed.
- <changefreq>: essentially ignored by Google; don't bother.
- <priority>: also ignored; don't waste time setting it.
Only include canonical, indexable URLs. Leave out redirects, noindex pages, duplicates, and parameter URLs (?sort=, ?filter=). Including garbage URLs sends mixed signals and wastes crawl budget.
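Whatever generates your sitemap, a small filter pass keeps that junk out. A sketch in Python; the rule that any URL with a query string or fragment is a non-canonical view is an assumption you'd adapt to your own URL scheme:

```python
from urllib.parse import urlparse

def sitemap_eligible(url: str) -> bool:
    """Return True only for URLs that belong in the sitemap.

    Assumption: any URL carrying a query string (?sort=, ?filter=, ...)
    or a fragment is a faceted/duplicate view, not a canonical page.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # relative paths and odd schemes don't belong
    if parts.query or parts.fragment:
        return False  # parameter URLs: sorted/filtered views of real pages
    return True

urls = [
    "https://example.com/page1",
    "https://example.com/products?sort=price",  # excluded
    "https://example.com/blog?filter=seo",      # excluded
]
clean = [u for u in urls if sitemap_eligible(u)]
```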
A single sitemap file maxes out at 50,000 URLs and 50 MB uncompressed. Over those limits, split into multiple sitemaps plus a sitemap index file.
For sites with more than 50,000 URLs, create a sitemap index that points to child sitemaps:
<?xml version="1.0"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-products.xml</loc></sitemap>
</sitemapindex>
Each child sitemap holds up to 50,000 URLs. You can have up to 50,000 child sitemaps. Plenty of room.
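The splitting itself is mechanical. A sketch in Python, assuming you already have the full URL list; the sitemap-1.xml, sitemap-2.xml naming scheme is arbitrary:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemaps(urls, base="https://example.com", chunk=50_000):
    """Split `urls` into child sitemaps of at most `chunk` entries.

    Returns (index_xml, [child_xml, ...]) as strings. 50,000 is the
    protocol's per-file URL limit.
    """
    children = []
    for i in range(0, len(urls), chunk):
        urlset = Element("urlset", xmlns=NS)
        for u in urls[i:i + chunk]:
            SubElement(SubElement(urlset, "url"), "loc").text = u
        children.append(tostring(urlset, encoding="unicode"))
    index = Element("sitemapindex", xmlns=NS)
    for n in range(len(children)):
        # Hypothetical naming: sitemap-1.xml, sitemap-2.xml, ...
        SubElement(SubElement(index, "sitemap"), "loc").text = f"{base}/sitemap-{n + 1}.xml"
    return tostring(index, encoding="unicode"), children
```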
- Standard location: https://yourdomain.com/sitemap.xml
- Reference it in robots.txt: Sitemap: https://yourdomain.com/sitemap.xml
On CMS-driven sites (WordPress, Shopify, Webflow), the sitemap should regenerate automatically when content changes. Most modern CMSes handle this via built-in features or plugins (Yoast, RankMath on WordPress).
If you're hand-coding a site, automate the sitemap in your build pipeline. A stale sitemap is worse than no sitemap.
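If your build step already knows each page's URL and last-modified date, emitting the file takes a few lines. A minimal sketch in Python; the (url, date) pair structure is illustrative, not a fixed interface:

```python
from datetime import date
from xml.etree.ElementTree import Element, SubElement, tostring

def write_sitemap(pages, out_path="sitemap.xml"):
    """pages: iterable of (absolute_url, datetime.date) pairs.

    Writes a urlset sitemap to `out_path` and returns the XML string.
    """
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, modified in pages:
        entry = SubElement(urlset, "url")
        SubElement(entry, "loc").text = url
        SubElement(entry, "lastmod").text = modified.isoformat()  # W3C date format
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(urlset, encoding="unicode")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(xml)
    return xml
```

Wire it into whatever runs on every deploy, and the file can never go stale.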
Small sites (under 100 pages) with good internal linking don't strictly need a sitemap. Google will find everything anyway. But there's no downside to having one, and it's cheap to set up. Do it.
Check your sitemap URL right now. Open yourdomain.com/sitemap.xml in a browser. Does it exist? Is it current? Does it only include canonical, indexable URLs? If any answer is no, fix it this week.
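That check is scriptable too. A sketch that summarizes a fetched sitemap, assuming it's a plain urlset rather than an index file:

```python
from xml.etree.ElementTree import fromstring

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def audit_sitemap(xml_text: str):
    """Return (url_count, newest_lastmod_or_None) for a urlset document."""
    root = fromstring(xml_text)
    locs = [e.text for e in root.iter(NS + "loc")]
    lastmods = [e.text for e in root.iter(NS + "lastmod")]
    # ISO dates (YYYY-MM-DD) sort correctly as strings
    return len(locs), (max(lastmods) if lastmods else None)

# Live check against your own domain (network call):
#   from urllib.request import urlopen
#   with urlopen("https://yourdomain.com/sitemap.xml") as resp:
#       print(audit_sitemap(resp.read().decode("utf-8")))
```

If the count looks wrong or the newest lastmod is months old, the generator is broken or stale.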
Next: robots.txt, the file that tells bots which URLs they're allowed to crawl.