XML Sitemap Analyzer

Point it at a sitemap index or a plain URL sitemap, sample the child files, flag hygiene problems and run a live status check on submitted URLs.

Point this XML sitemap analyzer at a sitemap index or a plain URL sitemap and it answers the two questions that matter: can crawlers find your pages, and are those pages worth indexing? It fetches the root server-side to dodge CORS, works out whether it is an index or a flat urlset, then samples the child files and parses them. It keeps the submitted URLs right in front of you and flags the usual suspects: non-HTTPS rows, a hostname that drifts, duplicate locations, missing lastmod. Then it runs a live HTTP status check on a handful of URLs, because a 404 loves to hide inside markup that parses cleanly. It is the tool I open right after a bulk publish, a permalink change, an SEO plugin swap or a migration.

Queries run through the PeopleAreGeek lookup service. We log nothing.

XML sitemap discovery and quality audit

Point it at a sitemap index or a plain URL sitemap. It samples the child files, eyeballs the URLs for the usual hygiene problems, then runs a live status check on a handful of submitted URLs. What you walk away with is a report that keeps two questions apart: can crawlers find your pages, and are those pages actually worth indexing?

The analyzer samples child sitemaps and URL rows so the whole thing stays snappy in the browser. A sitemap helps with discovery, sure. But status codes, canonical tags, robots rules, content quality? Those are what really decide if a submitted URL deserves to rank.

What a sitemap analyzer should tell you

One thing to clear up first. A sitemap won't rank you, and it doesn't replace good internal linking. It's a map for discovery, nothing more. So a decent analyzer should answer the boring practical stuff before it dumps a giant URL list on you. Can the file even be fetched? Is it an index or a flat URL sitemap? Do the child files parse? Which URLs are you submitting, and honestly, do they look like the canonical public pages you actually want crawled, or did some junk sneak in?
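For the "index or flat URL sitemap" question, here is a minimal sketch of how that check can be done once the file is fetched. It is not the analyzer's own code. The function name `classify_sitemap` and its return labels are made up for illustration; the only real rule it relies on is that the sitemaps.org protocol uses `sitemapindex` as the root element of an index and `urlset` as the root of a URL sitemap.

```python
import xml.etree.ElementTree as ET


def classify_sitemap(xml_text: str) -> str:
    """Return 'index', 'urlset', 'unknown' or 'unparseable' (hypothetical labels)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "unparseable"
    # ElementTree reports namespaced tags as '{namespace}name'; keep the name part.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "sitemapindex":
        return "index"
    if local_name == "urlset":
        return "urlset"
    return "unknown"
```

An "unknown" result usually means the URL returned something that is XML-shaped but not a sitemap, such as an HTML error page that happens to parse.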

That review is the whole point of this tool. It starts wherever you tell it to, samples the child files if the root is an index, and keeps the submitted URLs right in front of you. Along the way it flags the usual suspects: non-HTTPS rows, a hostname that drifts, duplicate rows, missing lastmod. Then it pokes a small set of URLs for their live HTTP status. Beats squinting at raw XML, especially right after you've pushed a batch of WordPress posts or fiddled with an SEO plugin setting and you just want to know nothing broke.
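Those four hygiene flags are simple enough to sketch in a few lines. This is an illustration under assumptions, not how the tool does it: `hygiene_flags`, the `(loc, lastmod)` row shape and the `expected_host` parameter are all hypothetical, and duplicates are matched on the exact string only.

```python
from collections import Counter
from urllib.parse import urlsplit


def hygiene_flags(rows, expected_host):
    """rows: list of (loc, lastmod) pairs; lastmod is None or '' when absent.

    expected_host is assumed to be lowercase, e.g. 'example.com'.
    """
    seen = Counter(loc for loc, _ in rows)
    return {
        # Anything not served over https, including plain http rows.
        "non_https": [loc for loc, _ in rows if urlsplit(loc).scheme != "https"],
        # Hostname drift, e.g. www vs bare domain or a leftover staging host.
        "off_host": [loc for loc, _ in rows if urlsplit(loc).hostname != expected_host],
        # Exact-string duplicates only; trailing-slash twins are not caught here.
        "duplicates": [loc for loc, count in seen.items() if count > 1],
        "missing_lastmod": [loc for loc, lastmod in rows if not lastmod],
    }
```

Passing the host in explicitly is a deliberate choice: guessing it from the most common hostname would hide the case where the whole sitemap is on the wrong host after a migration.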

Sitemap index and child sitemaps are different layers

Most WordPress SEO plugins hand you one sitemap index plus a bunch of child sitemaps. Think of the index as a directory, a table of contents. The child files hold the real rows: posts, pages, categories, authors, images. Submit one child file by accident and you're auditing a thin slice of the site without realizing it. Worse, a child sitemap can quietly stop updating while the root index still looks perfectly fine at a glance. So I'd read both layers. It's the safer habit, and it takes ten extra seconds.
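Reading the index layer can be sketched like this. The helper name `child_sitemaps` and the default sample size of five are assumptions for the example; the element names and namespace come from the sitemaps.org protocol. Returning the child's lastmod next to its location is what lets you notice a child file that has quietly stopped updating.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str, limit: int = 5):
    """Return up to `limit` (loc, lastmod) pairs from a sitemap index."""
    root = ET.fromstring(index_xml)
    children = []
    for node in root.findall("sm:sitemap", SITEMAP_NS):
        loc = node.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            # lastmod is optional in an index, so it may legitimately be None.
            children.append((loc, lastmod.strip() if lastmod else None))
    return children[:limit]
```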

Root sitemap tells you if the public entry point even loads, and what kind of XML it's serving.
Child sitemap sample shows which sections actually exist, and whether the sampled files parse without choking.
URL sample shows the exact locations you're submitting, plus whatever lastmod values are sitting in the XML.
Status audit hits a live sample, because a 404 or a 5xx loves to hide inside markup that parses cleanly.
Action guide keeps the discovery problems in one pile and the page-level indexability stuff in another.

How to judge a submitted URL

Every URL in a sitemap should really be the preferred, public, canonical version of that page. Right protocol, right host, a healthy status code, no accidental twins, and it should line up with how you link to it internally. Where it goes wrong: an old URL that now redirects, a private section that leaked in, a noindex archive you forgot to exclude, a deleted page coughing up a 404. The XML can be technically valid through all of that. The search signal underneath is still a mess.
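If you want to spot-check a few URLs yourself, a rough sketch with the Python standard library looks like this. The names `live_status`, `NoRedirect` and `verdict`, and the verdict wording, are mine and purely illustrative. Redirect following is switched off on purpose, so an old URL that now redirects shows up as a 3xx instead of hiding behind the 200 of its destination. One caveat I'm assuming away: some servers answer HEAD requests differently from GET, so treat an odd 4xx as a prompt to recheck, not as proof.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 301 is reported as a 301."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def live_status(url: str, timeout: float = 10.0):
    """Send a HEAD request; return the status code, or None if unreachable."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx all arrive here
    except OSError:
        return None  # DNS failure, refused connection, timeout


def verdict(status):
    """Map a status code to a sitemap-level recommendation."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect: list the final URL instead"
    if status in (404, 410):
        return "gone: remove from sitemap"
    if 400 <= status < 500:
        return "client error: review"
    if status >= 500:
        return "server error: recheck"
    return "unexpected: review"
```

Typical use would be `verdict(live_status(url))` over a small sample, not the whole sitemap.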

Lastmod is one to read carefully. A missing value doesn't mean your sitemap is broken, despite what some tools imply. A wrong value is arguably worse than none at all. The way I think about it: lastmod is a change signal, and it should track real page updates, but only if your generator can actually produce honest dates. The moment a plugin bumps lastmod on every URL for every trivial event, the whole field turns to noise and you can't trust it mid-audit. Maybe that's just me, but I'd rather have no date than a fake one.
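One crude way to smell a noisy lastmod field is to check whether nearly every row carries the same date. The sketch below is a heuristic of my own, not a rule from any spec, and the thresholds (`min_rows=20`, `share=0.9`) are arbitrary assumptions. A genuine site-wide template change would trip it too, so read a hit as "look closer", not "the generator is lying".

```python
from collections import Counter


def lastmod_looks_noisy(lastmods, min_rows=20, share=0.9):
    """True when a single day accounts for at least `share` of the dated rows."""
    # Keep only the YYYY-MM-DD part so timestamps on the same day group together.
    days = [value[:10] for value in lastmods if value]
    if len(days) < min_rows:
        return False  # too few dated rows to judge either way
    _, top_count = Counter(days).most_common(1)[0]
    return top_count / len(days) >= share
```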

Useful WordPress sitemap checks

Did you just bulk-publish, shuffle content from pages to posts, rename categories, swap SEO plugins, rewrite permalinks, or clear the sitemap cache? Go check the root sitemap again. Confirm the post sitemap you expect is actually there, open a few sampled rows, and test a couple of your important new URLs directly. If a URL hasn't shown up yet, don't blame Search Console straight away. First make sure it's published, indexable, linked from a hub that matters, and that your plugin rules even include it.

Sitemaps and Google Search Console

If your site has a pile of child sitemaps, just submit the index. Cleanest starting point. Search Console will then report fetch problems and discovered URLs over the following days, which is great, but it's slow. A local audit still earns its keep because it catches the dumb stuff right now: wrong host, a stale child file, surprise 404 rows, non-HTTPS output after a migration, or a sitemap that stopped parsing the second a cache or plugin change went live. Why wait for a crawler to tell you what you can see in five seconds? The sitemaps.org protocol and the Google Search Central docs are the references worth keeping handy.

A practical sitemap workflow

Run the analyzer on the sitemap index you submitted for your canonical host.
Skim the child sample and make sure the sections you expected to see are actually there.
Go through the URL rows. Protocol, hostname, duplicates, lastmod patterns, anything that shouldn't be in there at all.
Status-check the URLs that matter before you nag a crawler to come back.
Then pair all of this with robots, indexability and canonical checks on your important pages. Internal links too.

Frequently asked questions

Does a sitemap guarantee indexing?

Nope. It helps with discovery and that is where it stops. The page itself still has to be reachable, technically indexable, genuinely useful and connected to the rest of your site. Google also has to decide it is worth keeping around.

Should redirected URLs stay in a sitemap?

Usually not, assuming you control the sitemap. Point it at the final canonical destination instead. Redirects still earn their place for old inbound links and migrations, but your sitemap is meant to describe the URLs you want found right now, not the ones that used to exist.

Can a sitemap be valid but still low quality?

Absolutely. The XML can parse without a single error while it is busy submitting thin pages, duplicate archives and dead URLs. Some of those pages might even be blocked by other signals entirely. That is exactly why an audit has to look at the structure and spot-check the actual URLs, not one or the other.

How many URLs can one sitemap hold?

The cap is 50,000 URLs, or 50 MB uncompressed, whichever you hit first. Go past either limit and you will need to split the thing into several sitemaps, then list them all in a sitemap index file. Honestly most sites never get close.
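If you do get close, the split is mechanical. Here is a small sketch, with hypothetical helper names `split_urls` and `build_index`, that chunks a URL list at the 50,000 cap and writes the index that ties the pieces together. It only enforces the URL count; the 50 MB size limit would need a separate check on the serialized files.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000  # per-file URL cap from the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def split_urls(urls, size=MAX_URLS):
    """Cut a list of URLs into chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(child_locs):
    """Return sitemap index XML listing each child sitemap location."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in child_locs:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = loc
    return ET.tostring(root, encoding="unicode")
```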