Indexability Checker

Paste a URL and see if Google can index it: HTTP status, meta robots, X-Robots-Tag, robots.txt and the canonical, with the blocker named.

Paste a live URL and the checker fetches it server-side, so you act on what the server is really sending instead of a cached guess. It scores the page, flags the one signal that matters most, and lays the crawl, canonical and raw response side by side.

Queries run through the PeopleAreGeek lookup service. We log nothing.

Indexability Checker: Status, Robots.txt, Noindex, Canonical, Sitemap and Crawl Signals

Pages drop out of Google for the dumbest reasons. A stray noindex nobody remembers adding. Or some robots.txt rule a contractor left behind in 2019. So I throw a live URL in here and it pulls the HTTP status, the robots meta, the X-Robots-Tag header, the canonical, plus a robots.txt check, all on the first request. One look and you can see if something is quietly blocking the page.

What an indexability checker does

An indexability checker reads the technical signals that decide whether Google can index a URL, then names the one that is blocking it. Paste a URL and it pulls the HTTP status, the robots meta, the X-Robots-Tag header, the canonical and a robots.txt check on the very first request, so you act on what the server is actually sending rather than what you assume is there. This is the tool I open first when a page quietly falls out of the results, because most of the time the cause is a leftover directive nobody remembers adding, not the content.

Indexable does not mean indexed

This checker only hunts for the technical blockers. Your page can be crawlable and indexable, and Google can still leave it out because the content is thin or nothing links to it. I run this first to clear the technical suspects. Then I go fix the content and the internal links. Honestly, that is where the real problem usually hides.

Signals checked

  • The HTTP status. You want a clean 200 on a page meant for the index. A redirect or a 404 sitting here is a red flag.
  • The robots meta and the X-Robots-Tag header. Neither should say noindex (or none, which Google treats as noindex plus nofollow). That header gets forgotten constantly, since it lives in the response, not the HTML.
  • Robots.txt. It should not block the path for Googlebot. One sloppy disallow can wipe out a whole section of the site.
  • The canonical. It has to point at the version you actually want indexed, not some duplicate or a URL dragging query parameters around.
  • Discoverability. Google has to find the page somehow first, whether that is your sitemap or a real internal link pointing at it.
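The noindex part of that list is easy to sketch. Below is a minimal Python version that assumes you already have the served HTML and the X-Robots-Tag header values as strings. The function names are hypothetical, and this is not the checker's own code. It treats `none` as noindex and reads robots meta tags named `robots` or `googlebot`. For headers, it honors a `googlebot:` prefix and skips rules scoped to other crawlers. Scoped tokens like `googlebot-news` are deliberately out of scope.

```python
from html.parser import HTMLParser

# Meta names Google reads: "robots" targets all crawlers, "googlebot" targets Google only.
GOOGLE_META_NAMES = {"robots", "googlebot"}
# "none" is shorthand for "noindex, nofollow", so it blocks indexing too.
NOINDEX_TOKENS = {"noindex", "none"}
# Directives whose own syntax uses a colon, so "name: value" is not a crawler prefix.
COLON_DIRECTIVES = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}


class RobotsMetaParser(HTMLParser):
    """Collects the content of robots meta tags aimed at Google (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() in GOOGLE_META_NAMES:
            self.directives.append(attr_map.get("content", ""))


def has_noindex(directives: str) -> bool:
    """True if a comma-separated directive list contains noindex or none."""
    tokens = {token.strip().lower() for token in directives.split(",")}
    return bool(tokens & NOINDEX_TOKENS)


def meta_noindex(html: str) -> bool:
    """True if any Google-facing robots meta tag in the served HTML says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any(has_noindex(d) for d in parser.directives)


def header_noindex(header_values: list[str]) -> bool:
    """Check X-Robots-Tag values such as "noindex" or "googlebot: noindex"."""
    for value in header_values:
        head, sep, tail = value.partition(":")
        agent = head.strip().lower()
        if sep and "," not in head and agent not in COLON_DIRECTIVES:
            # A leading "<crawler>:" scopes the rule; skip rules meant for other bots.
            if agent != "googlebot":
                continue
            value = tail
        if has_noindex(value):
            return True
    return False


print(meta_noindex('<meta name="robots" content="noindex, follow">'))  # True
print(header_noindex(["bingbot: noindex"]))  # False: scoped to another crawler
```

The header check is the one worth keeping separate. A page can have spotless HTML and still be blocked by a single response header.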

How I read the result

I check the status and the noindex flags first, because a 200 with no noindex is the floor you need before anything else matters. Then the robots.txt verdict, since a broad disallow can block a whole directory in one line. Then the canonical, to be sure the page is pointing at itself and not at a duplicate or a parameter-laden copy. When everything here is clean and the page is still missing, I stop blaming the technical layer and go look at the content and the internal links, then confirm the coverage state in Search Console URL Inspection.
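That reading order is really a triage: stop at the first signal that fails and name it. Here is a minimal Python sketch, assuming the signals were already collected from one server-side request into a hypothetical `Signals` record. The canonical comparison is a plain string match. That is a simplification, since a real check would normalize both URLs first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical record of what a single server-side request returned."""
    status: int               # status of the first response, redirects not followed
    noindex: bool             # meta robots or X-Robots-Tag said noindex
    robots_allowed: bool      # robots.txt lets Googlebot crawl this path
    canonical: Optional[str]  # absolute rel=canonical href, None if absent
    url: str                  # the URL that was checked


def first_blocker(s: Signals) -> Optional[str]:
    """Return the first failing signal, in the order: status, noindex, robots.txt, canonical."""
    if s.status != 200:
        return f"HTTP status {s.status}"
    if s.noindex:
        return "noindex"
    if not s.robots_allowed:
        return "robots.txt disallow"
    # Simplification: exact string match; a real check would normalize both URLs.
    if s.canonical is not None and s.canonical != s.url:
        return f"canonical points to {s.canonical}"
    return None  # technically indexable; next suspects are content and internal links


page = "https://example.com/a"
print(first_blocker(Signals(200, False, False, page, page)))  # robots.txt disallow
```

Returning a single name instead of a list matches the idea of flagging the one signal that matters most. Once the first blocker is fixed, you rerun and see what surfaces next.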

Frequently asked questions

What makes a page non-indexable?

A handful of usual suspects: a noindex (meta robots or the X-Robots-Tag header), an HTTP status that is not 200, a canonical pointing off somewhere else, or a login wall. Here is the one that trips people up: a robots.txt disallow blocks crawling, which is not the same as blocking indexing. Different problem entirely, so I treat it as its own signal.

Does a robots.txt disallow remove a page from Google?

No, and that answer surprises people constantly. Disallow stops the crawl. But if other pages link to that URL, Google can still list it, just with no snippet, that sad "No information is available for this page" blurb. So if you genuinely want it gone, do the opposite of what feels right: allow the crawl so Googlebot can reach the page, then serve a noindex. You have to let it in before it will agree to leave.
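You can see the crawl-versus-index split with Python's standard `urllib.robotparser`. This is a minimal sketch using a made-up robots.txt and example.com URLs, and `removal_will_work` is a hypothetical helper. A disallowed path comes back as not fetchable, so Googlebot never requests it and never sees any noindex on it. One caveat: the standard-library parser applies rules in file order (first match wins), not Google's most-specific-match rule. Treat its verdict as an approximation when Allow and Disallow lines overlap.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Ask the stdlib parser whether `agent` may fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() records a load time, which can_fetch() requires
    return parser.can_fetch(agent, url)


def removal_will_work(robots_txt: str, url: str, serves_noindex: bool) -> bool:
    """Hypothetical helper: a noindex only counts if Googlebot is allowed to fetch it."""
    return crawl_allowed(robots_txt, url) and serves_noindex


BLOCKING = "User-agent: *\nDisallow: /old-section/\n"
OPEN = "User-agent: *\nAllow: /\n"
URL = "https://example.com/old-section/page"

print(crawl_allowed(BLOCKING, URL))            # False: crawl blocked, URL can still be listed
print(removal_will_work(BLOCKING, URL, True))  # False: Googlebot never sees the noindex
print(removal_will_work(OPEN, URL, True))      # True: crawl allowed, noindex is read
```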

What is the difference between noindex and canonical?

They feel similar. They are really not. Noindex is a flat no, keep this page out of the index, full stop. A canonical is softer, just a hint that says these pages are basically the same, treat this one as the master copy. With a canonical the page still gets crawled and can still surface. So my rule: noindex when I want a page gone, canonical when I have near-duplicates and only need Google to pick a winner.
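On the canonical side, the check comes down to resolving the href and comparing it with the page URL. Here is a minimal Python sketch, assuming you have the served HTML. `canonical_verdict` is a hypothetical helper, and it ignores fragments. It also ignores a canonical sent in an HTTP `Link` header. It flags any query string on the canonical target, which is stricter than Google and may flag parameter canonicals that are legitimate on purpose.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Keeps the href of the first <link rel="canonical"> in the served HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.href = attr_map.get("href") or None


def canonical_verdict(page_url: str, html: str) -> str:
    """Classify the canonical as missing, parameterized, elsewhere or self-referencing."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.href is None:
        return "no canonical"
    target = urljoin(page_url, parser.href)  # resolve relative hrefs like "/shoes"
    page, canon = urlsplit(page_url), urlsplit(target)
    if canon.query:
        return f"carries query parameters: {target}"

    def key(parts):
        # Compare scheme, host, path and query; the fragment is ignored.
        return (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query)

    if key(canon) != key(page):
        return f"points elsewhere: {target}"
    return "self-referencing"


print(canonical_verdict("https://example.com/shoes", '<link rel="canonical" href="/shoes">'))
# self-referencing
```

A parameter URL such as `/shoes?utm_source=x` that canonicalizes to the clean `/shoes` comes back as "points elsewhere". That is the expected, healthy result for the parameter copy.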

Why is my page indexable but still not indexed?

Indexable just means nothing is actively blocking it. That is the floor, not a promise. Google still gets the final say. It weighs whether it even found the page, whether the crawl is worth the budget, and whether the content holds up against the near-duplicate it might already have. When I want the actual answer instead of guessing, I drop the URL into Search Console URL Inspection. It shows you the exact coverage state straight from Google.

Does this tool render JavaScript?

It does not. It reads the served HTML and the response headers, exactly what comes back on that first request. Which matters more than it sounds. If JavaScript injects your meta robots or canonical after load, what Google eventually renders can drift from what you see here. So when a page leans on JS for that stuff, confirm the final state in Search Console URL Inspection, because that one actually renders the page the way Google does.
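To see why, here is a small Python sketch built on the standard `html.parser`. Like this tool, it reads the served markup without executing scripts. The page is a made-up example whose noindex only exists after JavaScript runs, so a raw read reports no robots meta at all. Google's documentation adds a twist in the other direction. If the served HTML already carries a noindex, Google may skip rendering, so removing that noindex with JavaScript may not work.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class ServedHeadParser(HTMLParser):
    """Reads only the markup as served; it never executes JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.robots: Optional[str] = None  # content of a served <meta name="robots">
        self.scripts = 0                   # scripts that could rewrite the head later

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            self.robots = attr_map.get("content", "")
        elif tag == "script":
            self.scripts += 1


def read_served(html: str) -> Tuple[Optional[str], int]:
    """Return the served robots meta (or None) and the number of script tags."""
    parser = ServedHeadParser()
    parser.feed(html)
    return parser.robots, parser.scripts


# Made-up page whose noindex only exists after a browser runs the script.
JS_INJECTED = """<html><head>
<script>
  const m = document.createElement("meta");
  m.name = "robots";
  m.content = "noindex";
  document.head.appendChild(m);
</script>
</head><body>Hello</body></html>"""

print(read_served(JS_INJECTED))  # (None, 1): raw HTML has no robots meta, one script
```

A result with no robots meta but one or more scripts in the head is exactly the case to confirm in URL Inspection.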