Robots Meta Checker
Read the robots meta tags and X-Robots-Tag header on any URL, line up the crawler-specific directives, and keep index control separate from snippet control.
A robots meta checker reads the index and preview directives that are live on a public URL right now, so you act on the real signals instead of a cached guess. Paste a URL and it reads the generic robots meta tag, the crawler-specific googlebot and bingbot tags when it can pull the HTML, and the X-Robots-Tag header, then lines them up side by side and grades them against the page role you pick. It keeps three decisions apart that people blur constantly: noindex controls whether a URL can rank, nofollow changes how links get handled, and the preview directives only trim what shows around your result. For PDFs and images with no HTML head, the header is the only knob, so those get reviewed the same way. There is a plain report you can copy before any change hits a production template.
Queries run through the PeopleAreGeek lookup service. We log nothing.
Index control and preview directive audit
Paste a public URL. This reads the robots meta tags and the X-Robots-Tag header, lines up the generic directives against the crawler-specific ones, and keeps index control separate from snippet control (people blur those two constantly). Files with no HTML, your PDFs mostly, get reviewed the same way.
Meta tags only live in HTML. The X-Robots-Tag header works on either kind of response, HTML or not, which is the only way to reach files that cannot hold a meta tag in the first place.
What robots meta and X-Robots-Tag actually decide
Most people flatten robots directives down to one word. Indexable, or blocked. That is too coarse for a real audit. A noindex signal decides whether a fetched URL is allowed to show up in results at all. nofollow is a different thing entirely: it is about how links get treated. And the preview directives (max-snippet, max-image-preview, nosnippet) only govern how much a search engine shows around your result. Three separate decisions wearing the same coat.
So this checker reads the generic robots meta tag, then the crawler-specific ones if it can actually pull the HTML, plus the X-Robots-Tag header. On WordPress that combination bites people. Your SEO plugin writes page-level tags into the HTML, and meanwhile the host or a CDN quietly bolts on a header you never asked for. For a PDF or an image it is worse, or rather, simpler in a frustrating way: headers are basically the only knob you have.
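If you want to see what "reading the meta tags" amounts to, here is a minimal sketch using only the Python standard library. It is not how this checker is implemented. The function name read_robots_meta and the set of tag names it looks for are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tag names this sketch looks for; "robots" is the generic one.
# The set is an assumption for illustration, not an exhaustive list.
ROBOTS_META_NAMES = {"robots", "googlebot", "bingbot"}


class RobotsMetaParser(HTMLParser):
    """Collects robots-style meta tags as {name: [directive, ...]}."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key.lower(): (value or "") for key, value in attrs}
        name = attr_map.get("name", "").strip().lower()
        if name not in ROBOTS_META_NAMES:
            return
        content = attr_map.get("content", "")
        # Directives are comma-separated; normalise case and whitespace.
        tokens = [t.strip().lower() for t in content.split(",") if t.strip()]
        self.directives.setdefault(name, []).extend(tokens)


def read_robots_meta(html):
    """Return the robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = (
    '<head><meta name="robots" content="noindex, follow">'
    '<meta name="googlebot" content="max-snippet:50"></head>'
)
print(read_robots_meta(sample))
```

Unrelated meta tags such as viewport are ignored, so what comes back is only the part of the head that matters for an index or preview audit.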
Robots meta is not robots.txt
Here is the trap. A robots.txt rule gates crawl access to a path. A page-level directive, though, has to be fetched first: a crawler cannot read a tag on a page it was told never to open. So if you block a URL in robots.txt and then slap a noindex on it, that noindex might never get seen. Honestly that is the mistake I see most. Moved something? Read the redirects. Got duplicate public content floating around? Read the canonicals. And for anything you genuinely want kept out of search, put the index-control directive somewhere a crawler can actually reach it.
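The trap is easy to test for. The sketch below uses Python's standard urllib.robotparser to ask whether a crawler may fetch a URL at all, which is the precondition for any noindex on that page being read. The helper name noindex_is_reachable and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_reachable(robots_txt, url, user_agent="*"):
    """Return True if robots.txt lets the crawler fetch the URL.

    A page-level noindex can only be read when this is True.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /private/\n"

# Blocked path: a noindex placed on this page may never be seen.
print(noindex_is_reachable(robots_txt, "https://example.com/private/page.html"))

# Crawlable path: a noindex here can actually be read.
print(noindex_is_reachable(robots_txt, "https://example.com/public/page.html"))
```

If the first call is the situation you are in, the fix is usually to lift the robots.txt block so the noindex can be read, not to add more directives to the page.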
- robots is your generic HTML meta directive, the catch-all every crawler reads.
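- googlebot and bingbot address one named crawler when you need to. Treat them as additions, not clean overrides: Google documents that it combines the generic and named tags, and the more restrictive rule wins when they conflict.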
- X-Robots-Tag rides in the HTTP headers, and it is what saves you on non-HTML responses.
- Snippet directives trim the preview without necessarily pulling the URL out of search. Worth remembering.
- Expected outcome tells the tool a deliberate noindex is not a bug, so it will not grade your private page like a screwup.
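To make the three decisions concrete, here is a rough sketch that merges the generic and crawler-specific directives and then sorts them into index, link and preview buckets. Merging by union follows Google's documented handling of combined tags, and other crawlers may behave differently. The function names and bucket labels are assumptions for illustration, not this tool's internals.

```python
# "none" is shorthand for "noindex, nofollow", so it lands in both buckets.
INDEX_TOKENS = {"noindex", "none"}
LINK_TOKENS = {"nofollow", "none"}
PREVIEW_PREFIXES = (
    "nosnippet",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
)


def effective_directives(meta, crawler):
    """Union of the generic 'robots' tokens and one named crawler's tokens."""
    combined = []
    for name in ("robots", crawler):
        for token in meta.get(name, []):
            if token not in combined:
                combined.append(token)
    return combined


def split_by_decision(directives):
    """Sort directives into the three separate decisions, plus 'other'."""
    buckets = {"index": [], "links": [], "preview": [], "other": []}
    for token in directives:
        placed = False
        if token in INDEX_TOKENS:
            buckets["index"].append(token)
            placed = True
        if token in LINK_TOKENS:
            buckets["links"].append(token)
            placed = True
        if token.startswith(PREVIEW_PREFIXES):
            buckets["preview"].append(token)
            placed = True
        if not placed:
            buckets["other"].append(token)
    return buckets


meta = {"robots": ["nofollow", "max-snippet:50"], "googlebot": ["noindex"]}
print(split_by_decision(effective_directives(meta, "googlebot")))
```

An empty index bucket next to a full preview bucket is the case people misread most: the page can still rank, it just shows less around the result.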
A practical robots directive workflow
- Check the exact public URL that actually showed up, the one from the sitemap or the Search Console report or wherever the ticket pointed you.
- Read the response status and the content type first. Before you touch the tags.
- Put the generic meta, the crawler-specific tags and the X-Robots-Tag header side by side and compare them together.
- When the signals disagree, pair your noindex finding with canonical, redirect and robots.txt checks. Do not trust one in isolation.
- Retest after anything changes: theme, SEO plugin, cache, CDN, a tweaked server header.
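The "status and content type first" step can be sketched as a small gate that tells you which directive sources are worth reading for a given response. Treating any non-2xx status as "nothing to audit on this response yet" is an assumption of this sketch, and the function name sources_for_response is hypothetical.

```python
HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def sources_for_response(status, content_type):
    """Pick which directive sources to read for a response.

    Assumption: on a redirect or error, follow the redirect or fix the
    error first, then audit the final response instead of this one.
    """
    if not 200 <= status < 300:
        return []
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in HTML_MEDIA_TYPES:
        return ["meta", "x-robots-tag"]
    # PDFs, images and other non-HTML files only have the header.
    return ["x-robots-tag"]


print(sources_for_response(200, "text/html; charset=utf-8"))
print(sources_for_response(200, "application/pdf"))
print(sources_for_response(301, "text/html"))
```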
Frequently asked questions
Is nofollow the same as noindex?
No, and the mix-up costs people. Noindex is about whether the URL shows up in results. Nofollow just changes how links get handled. Blur the two and you'll spend a week chasing the wrong reason a page isn't performing.
Why check X-Robots-Tag on a PDF?
Because a PDF has no HTML head. There's nowhere to drop a robots meta tag. The response header is the only spot left to apply indexing or preview controls to that file, so that's where you look.
Does a missing robots meta tag mean a public page is broken?
No. A page indexes just fine with no robots meta tag at all; that is the default. What actually breaks things is a restrictive directive you did not expect, or control layers fighting each other, or a response path nothing can read.
What is the difference between the robots meta tag and robots.txt?
robots.txt handles crawling: site-wide rules about who is allowed to fetch what. The robots meta tag (and the X-Robots-Tag header) handles indexing, page by page. The catch is that the page has to be crawlable for anyone to read its noindex in the first place.
Can I set robots directives in an HTTP header?
Yep. The X-Robots-Tag header takes the same directives, and for non-HTML files like PDFs or images it's your only option. They've got nowhere to park a meta tag, so the header does all the work.
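For a sense of what reading that header involves, here is a minimal parsing sketch. A header value can be generic ("noindex, max-snippet:50") or prefixed with a crawler name ("googlebot: nofollow"), so the sketch tells those apart. The function name parse_x_robots_tag is hypothetical, and the comma split is a simplification: a date value that itself contains a comma would need more careful handling.

```python
# Directives that carry their own "name: value" colon, so a leading
# colon on these is not a crawler-name prefix.
VALUED_DIRECTIVES = {
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
    "unavailable_after",
}


def parse_x_robots_tag(header_values):
    """Parse raw X-Robots-Tag values into {crawler: [directive, ...]}.

    Generic values are filed under "robots" to match the meta tag name.
    """
    result = {}
    for raw in header_values:
        agent = "robots"
        value = raw.strip()
        prefix, sep, rest = value.partition(":")
        candidate = prefix.strip().lower()
        # A crawler prefix is a single name, not a directive list.
        if sep and "," not in candidate and candidate not in VALUED_DIRECTIVES:
            agent = candidate
            value = rest
        tokens = [t.strip().lower() for t in value.split(",") if t.strip()]
        result.setdefault(agent, []).extend(tokens)
    return result


print(parse_x_robots_tag(["noindex, max-snippet:50", "googlebot: nofollow"]))
```

The input is a list because a response can carry the header more than once, which is exactly how a CDN-added value ends up sitting next to the one your server sent.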