AI Crawler Access Checker

When someone asks ChatGPT, Perplexity, or Claude about your topic, a crawler has to fetch your pages to build the answer. Three things quietly stop it: a robots.txt rule, a firewall that blocks the bot by name, or a server that responds so slowly the fetcher gives up. Enter a domain and this tool checks it the way those crawlers see it, from outside your network, one user agent at a time, and shows you what’s getting through.

What this checks

  • Crawler reachability: sends the real user-agent strings for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, and PerplexityBot, then compares them against a real browser, Googlebot, and an unknown bot. If the AI bots are refused but the controls get in, the block is targeting them by name.
  • robots.txt, bot by bot: reads your robots.txt and resolves the Allow/Disallow verdict for fourteen AI and agent user agents, including the training opt-out tokens (Google-Extended, Applebot-Extended).
  • Response speed (the 499 risk): times your server. Live fetchers abandon slow pages: the fetcher closes the connection and gets nothing, and your server typically logs the request as a 499 (client closed request). Sub-500ms is the target.
  • Edge and firewall: fingerprints the CDN/WAF (Cloudflare, Fastly, DataDome, Akamai, and others) from response headers and flags a bot challenge if one fires.
  • Burst test: fires several rapid requests from one IP to see whether a rate limiter locks out a crawler fetching pages quickly.
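
To make the robots.txt check concrete, here is a minimal sketch in Python using the standard-library urllib.robotparser. It is not this tool's implementation. The token list is a subset of the names mentioned above, the sample robots.txt is invented for illustration, and the standard-library parser may resolve edge cases differently from each crawler's own parser.

```python
from urllib.robotparser import RobotFileParser

# Subset of the tokens named above (the tool itself checks fourteen).
AI_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "PerplexityBot", "Google-Extended", "Applebot-Extended",
]

def robots_verdicts(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Return {token: True if allowed to fetch `path`} for each AI token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return {token: parser.can_fetch(token, path) for token in AI_TOKENS}

# Invented sample: blocks GPTBot only, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

verdicts = robots_verdicts(SAMPLE_ROBOTS_TXT)
for token, allowed in verdicts.items():
    print(f"{token}: {'Allow' if allowed else 'Disallow'}")
```

With this sample, GPTBot resolves to Disallow and every other token falls through to the wildcard group.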

Most AI blocks are accidental

You can publish the best answer on the web, but if the crawler that assembles an AI response can’t fetch the page, you’re invisible in that answer. Blocks are rarely deliberate. A security plugin’s default rule set, an aggressive WAF, or a CDN bot filter refuses the AI user agents while real visitors never notice. Because the block is silent, most sites don’t find out until their content stops showing up in AI answers. This tool catches it in a few seconds.

How does this work?

Every result is a real request. The tool sends each crawler’s user agent to your homepage, reads your robots.txt, times the response, fingerprints the firewall, and runs a short burst. A real browser and Googlebot act as controls: if they get in while the AI bots don’t, the firewall is checking bot identity by user agent, and the fix is a rule change. What no external tool can see is how your firewall treats the crawlers’ actual IP ranges, because the tool’s requests don’t come from OpenAI’s or Anthropic’s network. Treat this as a fast, directional read from outside your network, and confirm anything critical in your server logs.
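
The control comparison can be sketched as a small decision function. This is an illustration of the reasoning, not the tool's code: the agent labels, the rule that any 2xx or 3xx status counts as "got through", and the result strings are all assumptions made for the example.

```python
# Hypothetical labels for the agents being compared.
AI_BOTS = {"GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"}
CONTROLS = {"browser", "Googlebot", "unknown-bot"}

def got_through(status: int) -> bool:
    """Assumption: any 2xx or 3xx response counts as access granted."""
    return 200 <= status < 400

def blocked(names: set[str], statuses: dict[str, int]) -> list[str]:
    """Sorted list of tested agents in `names` that were refused."""
    return sorted(n for n in names if n in statuses and not got_through(statuses[n]))

def diagnose(statuses: dict[str, int]) -> str:
    """Classify a set of per-agent HTTP status codes."""
    blocked_ai = blocked(AI_BOTS, statuses)
    blocked_controls = blocked(CONTROLS, statuses)
    if not blocked_ai and not blocked_controls:
        return "all agents got through"
    if blocked_ai and not blocked_controls:
        # Controls pass, AI bots fail: the block keys on the user agent.
        return "AI bots blocked by name: " + ", ".join(blocked_ai)
    # A control failed too, so this is not specifically a by-name AI block.
    return "controls blocked too: " + ", ".join(blocked_controls)

example = {
    "browser": 200, "Googlebot": 200, "unknown-bot": 200,
    "GPTBot": 403, "ClaudeBot": 403,
    "OAI-SearchBot": 200, "ChatGPT-User": 200, "PerplexityBot": 200,
}
print(diagnose(example))
```

The third branch matters: if the controls are refused as well, the problem is broader than an AI-specific rule, so a user-agent allow rule alone won't fix it.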

What to do if crawlers are blocked

  • robots.txt Disallow: remove or narrow the Disallow rule for the crawler you want to allow. Decide deliberately: blocking training crawlers (GPTBot, ClaudeBot) is a content choice, while blocking the search and live fetchers (OAI-SearchBot, ChatGPT-User, PerplexityBot) is what removes you from AI answers.
  • Firewall / WAF user-agent rule: find the ModSecurity, Cloudflare, or security-plugin rule matching the crawler’s user-agent string and allow it.
  • Speed / 499s: cache aggressively at the edge so the page returns in under 500ms for a cold bot request.
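
To re-check speed after a caching change, a rough timing sketch like the one below may help. It is an assumption-laden illustration, not how this tool measures. The helper names are hypothetical and "ExampleBot/1.0" is a placeholder user agent. It also times the full download from wherever you run it, which will differ from what a crawler sees from its own network.

```python
import time
import urllib.request
from typing import Callable

TARGET_SECONDS = 0.5  # the sub-500ms target mentioned above

def fetch_page(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch `url` with a custom User-Agent and return the HTTP status.
    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # include body transfer in the measurement
        return response.status

def time_fetch(fetch: Callable[[], object]) -> tuple[float, bool]:
    """Run `fetch` once; return (elapsed seconds, whether it met the target)."""
    start = time.perf_counter()
    fetch()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed < TARGET_SECONDS

# Example call (placeholder URL and user agent, not executed here):
# elapsed, ok = time_fetch(lambda: fetch_page("https://example.com/", "ExampleBot/1.0"))
```

Run it a few times, including once after the cache has gone cold, since the first uncached request is the one a bot is most likely to hit.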

Passed the access check? Use the Agentic Resource Discovery Checker to make sure agents can also discover what your site offers, or browse the rest of the free SEO tools. Building with AI agents? These checks are also available programmatically through the MCP server.