The Silent Problem
Unlike search engines, AI crawlers don't send you a notification when they can't access your site. There's no AI equivalent of Google Search Console showing crawl errors. Your site could be completely invisible to ChatGPT, Claude, and Perplexity — and you'd never know unless you explicitly test for it.
w2agent checks your site against all known AI crawler User-Agents and reports exactly what's blocked, partially accessible, or fully open. Here are the most common causes it finds.
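In spirit, this kind of check is simple: request the same URL once per crawler User-Agent and compare responses. Below is a minimal Python sketch of the idea (not w2agent's actual code; the User-Agent strings are shortened placeholders, since real crawler UAs are longer, vendor-documented strings, and the status-code mapping is a simplifying assumption):

```python
import urllib.request
import urllib.error

# Shortened placeholder User-Agent values; the real strings are
# published by each vendor and include version and info-URL parts.
AI_AGENTS = {
    "GPTBot": "GPTBot/1.0",
    "ClaudeBot": "ClaudeBot/1.0",
    "PerplexityBot": "PerplexityBot/1.0",
}

def classify(status: int) -> str:
    """Map an HTTP status code to a coarse accessibility verdict."""
    if status in (401, 403, 406, 429):
        return "blocked"
    if 200 <= status < 300:
        return "open"
    return "check manually"

def probe(url: str) -> dict:
    """Fetch `url` once per AI User-Agent and report each verdict."""
    results = {}
    for name, ua in AI_AGENTS.items():
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results[name] = classify(resp.status)
        except urllib.error.HTTPError as e:
            # A block usually surfaces as an HTTP error (403, 406, 429).
            results[name] = classify(e.code)
        except urllib.error.URLError:
            results[name] = "unreachable"
    return results
```

Calling `probe("https://example.com/")` returns one verdict per crawler, e.g. `{"GPTBot": "blocked", "ClaudeBot": "open", ...}`. A real audit would also compare against a browser User-Agent to separate bot-specific blocks from site-wide outages.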
Cause 1: Security Plugins
WordPress security plugins are the #1 cause of blocked AI crawlers. Wordfence, Sucuri, iThemes Security, and All In One WP Security all have features that block "suspicious" User-Agents — and AI bot names often trigger these rules.
Common Wordfence block rule that catches AI bots:
# Wordfence advanced blocking
# This pattern blocks any User-Agent containing "bot"
# — including GPTBot, ClaudeBot, PerplexityBot
Block User-Agents matching: /bot/i
Fix: Add specific exceptions for AI User-Agents in your security plugin's allowlist. In Wordfence, go to Firewall → Blocking → Advanced → and add exceptions for GPTBot, ClaudeBot, and PerplexityBot.
Cause 2: CDN/WAF Rules
Cloudflare, AWS WAF, and other CDN/WAF services have bot management features that can block AI crawlers. Cloudflare's "Bot Fight Mode" and "Super Bot Fight Mode" treat AI crawlers as automated traffic — which they are — and may challenge or block them.
Cloudflare: Security → Bots. Bot Fight Mode offers no per-bot exceptions, so either disable it or enable Super Bot Fight Mode's "Allow verified bots" option; most major AI crawlers are on Cloudflare's verified bots list.
AWS WAF: Check your rate-limiting rules. AI crawlers may exceed per-IP request limits during content ingestion.
Akamai: Review Bot Manager policies. AI crawlers are typically classified as "unknown bots."
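On Cloudflare, a custom WAF rule with the action set to Skip can also exempt these crawlers explicitly. A hedged sketch of the rule expression, written in Cloudflare's rules language (adjust the bot names to match the crawlers you want to allow):

```
(http.user_agent contains "GPTBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "PerplexityBot")
```

Note that User-Agent matching alone can be spoofed; for stricter policies, combine it with the published crawler IP ranges.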
Cause 3: robots.txt Misconfiguration
Many sites have overly restrictive robots.txt files that unintentionally block AI bots. Common patterns:
Blanket disallow
User-agent: *
Disallow: /
Blocks everything — including AI. If you need this, add explicit Allow rules for AI bots above it.
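For example, a robots.txt that keeps the blanket disallow but carves out the major AI crawlers (bot names as documented by each vendor):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /
```

Each crawler obeys the most specific User-agent group that matches it, so the named groups take precedence over the `*` group.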
Explicit AI bot blocks
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
Sometimes added by SEO plugins or copied from template robots.txt files without understanding the impact.
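You can verify what your robots.txt actually permits with Python's standard-library parser. A small sketch, here fed an inline rules string for illustration (point `set_url` at your live robots.txt and call `read()` to test a real site):

```python
from urllib.robotparser import RobotFileParser

# Example rules: GPTBot explicitly blocked, everyone else allowed.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "/"))         # False: explicit block applies
print(rp.can_fetch("PerplexityBot", "/"))  # True: falls through to the * group
```

Running this against your own robots.txt for each AI bot name catches copied-in blocks before a crawler does.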
Cause 4: Client-Side Rendering
Pages that rely entirely on JavaScript to render content (React SPAs, Angular, Vue without SSR) appear as empty shells to AI crawlers. Most AI crawlers do not execute JavaScript — they fetch the HTML and parse what they get.
Fix: Use server-side rendering (SSR), static site generation (SSG), or at minimum ensure critical content is in the initial HTML response. Next.js, Nuxt, and similar frameworks support this natively.
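A quick way to spot this problem is to check whether your key phrases appear in the raw HTML response, since that is all a non-JavaScript crawler sees. A minimal sketch (the SPA shell markup below is an illustrative example):

```python
def content_in_initial_html(html: str, phrases: list[str]) -> dict:
    """Report which key phrases appear in the raw HTML an AI crawler
    receives, i.e. before any JavaScript runs."""
    return {p: p.lower() in html.lower() for p in phrases}

# A typical SPA shell: the real content only exists after client-side
# rendering, so a crawler that doesn't execute JS sees none of it.
spa_shell = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)
print(content_in_initial_html(spa_shell, ["pricing", "product features"]))
# -> {'pricing': False, 'product features': False}
```

In practice you would fetch the page with an AI crawler's User-Agent and run this check on the response body.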
Cause 5: Authentication Walls
Login-required content, paywalls, and gated content are invisible to AI crawlers. Unlike Googlebot, which many publishers allow to crawl paywalled content for indexing (Google's old "First Click Free" policy was retired in 2017 in favor of flexible sampling and paywall structured data), AI crawlers have no equivalent program.
Fix: If you want AI to know about your gated content, provide metadata and previews in the public-facing HTML. Schema.org's isAccessibleForFree property can signal this.
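A hedged sketch of that markup as JSON-LD, following the Schema.org paywall pattern (the headline and CSS selector are illustrative values):

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example gated article",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywalled-content"
  }
}
```

The `hasPart` block marks which element is gated, so crawlers can tell the public preview from the paywalled body.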
Cause 6: Rate Limiting
AI crawlers can be aggressive — fetching dozens of pages in quick succession. If your server or CDN rate-limits by IP or User-Agent, the crawler may get blocked partway through.
Fix: Set rate limits that still allow at least ~1 request per second, and consider allowlisting known AI crawler IP ranges. OpenAI and Anthropic publish their crawler IP ranges.
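For sites behind nginx, a sketch of a rate limit along those lines (the zone name, rate, and burst values are illustrative; tune them to your traffic):

```nginx
# Allow a sustained 2 requests/second per client IP, with short
# bursts of up to 10 queued requests before returning 429.
limit_req_zone $binary_remote_addr zone=crawlers:10m rate=2r/s;

server {
    location / {
        limit_req zone=crawlers burst=10 nodelay;
        limit_req_status 429;
    }
}
```

Returning 429 rather than 403 matters: it signals "slow down" instead of "go away," and well-behaved crawlers back off and retry.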
Diagnose Your Site
w2agent tests all of these causes automatically. It sends requests with each AI crawler's User-Agent and reports exactly what's blocked, what's slow, and what's accessible. Run an audit to see your site from an AI crawler's perspective.
Audit your site now
Get a free AI readiness score and generate the files your site needs.
Start Free Audit