AI Crawlers Are Not Search Engines
Traditional search engine crawlers (Googlebot, Bingbot) index your pages for search results. AI crawlers serve a different purpose: they fetch content to train models, power AI search (Perplexity, SearchGPT), or enable AI assistants to answer questions about your site.
The key difference: blocking Googlebot removes you from search results. Blocking AI crawlers removes you from AI-powered answers and recommendations — an increasingly important channel.
Known AI Bot User-Agents
| User-Agent | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | ChatGPT web browsing and training |
| ChatGPT-User | OpenAI | Real-time browsing by ChatGPT users |
| ClaudeBot | Anthropic | Claude web search and training |
| anthropic-ai | Anthropic | Anthropic model training |
| PerplexityBot | Perplexity | Perplexity AI search results |
| Bytespider | ByteDance | TikTok / Doubao AI training |
| Google-Extended | Google | Gemini AI training (separate from Googlebot) |
| cohere-ai | Cohere | Cohere model training |
| Applebot-Extended | Apple | Apple Intelligence training |
Configuration Examples
Allow all AI bots (recommended)
```
# Allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```
Allow with restrictions
```
# Allow AI bots but block admin and private pages
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /account/
Disallow: /checkout/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /account/
```
Block training but allow browsing
```
# Allow real-time AI search, block training crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
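To sanity-check rules like the ones above before deploying them, Python's standard-library robots.txt parser can evaluate a policy per User-Agent. A minimal sketch with illustrative rules and URLs; note that urllib.robotparser applies the first matching rule in file order (unlike Google's longest-match behavior), so the specific Disallow line is listed before the broad Allow:

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: allow GPTBot everywhere except /admin/.
# urllib.robotparser uses first-match ordering, so the more
# specific Disallow comes before the broad Allow.
rules = """\
User-agent: GPTBot
Disallow: /admin/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/pricing"))  # True
print(rp.can_fetch("GPTBot", "https://example.com/admin/"))   # False
```

Swap in your real robots.txt content (or `rp.set_url(...)` plus `rp.read()`) to test the file your site actually serves.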
Common Mistakes
Blanket wildcard blocks
A robots.txt whose only rules are "User-agent: *" followed by "Disallow: /" blocks every crawler — including AI bots. If you use this pattern, you must also add explicit groups with "Allow: /" for each AI User-Agent you want to admit.
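For example, a hypothetical robots.txt that keeps the blanket block for unknown crawlers while explicitly admitting one AI bot:

```
# Explicit group for an AI bot
User-agent: GPTBot
Allow: /

# Everything else stays blocked
User-agent: *
Disallow: /
```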
Security plugin overrides
WordPress security plugins (Wordfence, Sucuri) can add User-Agent blocks that override your robots.txt. Check your plugin settings separately.
Forgetting ChatGPT-User
GPTBot is for training; ChatGPT-User is for real-time browsing. Blocking GPTBot doesn't block ChatGPT's live browsing, and vice versa.
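Python's urllib.robotparser can demonstrate that the two User-Agents are matched independently; in this sketch (rules and URL are illustrative) the training crawler is blocked while live browsing stays allowed:

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block the training crawler, allow live browsing
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "https://example.com/post"))        # False
print(rp.can_fetch("ChatGPT-User", "https://example.com/post"))  # True
```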
No Sitemap directive
Always include a Sitemap directive at the bottom of robots.txt. AI crawlers use it to discover your pages efficiently.
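A typical placement, with the Sitemap line after the rule groups (the URL is a placeholder for your own sitemap):

```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```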
robots.txt cached by CDN
If you update robots.txt but your CDN serves a cached version, crawlers won't see the change. Purge your CDN cache after updates.
Test Your Configuration
w2agent checks your robots.txt against all known AI User-Agents and reports exactly which bots are allowed, blocked, or partially restricted. It also generates optimized robots.txt rules as part of its output.
Audit your site now
Get a free AI readiness score and generate the files your site needs.
Start Free Audit