robots.txt for AI Bots

AI crawlers respect robots.txt — but only if you configure it correctly. Here's what you need to know about GPTBot, ClaudeBot, and the growing list of AI User-Agents.

AI Crawlers Are Not Search Engines

Traditional search engine crawlers (Googlebot, Bingbot) index your pages for search results. AI crawlers serve a different purpose: they fetch content to train models, power AI search (Perplexity, SearchGPT), or enable AI assistants to answer questions about your site.

The key difference: blocking Googlebot removes you from search results. Blocking AI crawlers removes you from AI-powered answers and recommendations — an increasingly important channel.

Known AI Bot User-Agents

User-Agent         | Company    | Purpose
GPTBot             | OpenAI     | ChatGPT web browsing and training
ChatGPT-User       | OpenAI     | Real-time browsing by ChatGPT users
ClaudeBot          | Anthropic  | Claude web search and training
anthropic-ai       | Anthropic  | Anthropic model training
PerplexityBot      | Perplexity | Perplexity AI search results
Bytespider         | ByteDance  | TikTok / Doubao AI training
Google-Extended    | Google     | Gemini AI training (separate from Googlebot)
cohere-ai          | Cohere     | Cohere model training
Applebot-Extended  | Apple      | Apple Intelligence training

Configuration Examples

Allow all AI bots (recommended)

# Allow AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Allow with restrictions

# Allow AI bots but block admin and private pages
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /account/
Disallow: /checkout/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /account/

Block training but allow browsing

# Allow real-time AI search, block training crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
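To sanity-check rules like these before deploying, Python's standard urllib.robotparser can evaluate a robots.txt against each User-Agent. A minimal sketch (the inline robots.txt mirrors the example above; robotparser's substring-based agent matching approximates, but does not exactly replicate, how real crawlers match):

```python
from urllib.robotparser import RobotFileParser

# robots.txt content to test, mirroring the example above
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot",
           "PerplexityBot", "Google-Extended"]

def check_bots(robots_txt: str, url: str = "https://example.com/page") -> dict:
    """Return a mapping of bot name -> whether it may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}

if __name__ == "__main__":
    for bot, allowed in check_bots(ROBOTS_TXT).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Bots with no matching group and no wildcard group (ClaudeBot, PerplexityBot here) default to allowed, which matches the Robots Exclusion Protocol's default-allow behavior.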

Common Mistakes

Blanket wildcard blocks

"User-agent: * / Disallow: /" blocks everything — including AI bots. If you use this pattern, you must add explicit Allow rules for AI User-Agents above it.

Security plugin overrides

WordPress security plugins (Wordfence, Sucuri) can add User-Agent blocks that override your robots.txt. Check your plugin settings separately.

Forgetting ChatGPT-User

GPTBot is for training; ChatGPT-User is for real-time browsing. Blocking GPTBot doesn't block ChatGPT's live browsing, and vice versa.

No Sitemap directive

Always include a Sitemap directive at the bottom of robots.txt. AI crawlers use it to discover your pages efficiently.
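For example, a minimal end-of-file sketch (example.com is a placeholder; Sitemap is a standalone directive, not tied to any User-agent group, and requires an absolute URL):

```
User-agent: GPTBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```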

robots.txt cached by CDN

If you update robots.txt but your CDN serves a cached version, crawlers won't see the change. Purge your CDN cache after updates.

Test Your Configuration

w2agent checks your robots.txt against all known AI User-Agents and reports exactly which bots are allowed, blocked, or partially restricted. It also generates optimized robots.txt rules as part of its output.

Audit your site now

Get a free AI readiness score and generate the files your site needs.
