Question
Do AI crawlers respect robots.txt?
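The operators of the major AI crawlers, including OpenAI (GPTBot, OAI-SearchBot), Google (Google-Extended), Anthropic (ClaudeBot), Perplexity (PerplexityBot), and Common Crawl (CCBot), publicly document that they honor robots.txt. Google-Extended is a robots.txt control token only: Google crawls with its existing user agents, so that token does not appear as a user-agent in access logs. A few crawlers are reported to follow the rules less reliably, so verify with logs and enforce at the edge if needed.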
Trust but verify
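- Add a robots.txt rule for the crawler's user-agent token.
- Wait at least a day, since compliant crawlers may cache robots.txt for up to 24 hours, then check your access log for that user-agent.
- If it keeps requesting disallowed paths, block it at the firewall/WAF by user-agent and verified IP.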
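As a minimal sketch of the first step, the shell function below prints one robots.txt rule group. The function name robots_rule and the /private/ path are illustrative assumptions, not part of any tool; GPTBot is one of the tokens listed earlier.

```shell
# Hypothetical helper: print a robots.txt group that disallows one path
# prefix for one user-agent token.
robots_rule() {
  token=$1   # crawler token, e.g. GPTBot
  path=$2    # path prefix to disallow, e.g. /private/ (placeholder)
  printf 'User-agent: %s\nDisallow: %s\n' "$token" "$path"
}

# Example: print a rule group for GPTBot.
robots_rule GPTBot /private/
```

Redirecting the output into the robots.txt served at your site root is one way to apply it; review the resulting file by hand before relying on it.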
Verification command
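# Did a bot hit paths you disallowed? Lists the paths it requested most often.
# Assumes the combined log format (field 7 is the request path); 'SomeBot' and the log path are placeholders.
grep -i 'SomeBot' /var/log/your-access.log | awk '{print $7}' | sort | uniq -c | sort -nr | head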
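The command above lists every path the bot requested. To narrow that to one disallowed prefix, a sketch along these lines may help. The function name count_disallowed_hits is hypothetical, and it assumes the same combined log format, with the request path in field 7.

```shell
# Hypothetical helper: count log lines that mention a bot and whose request
# path starts with a disallowed prefix.
# Usage: count_disallowed_hits BOT PREFIX LOGFILE
#   e.g. count_disallowed_hits 'SomeBot' /private/ /var/log/your-access.log
count_disallowed_hits() {
  bot=$1      # user-agent substring, matched case-insensitively anywhere in the line
  prefix=$2   # Disallow prefix from robots.txt, e.g. /private/
  log=$3      # access log in combined format
  grep -i -- "$bot" "$log" |
    awk -v p="$prefix" 'index($7, p) == 1 { n++ } END { print n + 0 }'
}
```

A nonzero count after the crawler has had time to refetch robots.txt suggests it is not honoring the rule. Because grep matches the bot name anywhere in the line, including the referrer, spot-check a few matching lines before blocking.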
Reality
robots.txt is a request that crawlers follow voluntarily, not an enforcement mechanism. Well-behaved crawlers obey it; for the rest, edge rules are the backstop.
Related pages: verify AI crawler authenticity, how to block AI crawlers, AI crawler user agents.