Question

Do AI crawlers respect robots.txt?

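The operators of the major AI crawlers document that they honor robots.txt: OpenAI (GPTBot, OAI-SearchBot), Google (Google-Extended), Anthropic (ClaudeBot), Perplexity (PerplexityBot), and Common Crawl (CCBot). Google-Extended is a robots.txt control token rather than a separate crawler, so it will not show up as a user-agent in your logs. Some other crawlers are reported to ignore robots.txt or follow it only partly, so verify with logs and enforce at the edge if needed.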

Trust but verify

  1. Set the robots.txt rule for the token.
  2. Wait at least a day (crawlers may cache robots.txt, commonly for up to 24 hours), then check your access log for that user-agent.
  3. If it keeps hitting disallowed paths, block it at the firewall/WAF by user-agent and verified IP.
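
As a rough illustration of step 1, the sketch below prints a robots.txt group for a single crawler token. The helper name robots_rule, the token GPTBot, and the path /private/ are placeholders chosen for this example, not values taken from any particular site.

```shell
# Hypothetical helper: print a robots.txt group that asks one crawler
# token to stay out of one path prefix. Both arguments are placeholders.
robots_rule() {
  token="$1"   # robots.txt user-agent token, e.g. GPTBot
  path="$2"    # path prefix to disallow, e.g. /private/
  printf 'User-agent: %s\nDisallow: %s\n' "$token" "$path"
}

# Example: a rule asking GPTBot not to fetch anything under /private/
robots_rule GPTBot /private/
```

Append the output to the robots.txt file served at your site root; crawlers only look for the file at /robots.txt.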

Verification command

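# Which paths did the bot request most? Compare the list with your Disallow rules.
# Assumes the common/combined log format, where field 7 is the request path.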
grep -i 'SomeBot' /var/log/your-access.log | awk '{print $7}' | sort | uniq -c | sort -nr | head
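
The command above lists every path the bot requested, so you still have to compare the output with your rules by eye. A minimal sketch that narrows the count to one disallowed prefix might look like the following. The function name disallowed_hits and its three arguments are hypothetical, and it assumes the combined log format (request path in field 7).

```shell
# Hypothetical helper: count a bot's requests under one disallowed prefix.
# Assumes combined log format: field 7 is the request path and the
# user-agent string appears somewhere on the same line.
disallowed_hits() {
  bot="$1"      # user-agent token to look for, e.g. GPTBot
  prefix="$2"   # disallowed path prefix, e.g. /private/
  log="$3"      # path to the access log
  grep -i -F -- "$bot" "$log" |
    awk -v p="$prefix" '
      index($7, p) == 1 { n[$7]++ }            # keep paths starting with the prefix
      END { for (u in n) print n[u], u }       # print "count path"
    ' |
    sort -nr
}
```

Calling disallowed_hits SomeBot /private/ /var/log/your-access.log prints one "count path" line per matching path, busiest first. Empty output means the log holds no requests from that user-agent under the prefix. Note that the match is on the user-agent string only, which can be spoofed.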

Reality

robots.txt is a request that crawlers follow voluntarily, not an enforcement mechanism. Well-behaved crawlers obey it; for the rest, edge rules are the backstop.

Related pages: verify AI crawler authenticity, how to block AI crawlers, AI crawler user agents.
