
What is CCBot?

CCBot is the crawler for Common Crawl, a nonprofit that publishes a free, open web archive. That archive is a common source for AI training datasets, so blocking CCBot is one upstream way to reduce how often your content ends up in third-party training corpora. A block only affects future crawls. Pages already captured in earlier archive snapshots are not removed by changing robots.txt.

Why CCBot matters for AI

Allow or block

To block CCBot from your entire site, add these lines to the robots.txt file at your site root:

User-agent: CCBot
Disallow: /
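To sanity-check the rule before deploying it, you can parse it locally. The sketch below uses Python's standard-library urllib.robotparser. The example URL and the other agent names (GPTBot, Googlebot) are illustrative assumptions. This parser's matching is simpler than what production crawlers implement, so treat the result as a rough check of the rule itself, not a guarantee of how CCBot behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of fetching over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # hypothetical URL on your site
    for agent in ("CCBot", "GPTBot", "Googlebot"):
        # CCBot matches the Disallow group; agents with no matching group
        # fall through and are allowed, since there is no "User-agent: *" rule.
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Because the file has no `User-agent: *` group, every crawler other than CCBot is still allowed. Whether a given crawler actually honors robots.txt is up to that crawler.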

