What is CCBot?

Allow CCBot: you are fine with open archiving and broad reuse of your content.
Block CCBot: you want to limit inclusion in downstream AI training datasets at the source.

Question

What is CCBot?

Accepted Answer

CCBot is the crawler for Common Crawl, a nonprofit that publishes a free, open web archive. Because that archive is a common source for AI training datasets, blocking CCBot is one upstream way to reduce how often your content ends up in third-party training corpora.

Why CCBot matters for AI

Allow or block

robots.txt
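
As a rough illustration of the block option described above, the sketch below embeds a small robots.txt that disallows the `CCBot` user-agent token and leaves other crawlers unrestricted, then checks it with Python's standard `urllib.robotparser`. The `example.com` URL, the `SomeOtherBot` name, and the `is_allowed` helper are hypothetical and exist only for this example. The check shows how a rule-following parser would read the file; it has an effect only on crawlers that honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Common Crawl's crawler, leave others unrestricted.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, used only to exercise the rules above.
    url = "https://example.com/articles/post.html"
    print("CCBot allowed:", is_allowed(ROBOTS_TXT, "CCBot", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

To take the allow option instead, drop the `CCBot` group (or change its rule to `Allow: /`) and the same check should report that CCBot may fetch the page.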