Question
Why OAI-SearchBot gets 403 behind Cloudflare
If robots.txt allows the crawler but your logs show 403 responses, the block is usually at the edge layer (a WAF rule, a managed challenge, or a bot security rule), not in robots policy. robots.txt is advisory and cannot return a 403 on its own.
What to check first
- Cloudflare WAF events for requests with User-Agent: OAI-SearchBot.
- Any custom firewall rule matching ASN, country, path, or request score.
- Bot management/challenge settings that may challenge non-browser crawlers.
- Origin logs at the same timestamp, to confirm whether the request reached the origin.
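One way to start on these checks is to replay a request with the crawler's user agent and look at what comes back. The sketch below is a hypothetical helper, not a Cloudflare tool: summarize_headers is a name made up for this example, and the bare "OAI-SearchBot" string is a simplified stand-in for the crawler's full User-Agent. It prints the status code plus the server, cf-ray, and cf-mitigated headers; Cloudflare's documentation describes cf-mitigated: challenge as a marker on challenge responses. A request sent from your own IP can be treated differently from the real crawler, so read the result as a hint and confirm it in the WAF events.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the status line and Cloudflare-related
# headers from a raw HTTP header dump on stdin (e.g. output of `curl -sI`).
summarize_headers() {
  tr -d '\r' | awk '
    # First line is the status line, e.g. "HTTP/2 403" or "HTTP/1.1 403 Forbidden".
    NR == 1 { print "status=" $2; next }
    {
      # Header names are case-insensitive, so compare in lowercase.
      name = tolower($0)
      sub(/:.*/, "", name)
      if (name == "server" || name == "cf-ray" || name == "cf-mitigated") {
        value = $0
        sub(/^[^:]*:[ \t]*/, "", value)
        print name "=" value
      }
    }'
}

# Usage (needs network access; yourdomain.com is a placeholder):
#   curl -sI -A "OAI-SearchBot" https://yourdomain.com/robots.txt | summarize_headers
```

A cf-ray header only shows that the response passed through Cloudflare. Whether the edge or the origin produced the 403 still has to be confirmed against the origin logs at the same timestamp.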
Field workflow
# Confirm crawler request lines and status in origin logs
jq -r 'select(((.request.headers."User-Agent"[0] // "") | test("OAI-SearchBot"; "i"))) | [.ts, .request.uri, .status] | @tsv' /var/lib/caddy/logs/llmsfile-access.log | tail -n 80
# Check policy files are reachable
curl -I https://yourdomain.com/robots.txt
curl -I https://yourdomain.com/sitemap.xml
Decision rule that works in production
If you see repeated 403s for crawler paths such as /robots.txt and /sitemap.xml, fix the edge rules first. Changing llms.txt will not resolve that block.
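To make "repeated 403" concrete, the status codes served to the crawler on those two paths can be tallied from the origin log. This is a minimal sketch under the same assumptions as the jq command in the field workflow: Caddy JSON access logs with .request.uri, .status, and .request.headers."User-Agent". The function name crawler_policy_statuses is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count origin-log responses to OAI-SearchBot on the
# policy paths, grouped by path and status code.
# Assumes Caddy JSON access logs, one JSON object per line.
crawler_policy_statuses() {
  local log_file="$1"
  jq -r '
    # Keep only requests whose User-Agent mentions OAI-SearchBot.
    select(((.request.headers."User-Agent"[0] // "") | test("OAI-SearchBot"; "i")))
    # Keep only the two policy paths named in the decision rule.
    | select(.request.uri == "/robots.txt" or .request.uri == "/sitemap.xml")
    | [.request.uri, .status] | @tsv
  ' "$log_file" | sort | uniq -c
}

# Usage:
#   crawler_policy_statuses /var/lib/caddy/logs/llmsfile-access.log
```

Each output line is a count followed by the path and status, so a high count next to /robots.txt and 403 is the signal to look at edge rules before touching any policy file.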
If Bot Fight Mode is enabled, check this page next: "Does Cloudflare Bot Fight Mode block AI crawlers?"