Why does OAI-SearchBot get 403 when robots.txt allows it?

The block usually comes from Cloudflare edge rules, such as WAF rules or bot challenges, not from robots.txt. Compare edge event logs with origin logs at matching timestamps.


Why OAI-SearchBot gets 403 behind Cloudflare

Updated June 7, 2026

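If robots.txt allows the crawler but your logs show 403, the block is usually at the edge layer (WAF, managed challenge, or bot security rule), not in your robots policy. robots.txt only tells crawlers what they may fetch; it does not control what the edge returns for a request.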

What to check first

Cloudflare WAF events for requests with User-Agent: OAI-SearchBot.
Any custom firewall rule matching ASN, country, path, or request score.
Bot management/challenge settings that may challenge non-browser crawlers.
Origin logs at the same timestamp, to confirm whether the request reached the origin at all.
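As a rough first pass on the checks above, the sketch below classifies a single response from its headers. It assumes that Cloudflare marks challenge responses with a `cf-mitigated: challenge` header; a 403 without that header can still come from a WAF block action or from the origin, so it is reported as needing a log comparison. The function name `classify_response` is hypothetical.

```shell
# Read response headers (as printed by `curl -sI`) on stdin and print a
# rough classification. Hypothetical helper, not part of any tool.
classify_response() {
  awk '
    { sub(/\r$/, "") }                       # strip CR from CRLF header lines
    NR == 1 { status = $2 }                  # status line, e.g. "HTTP/2 403"
    tolower($0) ~ /^cf-mitigated:[ \t]*challenge/ { challenged = 1 }
    END {
      if (status != "403")      print "not-403 (status " status ")"
      else if (challenged)      print "edge-challenge"
      else                      print "403-needs-log-check"
    }
  '
}
```

A possible usage is `curl -sI -A 'OAI-SearchBot/1.0' https://yourdomain.com/robots.txt | classify_response`. The user-agent string here is a simplified placeholder, and a request sent from your own machine is not the real crawler, so the edge may treat it differently from genuine crawler traffic.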

Field workflow

# Confirm crawler request lines and status in origin logs
jq -r 'select(((.request.headers."User-Agent"[0] // "") | test("OAI-SearchBot"; "i"))) | [.ts, .request.uri, .status] | @tsv' /var/lib/caddy/logs/llmsfile-access.log | tail -n 80

# Check policy files are reachable
curl -I https://yourdomain.com/robots.txt
curl -I https://yourdomain.com/sitemap.xml
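The jq command above prints one tab-separated line per crawler request (timestamp, URI, status). A minimal sketch for turning that output into counts per status and path follows; the function name `summarize_status` is hypothetical, and the input format is assumed to be exactly the three-column TSV produced above.

```shell
# Read "ts<TAB>uri<TAB>status" lines on stdin and print
# "status<TAB>uri<TAB>count", sorted by status and then by path.
summarize_status() {
  awk -F '\t' '
    { count[$3 "\t" $2]++ }
    END { for (k in count) printf "%s\t%d\n", k, count[k] }
  ' | LC_ALL=C sort
}
```

To use it, drop the `tail -n 80` if you want the whole log and pipe the jq output into `summarize_status`. Many 403 rows for /robots.txt in the origin log mean the origin itself returned them; no crawler rows at all for the period suggests the requests were stopped before reaching the origin.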

Rule-order checklist

Confirm broad wildcard blocks are not evaluated before crawler-specific allow rules.
Restrict allow rules to public crawl paths (/robots.txt, sitemap, docs/questions paths).
Keep admin, login, and private API routes excluded from crawler exceptions.
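Before writing an allow rule, it can help to list which paths it should and should not cover. The sketch below mirrors the checklist: private prefixes are matched first, then public crawl paths, and anything else is flagged for manual review. Every path pattern and the function name `classify_path` are hypothetical examples; replace them with your own site layout.

```shell
# Print "exclude", "allow", or "review" for a request path.
# All patterns are illustrative assumptions, not a recommended policy.
classify_path() {
  case "$1" in
    /admin*|/login*|/api/private*)                echo "exclude" ;;  # never in crawler exceptions
    /robots.txt|/sitemap.xml|/docs/*|/questions/*) echo "allow" ;;   # public crawl paths
    *)                                            echo "review" ;;   # decide case by case
  esac
}
```

For example, `classify_path /admin/users` prints `exclude` and `classify_path /robots.txt` prints `allow`. Checking private prefixes first keeps a broad public pattern from accidentally covering a private route.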

Expected recovery signal

After a correct edge-rule fix, 403 responses should drop first on /robots.txt and sitemap paths, then on high-value content pages over the following crawl cycles.
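One way to watch for that signal is to count 403s per day, split into policy paths and content pages. The sketch below reads the same three-column TSV (timestamp, URI, status) that the jq command in the field workflow produces. It assumes the timestamp is Unix epoch seconds, and it treats /robots.txt and anything starting with /sitemap as policy paths; the function name `forbidden_by_day` is hypothetical.

```shell
# Read "ts<TAB>uri<TAB>status" lines on stdin and print
# "day_start_epoch<TAB>class<TAB>403_count<TAB>total" per UTC day and class.
forbidden_by_day() {
  awk -F '\t' '
    {
      day = int($1 / 86400) * 86400          # start of the UTC day, epoch seconds
      cls = ($2 == "/robots.txt" || $2 ~ /^\/sitemap/) ? "policy" : "content"
      key = day "\t" cls
      total[key]++
      if ($3 == "403") blocked[key]++
    }
    END { for (k in total) printf "%s\t%d\t%d\n", k, blocked[k] + 0, total[k] }
  ' | LC_ALL=C sort
}
```

If the fix worked, the 403 count on the policy rows should fall toward zero first, with the content rows following on later days. This only reflects requests that reach the origin log, so pair it with the edge event view for the same period.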

Decision rule that works in production

If you see repeated 403 for crawler paths like /robots.txt and /sitemap.xml, fix edge rules first. Changing llms.txt will not resolve that block.

For a safer allow-rule pattern, continue with how to allow OAI-SearchBot in Cloudflare WAF.

If Bot Fight Mode is enabled, check this page next: Does Cloudflare Bot Fight Mode block AI crawlers?
