Amazon and Malicious Compliance

What do you do when you want to crawl websites, but some people have explicitly said that they want your crawler to fuck right off?

Why, just change the user agent of your crawler! Technically you’re complying with robots.txt!

robots.txt:
User-agent: Amazonbot
Disallow: /

access.log:
54.158.133.188 - - [18/Jan/2026:17:50:52 +1100] "GET /some/trash/ HTTP/1.1" 302 521 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-SearchBot/0.1) Chrome/119.0.6045.214 Safari/537.36"
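
For anyone wondering how that counts as "complying": robots.txt rules are matched against the name the crawler gives itself, so a new name means the old rule no longer applies. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `allowed` helper is my own naming for illustration, and passing only the product token (rather than the full User-Agent header) is an assumption about how a crawler would check itself; real crawlers may match differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule quoted above.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch path.

    Hypothetical helper, for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Product tokens only; the full User-Agent header in the log is longer.
print(allowed("Amazonbot", "/some/trash/"))       # False: the rule applies
print(allowed("Amzn-SearchBot", "/some/trash/"))  # True: no rule names this bot
```

The only way to catch a renamed bot through robots.txt is to add a rule for each new name as it shows up, or to disallow `*` and shut out everyone.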

I’d suggest not doing that again. It would be unfortunate if I had to go banning entire IP ranges.
