Amazon and Malicious Compliance

What do you do when you want to crawl websites, but some people have explicitly said that they want your crawler to fuck right off?

Why, just change the user agent of your crawler! Technically you’re complying with robots.txt!

robots.txt:
User-agent: Amazonbot
Disallow: /

access.log:
54.158.133.188 - - [18/Jan/2026:17:50:52 +1100] "GET /some/trash/ HTTP/1.1" 302 521 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amzn-SearchBot/0.1) Chrome/119.0.6045.214 Safari/537.36"
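For anyone wondering why the rename "works", here is a rough sketch using Python's standard-library `urllib.robotparser`. It assumes the crawler matches on its product token the way that parser does; I have no idea what Amazon's crawler actually runs, and the `is_allowed` helper is just something made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two-line robots.txt shown above.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /
"""


def is_allowed(user_agent_token: str, path: str = "/some/trash/") -> bool:
    """Hypothetical helper: would this user agent be allowed to fetch path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent_token, path)


# The named crawler is shut out of everything...
print(is_allowed("Amazonbot"))       # False
# ...but a crawler under a new name matches no rule, so it is allowed.
print(is_allowed("Amzn-SearchBot"))  # True
```

The rule only ever applied to the name, not to whoever is behind it. A new name means a clean slate.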

I’d suggest not doing that again. It would be unfortunate if I had to go banning entire IP ranges.
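If it does come to that, the check itself is trivial. A minimal sketch with Python's `ipaddress` module follows; note that the `54.158.0.0/16` range is purely an illustrative guess built around the address in the log above, not a published Amazon range, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Hypothetical block list. The /16 here is an assumption for illustration
# only; a real list would come from the operator's published ranges.
BLOCKED_RANGES = [ipaddress.ip_network("54.158.0.0/16")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_RANGES)


print(is_blocked("54.158.133.188"))  # True: the address from the access log
print(is_blocked("192.0.2.1"))       # False: a documentation-only address
```

In practice this would live in the web server or firewall config rather than in application code, but the logic is the same.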

Shaking my fist at the æther

AntiSols Blog
