Amazonbot is finally respecting robots.txt
xena
150 points
36 comments
May 14, 2026
Related Discussions
- Amazon wins court order to block Perplexity's AI shopping agent SilverElfin · 24 pts · March 10, 2026 · 52% similar
- Amazon AI Cancelling Webcomics vmbrasseur · 59 pts · April 16, 2026 · 51% similar
- After outages, Amazon to make senior engineers sign off on AI-assisted changes ndr42 · 518 pts · March 10, 2026 · 48% similar
- Amazon employees are "tokenmaxxing" due to pressure to use AI tools Bender · 216 pts · May 12, 2026 · 48% similar
- Amazon finds out AI programming isn't all it's cracked up to be CrankyBear · 22 pts · March 16, 2026 · 48% similar
Discussion Highlights (11 comments)
bstsb
> Get Outlook for Mac
this bit made me laugh. was the email drafted in Outlook? was it sent to some sort of forwarding mailbox, or did they just BCC every customer in?
jacobn
I just complained to them the other day! They were scraping our weather website to no end, very much including the disallowed path prefixes. Did end up just adding them to our WAF blocklist, which is weirdly ironic - hosting on their infra & using their services to block their AI scraper...
namegulf
Robots.txt is lame BTW, there is no way to enforce it. It is up to the bot to decide whether to crawl or not, and in most cases they don't care. Cloudflare had a nice technique to address the bot problem (if you use their name servers). It'll respect and use the robots.txt while sending the remaining bots to a deep black hole.
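To make the "it is up to the bot" point concrete, here is a minimal sketch of the check a well-behaved crawler could run before fetching a URL, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `is_allowed` helper are made up for illustration. Nothing here stops a crawler that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Amazonbot", "https://example.com/private/data"),
        ("Amazonbot", "https://example.com/public"),
        ("OtherBot", "https://example.com/private/data"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The check is purely advisory: the server never sees whether it ran, which is why blocking at a WAF or proxy is the only real enforcement.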
arjie
Huh, I get a lot of traffic from Amazonbot (relative to humans) and try as I might, it would get stuck in a tarpit of no creation because it would sit there and keep blasting every variation of my recent pages because Mediawiki lists many links. I have them appropriately marked nofollow, and robots.txt warns the bot not to waste its time, but it just goes and sticks itself on nonsense internal pages. The traffic isn't a problem. I've got Cloudflare in front and the machine itself is relatively overpowered, and downtime isn't critical. But I'd just like the thing to be able to spider me properly. Someone did point out to me that maybe I wasn't receiving actual Amazonbot but some other spider: https://news.ycombinator.com/item?id=46352723
TurdF3rguson
Why does Amazonbot even exist, can someone explain? I don't understand why an ecommerce play would be crawling other websites.
captn3m0
Good place to ask. Saw a new AWS user agent in logs today: Amazon-Quick-on-Behalf-of-$HEXID. I found a mention on some user agent trackers but no official documentation. Does anyone know if it’s documented? Asking because I am seeing decent traffic (30GB/week) from this.
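For anyone wanting to measure this kind of traffic themselves, a rough sketch of totalling response bytes per user agent from an access log. It assumes the common "combined" log format. The sample lines, IP addresses, and the `bytes_by_agent` helper are invented for illustration, and the `$HEXID` placeholder is left exactly as written in the comment.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes combined log format, which ends with:
#   "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def bytes_by_agent(lines: Iterable[str]) -> Counter:
    """Sum response sizes per user-agent string; unparseable lines are skipped."""
    totals: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        size = match.group("bytes")
        # A "-" in the size field means no body was sent.
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


# Made-up sample lines, not real traffic.
SAMPLE = [
    '192.0.2.1 - - [14/May/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 1200 "-" "Amazon-Quick-on-Behalf-of-$HEXID"',
    '192.0.2.1 - - [14/May/2026:10:00:01 +0000] "GET /b HTTP/1.1" 200 500 "-" "Amazon-Quick-on-Behalf-of-$HEXID"',
    '192.0.2.2 - - [14/May/2026:10:00:02 +0000] "GET /c HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for agent, total in bytes_by_agent(SAMPLE).most_common():
        print(f"{total:>10} bytes  {agent}")
```

Since the real user agent carries a different ID per caller, grouping by a shared prefix rather than the full string would probably be needed in practice.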
phdelightful
I just put Anubis in front of my self-hosted forge this morning because AmazonBot had helped itself to 750 GiB (!) of traffic to my public repos this month! At least, it claimed to be AmazonBot…
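On the "it claimed to be AmazonBot" point: a user-agent string is trivially spoofed, so one common way to check a claim is forward-confirmed reverse DNS. Below is a minimal sketch of that technique. The hostname suffix is an assumption about what Amazon documents for its crawler and should be checked against Amazon's current Amazonbot documentation. The helper names are hypothetical, and the lookups are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Assumption: hostname suffix used by the genuine crawler. Verify against
# Amazon's current Amazonbot documentation before relying on it.
ASSUMED_SUFFIX = ".crawl.amazonbot.amazon"


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward lookup: hostname to IPv4 addresses (gethostbyname_ex is IPv4 only)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    suffix: str = ASSUMED_SUFFIX,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # Step 1: the PTR name must sit under the expected domain.
        if not hostname.endswith(suffix):
            return False
        # Step 2: that name must resolve back to the same IP.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return False
```

Both steps matter: anyone controlling the reverse zone for their own IP range can set an arbitrary PTR name, so the forward lookup is what ties the name back to the address.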
vindin
robots.txt is merely a gentleman’s courtesy at this point. Nobody is obligated to follow it.
TrackerFF
Is it just me, or is it extra unethical and self-serving when crawlers from, say, Amazon(Bot) decide to incessantly crawl AWS hosted websites? Same goes for Google and Microsoft crawlers crawling GC and Azure. By that, I mean the types of crawls that can hog up significant usage.
rho138
If it respected the standard then a lack of a robots.txt implies do not crawl, which they openly state they ignore
faangguyindia
if you run Meta Ads, it's notorious for ddosing your website with bots. Basically, their ad manager sends dozens of clicks for each variant of ad you post.