
Desi Lydic Hosts the Daily Show: Fox News Axes Tucker Carlson & Elon Musk Has a Blue Check Fiasco

31
Randall Gross  4/25/2023 11:54:39 am PDT

Interesting post regarding AI scrapers

Img2dataset will attempt to scrape images from any site unless site owners add HTTP response headers like "X-Robots-Tag: noai" and "X-Robots-Tag: noindex." That means the onus is on site owners, many of whom probably don't even know img2dataset exists, to opt out of img2dataset rather than opt in.

On Sunday, Terence Eden posted a comment on the tool's GitHub page, saying that it had "hammered" several of his sites and asking that it be made opt-in.

“I don’t understand why the onus is on me to add a new header to my sites opting out of this tool,” Eden said. “Please can you change the default behaviour so that it will only work on sites which set the X-Robots-Tag: YesAI?”

"If you don't wish for people to view images from your website, the best way is to turn it off," replied Romain Beaumont, img2dataset's creator. Beaumont did not respond to a request for comment.

When Eden and other GitHub commenters pushed back, Beaumont said it would be "unethical" to make img2dataset opt-in rather than opt-out.
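For anyone wondering what the opt-out described in the quoted post actually involves: a scraper that honors it has to read the X-Robots-Tag response header and skip the image if an opt-out directive is present. Below is a rough sketch of that check, assuming only the two directives named in the post ("noai" and "noindex") count as opt-outs. This is not img2dataset's code; the function names and the "img2dataset" user-agent string used for scoped headers are hypothetical, and the parsing is simplified (directives that take a colon argument, such as unavailable_after, are not fully handled).

```python
from collections.abc import Iterable

# Assumption: the opt-out set is just the two directives named in the post.
OPT_OUT_DIRECTIVES = frozenset({"noai", "noindex"})


def _directives(value: str, user_agent: str) -> set[str]:
    """Split one X-Robots-Tag header value into lowercase directives.

    A leading "<name>:" scopes the header to one crawler, e.g.
    "googlebot: noindex". Directives scoped to another crawler are ignored.
    """
    value = value.strip().lower()
    head, sep, rest = value.partition(":")
    # Treat "head:" as a crawler scope only if head is a single bare word.
    if sep and "," not in head and " " not in head.strip():
        if head.strip() != user_agent.lower():
            return set()  # header applies to some other crawler
        value = rest
    return {token.strip() for token in value.split(",") if token.strip()}


def is_opted_out(header_values: Iterable[str], user_agent: str = "img2dataset") -> bool:
    """Return True if any X-Robots-Tag value carries an opt-out directive."""
    return any(
        _directives(value, user_agent) & OPT_OUT_DIRECTIVES
        for value in header_values
    )
```

With the standard library, the header values could come from `response.headers.get_all("X-Robots-Tag") or []` on a `urllib.request.urlopen` response. The point Eden was making is visible in the default: when the header is absent, `is_opted_out([])` is False, so scraping proceeds unless the site owner has acted.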