The Race to Block OpenAI’s Scraping Bots Is Slowing Down

It’s too soon to say how the spate of deals between AI companies and publishers will shake out. OpenAI has already scored one clear win, though: Its web crawlers aren’t being blocked by top news outlets at the rate they once were.

Publishers have generally been quick to shut out AI crawlers. When Apple debuted a new AI agent this summer, for example, a slew of top news outlets swiftly opted out of Apple’s web scraping using the Robots Exclusion Protocol, or robots.txt, the file that lets webmasters control which bots may crawl their sites. According to an analysis of 1,000 popular news outlets by Ontario-based AI detection startup Originality AI, the number of high-ranking media websites using robots.txt to “disallow” OpenAI’s GPTBot increased dramatically from the bot’s August 2023 launch through that fall, then rose steadily from November 2023 to April 2024. And when a WIRED investigation earlier this summer found that the AI startup Perplexity was likely choosing to ignore robots.txt commands, Amazon’s cloud division launched an investigation into whether Perplexity had violated its rules.
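To illustrate what “disallowing” GPTBot means in practice, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt contents, the `news.example.com` URL, the `is_allowed` helper, and the `SomeOtherBot` crawler name are all hypothetical. The check only matters if a crawler chooses to run it: robots.txt is advisory, which is why a bot that ignores it, as Perplexity was reported to, can still fetch the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that opts out of OpenAI's GPTBot
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://news.example.com/politics/some-story"
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, article))
```

Under these assumptions, GPTBot is refused the article while the unnamed crawler falls through to the catch-all `*` rule and is permitted.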