The Race to Block OpenAI’s Scraping Bots Is Slowing Down
It’s too soon to say how the spate of deals between AI companies and publishers will shake out. OpenAI has already scored one clear win, though: Its web crawlers aren’t getting blocked by top news outlets at the rate they once were. When Apple debuted a new AI agent this summer, a slew of top news outlets swiftly opted out of Apple’s web scraping using the Robots Exclusion Protocol, or robots.txt, the file that allows webmasters to control bots. The number of high-ranking media websites using robots.txt to “disallow” OpenAI’s GPTBot increased dramatically from the crawler’s August 2023 launch until that fall, then rose steadily from November 2023 to April 2024, according to an analysis of 1,000 popular news outlets by Ontario-based AI detection startup Originality AI. When a WIRED investigation earlier this summer found that the AI startup Perplexity was likely choosing to ignore robots.txt commands, Amazon’s cloud division launched an investigation into whether Perplexity had violated its rules.
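For readers unfamiliar with the mechanism, here is a minimal sketch of how a robots.txt "disallow" rule for GPTBot can be checked, using Python's standard-library parser. The sample file, the `is_blocked` helper, and the example.com URL are hypothetical illustrations, not any outlet's actual configuration or Originality AI's method. As the Perplexity episode suggests, the file only states a site's wishes; compliance is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks OpenAI's GPTBot, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text disallows user_agent from url.

    This helper is an illustrative assumption, not part of any real
    publisher's or researcher's tooling.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, "blocked:", is_blocked(SAMPLE_ROBOTS_TXT, agent))
```

An analysis like the one described above could, in principle, apply a check of this kind to each outlet's robots.txt file and count how many disallow a given crawler.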