ChatGPT Stole Your Work. So What Are You Going to Do?
Wired
If you’ve ever uploaded photos or art, written a review, “liked” content, answered a question on Reddit, contributed to open source code, or done any number of other activities online, you’ve done free work for tech companies, because downloading all this content from the web is how their AI systems learn about the world. This is true for search engines like Google, social media sites like Instagram, AI research startups like OpenAI, and many other providers of intelligent technologies. We are AI researchers, and our research suggests the public has a tremendous amount of “data leverage” that can be used to create an AI ecosystem that both generates amazing new technologies and shares the benefits of those technologies fairly with the people who created them. Because of generative AI systems’ reliance on web scraping, website owners could significantly disrupt the training data pipeline if they disallow or limit scraping by configuring their robots.txt file.
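To make the robots.txt point concrete, here is a minimal sketch of what such a configuration might look like and how a compliant crawler would interpret it. It uses Python's standard `urllib.robotparser` module. The user-agent tokens `GPTBot` and `CCBot` are used only as illustrative examples of AI-related crawlers, and `example.com`, `SomeBrowser`, and the helper `can_fetch` are hypothetical names introduced for this example. A site owner would need to check each crawler's own documentation for the token it actually announces, and robots.txt only works for crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that turns away two AI-related crawlers
# (tokens assumed for illustration) while leaving the site open to others.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/essays/post.html"
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        verdict = "allowed" if can_fetch(agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Under these assumptions the two named crawlers are blocked from every path, while any other visitor falls through to the wildcard rule and is allowed. Limiting rather than disallowing scraping would mean replacing `Disallow: /` with narrower paths.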