Blind leading the blind: Developers are using AI-generated data to train their AI bots
AI developers are training their models, including large language models (LLMs), on AI-generated data, mainly to save costs. The use of such “synthetic data” presents a promising opportunity for significant advancements in the AI ecosystem, although it also draws comparisons to an algorithmic ouroboros.

Feeding a data-hungry monster

According to the Financial Times, OpenAI, Microsoft, and Cohere, a startup valued at two billion dollars, are actively researching synthetic data to train their large language models.

AI’s questionable integrity and reliability

However, critics point out a significant drawback: the integrity and reliability of AI-generated data might be questionable, as even AI models trained on human-generated data are known to make substantial factual errors. If developers use faulty data, or a data set produced by a hallucinating model, the resulting AI bot will also generate faulty results.
