Machine Learning Times
Model Collapse: An Experiment – What happens when AI is trained on its own output?

 

Originally published in O’Reilly, October 24, 2023.

Ever since the current craze for AI-generated everything took hold, I’ve wondered: what will happen when the world is so full of AI-generated stuff (text, software, pictures, music) that our training sets for AI are dominated by content created by AI? We already see hints of that on GitHub: in February 2023, GitHub said that, among developers using Copilot, 46% of their code was written by Copilot. That’s good for the business, but what does it mean for future generations of Copilot? At some point in the near future, new models will be trained on code that they themselves wrote. The same is true for every other generative AI application: DALL-E 4 will be trained on data that includes images generated by DALL-E 3, Stable Diffusion, Midjourney, and others; GPT-5 will be trained on a set of texts that includes text generated by GPT-4; and so on. This is unavoidable. What does this mean for the quality of the output these models generate? Will that quality improve, or will it suffer?

I’m not the only person wondering about this. At least one research group has experimented with training a generative model on content generated by generative AI, and found that over successive generations the output became more tightly constrained and less likely to be original or unique. Generative AI output became more like itself over time, with less variation. They reported their results in “The Curse of Recursion,” a paper that’s well worth reading. (Andrew Ng’s newsletter has an excellent summary of this result.)
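The narrowing described above can be reproduced in miniature. The sketch below is a hypothetical toy, not the experiment from the article or from “The Curse of Recursion”: it assumes the “model” is nothing more than a one-dimensional Gaussian fit by maximum likelihood, and that each generation is trained only on samples drawn from the previous generation’s fit. The function name `simulate_collapse` and its parameters are illustrative assumptions.

```python
import random
import statistics


def simulate_collapse(generations: int = 300, n_samples: int = 20, seed: int = 0) -> list[float]:
    """Toy model-collapse simulation (hypothetical illustration).

    The "model" is a 1-D Gaussian. Generation 0 is fit to samples from the
    true distribution N(0, 1); every later generation is fit only to samples
    drawn from the previous generation's fitted model.
    Returns the fitted standard deviation of each generation.
    """
    rng = random.Random(seed)
    # Generation 0 trains on "human" data drawn from the true distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stds = []
    for _ in range(generations):
        # "Train": maximum-likelihood fit of the mean and standard deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)
        stds.append(sigma)
        # "Generate": the next training set is made purely of this model's output.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return stds


stds = simulate_collapse()
for gen in (0, 10, 50, 100, 299):
    print(f"generation {gen:3d}: fitted std = {stds[gen]:.3g}")
```

With only 20 samples per generation, each refit underestimates the spread slightly on average (the maximum-likelihood variance estimate has expectation (n-1)/n times the true variance), and tails that a generation happens not to sample are never seen again, so the fitted standard deviation tends to drift toward zero: the output becomes “more like itself” with less variation.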

