Which AI Model Most Infringes on Copyrighted Content?

Originally published in AI Business, March 7, 2024. 

OpenAI’s GPT-4 reproduces the most copyrighted content from prompts among four popular large language models, according to new research from AI startup Patronus AI.

The startup, founded by former Meta AI researchers, also found that popular large language models from the likes of Meta, Mistral and Anthropic generated copyrighted content.

The startup tested OpenAI’s GPT-4, Anthropic’s Claude 2.1, Meta’s Llama 2 70B and Mistral’s Mixtral-8x7B-Instruct-v0.1.

GPT-4 reproduced copyrighted content, on average, in 44% of prompts crafted to test how readily a model regurgitates existing content. Mixtral-8x7B-Instruct-v0.1 produced copyrighted content in 22% of test prompts on average, while Llama 2 70B recreated content in 10% of the prompts.

The model that produced the least copyrighted content was Anthropic’s Claude 2.1, with an average score of just 8%.


