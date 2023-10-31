Makers of generative artificial intelligence tools like ChatGPT have been using copious amounts of copyrighted news material to train their chatbots, according to new accusations from a news industry trade group.

The News/Media Alliance, which represents over 2,200 publishers, showcased its research in a blog post and white paper Tuesday, saying AI companies regularly use the information in news stories without authorization and violate laws protecting that intellectual property.

“The research and analysis we’ve conducted shows that AI companies and developers are not only engaging in unauthorized copying of our members’ content to train their products, but they are using it pervasively and to a greater extent than other sources,” said Danielle Coffey, Alliance president and CEO, in the release. “This diminishment of high-quality, human created content harms not only publishers but the sustainability of AI models themselves and the availability of reliable, trustworthy information.”

The group’s research claims that the datasets used to train the large language models behind major chatbots “significantly” overweighted content from news, magazines, and digital media sources, using it 5 to almost 100 times as frequently as other content. Large language models are a building block of generative AI and are essential to tools like ChatGPT and Bard.