What The New York Times suit against OpenAI could mean for AI

The New York Times filed a lawsuit against OpenAI and Microsoft late last month, alleging the companies used its content to train their respective AI models without permission or compensation. Developers of large language models have routinely scraped huge batches of data from the internet, and then allowed their models to train by processing and finding patterns in the data. But the Times’s claims go deeper: the suit says OpenAI and Microsoft encoded the newspaper’s articles into their language models’ memory so that ChatGPT and Bing Chat (now called Copilot) could access and regurgitate the information, in some cases verbatim and without proper citation (the suit contains numerous examples of this). The lawsuit demands that any chatbot trained using the data be taken offline.

The lawsuit came as a surprise to OpenAI: a company spokesperson told Axios that the two sides had been discussing content licensing terms.

The suit marked a sobering coda to 2023, a year in which the AI industry sprinted ahead largely unrestrained and mostly unregulated. Many in the tech industry had hoped that 2024 would bring far wider application of AI systems. But lawsuits over copyrights could slow everything down, as concerns about legal exposure become a bigger factor in AI companies’ plans for how and when to release new models. Could training data, rather than safety concerns or fears of job destruction, become the AI industry’s Achilles’ heel?

OpenAI’s lawyers may argue that an AI model isn’t much different from a human who ingests a bunch of information from the web and then uses it as a basis for their own thoughts. That whole debate may be moot if the Times can prove that it was financially harmed when OpenAI’s and Microsoft’s AI models spat out line-for-line text lifted from the paper’s coverage. But the main issue is that this is all uncharted legal territory; a high-profile trial may begin to establish how copyright law applies to the training of AI models. Even if OpenAI ends up paying damages, the two parties may still come to an accommodation allowing the AI company to use Times content for training.

News publishers’ posture toward AI companies runs the gamut: The Wall Street Journal, News Corp, and Gannett want to license their stories to AI developers, while others such as Reuters and CNN have begun blocking AI companies from accessing their content. Meanwhile, it’s still not outside the realm of possibility that the courts or the Federal Trade Commission could order AI companies to delete training data they’ve already scraped from the web. (The FTC did, after all, open an inquiry into OpenAI’s training data acquisition practices last summer.)

“In the months ahead, we’ll continue to see additional licensing agreements between credible publishers and AI companies,” says Alon Yamin, the cofounder and CEO of Copyleaks, which makes an AI plagiarism detection tool. “And yes, additional lawsuits.”

Ready for another buzzy AI smartphone killer?

First there was Humane’s Ai Pin, an AI device you can wear on your lapel. Now, another company, L.A.-based Rabbit, is set to reveal its own AI-centered device, called the r1, during next week’s CES trade show. The demo video shows people instructing a device to order an Uber, find a new podcast, and “tell everybody I’m going to be a little late.”