2023-04-20

The models behind generative AI tools like ChatGPT, Midjourney, and Stability AI that have wowed the world all share an open secret: They’re trained on vast volumes of data scraped from the internet.
Although some AI companies, including OpenAI and Israeli AI company Bria, have paid for access to the training data that makes their models work so well, others continue to rely on unfettered, free access to the world’s textual and image-led output.
But the parameters around that access are now changing. This week, Reddit cofounder and CEO Steve Huffman told The New York Times that the popular internet forum would begin charging companies that cull its data for AI training purposes. “The Reddit corpus of data is really valuable,” Huffman said. “But we don’t need to give all of that value to some of the largest companies in the world for free.”
That decision could have significant knock-on effects on the way AI is trained. Reddit is the so-called front page of the internet; it’s where the world’s conversations take place. That’s rich pickings for companies developing large language models (LLMs).
Reddit has recognized its value. Per Huffman, Reddit will impose a paywall around its application programming interface (API), the method through which companies developing AI models are able to download data from the social platform. The pricing and the start date have yet to be determined, the executive said, though carve-outs would apparently remain for academic researchers to freely access the site’s content.
It marks a shift in approach that could change how AIs understand our world and, as AIs become more commonplace, how we humans do, too.
“The time of the free API may be over,” says Andres Guadamuz, an intellectual property law researcher at the University of Sussex. “The move makes sense for companies such as Reddit. In the absence of licensing agreements for training, API access is the next best thing to try to recover some money.” (Reddit didn’t respond to Fast Company’s request for comment.)