Date: 2023-07-25

The explosion in usage of tools like ChatGPT, which offer the promise of increased productivity and creativity, is pushing across both personal and professional boundaries. In fact, analyst firm Gartner predicts that by 2025, 70% of new internally developed applications will incorporate AI or machine-learning-based models. But not everything that glitters is gold.

AI, including large language models (LLMs), poses a variety of challenges that need to be considered before you implement it or allow its widespread use within your organization. Data privacy, IP loss, bias, security, and a host of other issues lie in wait for unsuspecting organizations pushed by their boards and C-suites not to miss the AI boat. More recently, LLMs have introduced a new concern: researchers warn of a very real and serious danger called model collapse. In the simplest terms, model collapse happens when AI models are trained on AI-generated data, an inevitable concern given the amount of AI-generated content that LLMs now produce.
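To make the mechanism concrete, here is a minimal toy sketch, not a real LLM training loop. It assumes a hypothetical 50-token vocabulary with a Zipf-like "human" distribution, and it stands in for training with simple frequency counting. Each generation is fitted only on samples produced by the previous generation; the function names and parameters are illustrative.

```python
import random
from collections import Counter


def fit(samples):
    """'Train' a toy model: estimate token probabilities from observed counts."""
    counts = Counter(samples)
    total = len(samples)
    return {token: n / total for token, n in counts.items()}


def generate(model, n, rng):
    """Sample n tokens from the model's learned distribution."""
    tokens = list(model)
    weights = [model[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate(generations=20, sample_size=200, seed=0):
    """Return the number of distinct tokens each generation's model still knows."""
    rng = random.Random(seed)
    # Assumed "human" data: a few common tokens and a long tail of rare ones.
    vocab = [f"t{i}" for i in range(50)]
    true_weights = [1 / (i + 1) for i in range(50)]
    data = rng.choices(vocab, weights=true_weights, k=sample_size)

    support = []
    for _ in range(generations):
        model = fit(data)
        support.append(len(model))
        # The next generation sees only the current model's output.
        data = generate(model, sample_size, rng)
    return support


if __name__ == "__main__":
    print(simulate())
```

The returned list can only stay flat or shrink: a token that happens to draw zero samples gets zero probability in the next model and can never reappear. That is the loss of improbable events in miniature.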

Think about photocopying: The first copy is pretty good; details of the picture are mostly there. But what happens when you make a copy of a copy of a copy, and so on? You get a blurry mess that bears no resemblance to the original. That’s model collapse. With each generation, the AI model gives greater importance to higher-probability outcomes and shrinks or even eliminates improbable events, eroding the original data distribution and, with it, the model’s intent. Your model is now useless.

Implement AI governance to stop problems before they start

AI models are data products: they serve to deliver value, drive productivity, and give you a competitive edge. If they don’t, why invest in them? They need to be nurtured, validated, and continuously updated. They need to be governed. AI governance should be the cornerstone of your organization’s AI strategy, and because AI is driven by data, it should fall under your larger data governance program. Companies can follow a simple but effective four-step AI governance framework.

First, define your use case. How will the model be used? What data will drive the model? Is it human-generated or generated by AI? This is a critical step to define your outcomes, assess the risk, and, most important, assign ownership and accountability. Second, identify and fully understand the data being used in the model. The old adage “garbage in, garbage out” is particularly true with AI. Being able to verify and trust the data up front sets you up for success and reduces the chances of problems like model collapse. Third, test, test, and then test some more. Document the results of the model so you can fully understand the output. Can you detect bias? Are you getting logical results? Collecting this information for reporting and analysis will not only help you achieve the outcomes you defined in step one but also ensure you can trace results and report to regulators if necessary.
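The first three steps above could be captured in a small governance record. The sketch below is one hypothetical way to do it, not a prescribed implementation: the class names, the origin labels ("human", "ai", "unknown"), and the 20% default threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    rows: int
    origin: str  # assumed labels: "human", "ai", or "unknown"


@dataclass
class ModelRecord:
    use_case: str  # step 1: what the model is for
    owner: str     # step 1: who is accountable
    sources: list[DataSource] = field(default_factory=list)  # step 2
    test_log: list[dict] = field(default_factory=list)       # step 3

    def unverified_share(self) -> float:
        """Fraction of training rows not confirmed as human-generated."""
        total = sum(s.rows for s in self.sources)
        if total == 0:
            return 0.0
        unverified = sum(s.rows for s in self.sources if s.origin != "human")
        return unverified / total

    def log_test(self, name: str, passed: bool, notes: str = "") -> None:
        """Record a test result so it can be traced and reported later."""
        self.test_log.append({"test": name, "passed": passed, "notes": notes})

    def ready_for_release(self, max_unverified_share: float = 0.2) -> bool:
        """Require an owner, at least one test, all tests passing, trusted data."""
        return (
            bool(self.owner)
            and bool(self.test_log)
            and all(t["passed"] for t in self.test_log)
            and self.unverified_share() <= max_unverified_share
        )
```

Treating "unknown" provenance the same as AI-generated is a deliberately conservative choice in this sketch; an organization could reasonably set the threshold and labels differently.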

Finally, continuously verify and monitor your model. AI governance isn’t a one-time project. It should be a routine process, with results scrutinized and updated data fed into the model for retraining and improvement. We’ve found that by following this simple framework, we and the companies we work with that have adopted it are mitigating the risks associated with AI and, more important, are seeing faster and greater value from the AI projects we’re taking on.

All roads lead back to data

I think there’s a clear winner in the data-centric versus model-centric debate. You can’t have an effective model without effective data. Responsible AI should be about more than ensuring privacy, safety, and fairness. It should include the explicit guarantee that the data in the model is of the highest quality and won’t fall victim to challenges like model collapse. No exceptions.

Right now, human-created data sources are, generally speaking, the norm. In the future, and that future is fast approaching, AI-generated data may take over. If data is the new oil, then human-created data will be the diamonds that everyone seeks out: more valuable than any other commodity we have. Implementing AI governance now, and understanding the data you have, will be the difference between success and failure.

Felix Van de Maele is the CEO of Collibra.