2023-06-20 | BY SIOBHAN HANNA FOR TELUS INTERNATIONAL | 6 minute read

The excitement surrounding generative artificial intelligence (GenAI) is well deserved: the technology has given rise to new and seemingly limitless possibilities. Amid the hype, companies must make important decisions about how they will collect, create, and annotate the data used to train, test, and validate their AI algorithms, including large language models (LLMs), to produce the best outcomes.

Although we don’t have a crystal ball that can predict the myriad ways GenAI will change our lives or how it may eventually be governed, decision-makers in the field need to implement ethical AI practices from the start of their experimentation. This includes responsibly sourcing and using diverse and trusted datasets because, for better or worse, the outputs of AI algorithms will reflect and amplify the data choices their developers make. Not only is a responsible approach to GenAI the right thing to do, but consumers are paying close attention. Custom research conducted by my firm, which surveyed 1,000 Americans familiar with GenAI, showed that 40% of respondents do not believe companies that use GenAI technology in their platforms are doing enough to protect users from bias and false information, and more than three-quarters (77%) believe that before brands integrate generative AI into their platforms, they should be required to audit their algorithms to mitigate bias and prejudice.

THE CONSEQUENCES OF NOT GETTING GENERATIVE AI RIGHT

Using homogeneous data to train algorithms can create unintended biases with far-reaching societal consequences. One example is a chatbot that recreated racial profiling when asked to determine which air travelers present a security risk. Because the underlying LLM was trained on biased data, it outlined code for calculating an individual’s ‘risk score’ based on their ethnicity and on whether the traveler had previously visited countries labeled as dangerous.
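The kind of audit that survey respondents call for can start small. The sketch below is an illustration only, not a complete fairness audit: it compares how often a model flags people in different groups and reports how uneven those rates are. The function names, group labels, and sample records are all hypothetical.

```python
from collections import defaultdict


def flag_rates(records):
    """Return the share of records flagged as high risk for each group.

    records: iterable of (group_label, was_flagged) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Ratio of the lowest to the highest flag rate (1.0 means parity)."""
    highest = max(rates.values())
    lowest = min(rates.values())
    return 1.0 if highest == 0 else lowest / highest


# Hypothetical audit sample: (group label, whether the model flagged the person).
sample = [
    ("group_a", True), ("group_a", False), ("group_a", False), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", True), ("group_b", False),
]

rates = flag_rates(sample)
ratio = disparity_ratio(rates)
print(rates)
print(f"disparity ratio: {ratio:.2f}")
```

A ratio far below 1.0 means one group is flagged much more often than another. How low is too low is a policy decision that the code cannot settle, and a real audit would also need to examine the training data itself.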

Just as humans are biased, a language-generating machine harbors the countless biases found in the billions of texts used to train its simulated grasp of language and thought. Another common consequence of ‘bad data’ in GenAI is the emergence of “hallucinations,” a phenomenon in which an algorithm confidently generates content or a response that is not justified by its training data because the dataset is insufficient, biased, or too specialized. For example, a hallucinating chatbot with no training data regarding a company’s revenue might generate a random number that the algorithm ranks with high confidence, and then go on to falsely and repeatedly present that incorrect revenue number with no indication that it is a by-product of a weak algorithm.

Moreover, as Yann LeCun recently said in an article for the Institute of Electrical and Electronics Engineers, language is rooted in the unspoken: non-linguistic context, common sense, nuance, and tone that GenAI doesn’t yet grasp. “LLMs have no idea of the underlying reality that language describes,” he says. “Those systems generate text that sounds fine, grammatically, semantically, but they don’t really have some sort of objective other than just satisfying statistical consistency with the prompt.”

No one should mistake the imitation of human intelligence for the real thing or assume any text produced by LLMs is objective or authoritative. Like humans, a generative AI is what it eats, and these hallucinations and inaccuracies highlight the critical need for the right quantity of high-quality, diverse data to mitigate bias and misinformation.

GENERATIVE AI-POWERED, HUMAN-LED

The same custom survey also showed that nearly half (49%) of respondents stated that a GenAI algorithm cannot operate successfully without human input, which highlights another important consideration for companies that want to gain consumer trust. This could include human involvement and oversight in developing AI policies and establishing effective and compliant content moderation rules, in collecting and annotating data, and in continuous model validation, fine-tuning, and alignment across markets, languages, and domains.

POLICY AND GUARDRAIL DEVELOPMENT

Policy development plays a critical role in ensuring responsible and ethical use of GenAI, including robust data governance practices that cover data privacy and protection. Companies should be transparent about what data is being used and how it has been collected and labeled. Given the complexity of GenAI models, they should also disclose their systems’ limitations, potential biases, and associated risks. These policies should outline the responsibilities and liabilities of both developers and users of GenAI systems.

Establishing effective and compliant content moderation rules, known as guardrails, will set boundaries to keep GenAI systems in check. Whether topical, safety, or security guardrails, these rules and the code that enforces them act as a set of ethical principles and technical safeguards to control and influence the outputs of GenAI.
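To make the idea of guardrails concrete, here is a minimal sketch, assuming a topical check that runs before generation and a safety check that runs after it. The blocked topics, the pattern for sensitive data, and the `fake_model` stand-in are all hypothetical; production guardrails are typically far more sophisticated than keyword and pattern matching.

```python
import re
from dataclasses import dataclass

# Hypothetical rule sets; in practice these would come from written policy.
BLOCKED_TOPICS = {"medical diagnosis", "legal advice"}
# Matches strings shaped like a US Social Security number (illustrative only).
SENSITIVE_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str
    text: str


def find_blocked_topic(prompt):
    """Topical guardrail: return the first blocked topic in the prompt, if any."""
    lowered = prompt.lower()
    for topic in sorted(BLOCKED_TOPICS):
        if topic in lowered:
            return topic
    return None


def apply_guardrails(prompt, generate):
    """Check the prompt before generation and the draft after it."""
    topic = find_blocked_topic(prompt)
    if topic is not None:
        return GuardrailResult(False, f"blocked topic: {topic}", "")
    draft = generate(prompt)
    # Safety guardrail: redact sensitive-looking strings from the output.
    if SENSITIVE_PATTERN.search(draft):
        redacted = SENSITIVE_PATTERN.sub("[REDACTED]", draft)
        return GuardrailResult(True, "redacted sensitive pattern", redacted)
    return GuardrailResult(True, "ok", draft)


def fake_model(prompt):
    """Stand-in for a real model call; it simply echoes the prompt."""
    return f"Echo: {prompt}"


print(apply_guardrails("Give me legal advice on leases", fake_model))
print(apply_guardrails("My number is 123-45-6789", fake_model))
```

Keeping the rules in data rather than in code means policy owners can review and update them without touching the enforcement logic.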

MODEL VALIDATION AND TUNING

Organizations should also consider implementing model validation to ensure their outputs meet quality standards and brand guidelines. Experts should be brought in to assess relevance, coherence, fluency, and fit with cultural norms, reducing an algorithm’s tendency to produce inaccurate and toxic content. Model tuning improves performance through reinforcement learning from human feedback (RLHF): experts provide feedback on generated content, and that feedback is used to fine-tune the models. RLHF aligns outputs with brand preferences and is especially valuable for companies operating globally with customers spread across markets, languages, and regions. It can help better align those companies’ models by taking into consideration cultural nuances and local dialects to further ensure high-quality content.

PRIORITIZE DIVERSITY

As I shared in a previous article, achieving zero bias in AI is not possible, but by having diverse representation across the roles on a project team, including data scientists, data annotators, and data engineers, companies will be better positioned to mitigate bias in their GenAI initiatives. When it comes to LLMs especially, the datasets used must be annotated by domain experts and content creators with different backgrounds, experiences, ethnicities, levels of education, and beliefs, among many other factors, so that the models can respond accurately to equally diverse prompts from users.
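The validation and diversity points above can be combined in a small sketch. It assumes raters from several locales score each model output on a 1 to 5 scale, and it flags outputs where locales disagree sharply so that a human can take a closer look. This is not RLHF itself, only the kind of rating aggregation that might feed a validation report or a later tuning step. The function name, locale codes, threshold, and sample ratings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_ratings(ratings, disagreement_threshold=1.5):
    """Average 1-5 quality ratings per locale and flag cross-locale gaps.

    ratings: iterable of (output_id, locale, score) tuples.
    """
    by_output = defaultdict(lambda: defaultdict(list))
    for output_id, locale, score in ratings:
        by_output[output_id][locale].append(score)

    summary = {}
    for output_id, locales in by_output.items():
        locale_means = {loc: mean(scores) for loc, scores in locales.items()}
        values = list(locale_means.values())
        spread = max(values) - min(values)
        summary[output_id] = {
            "locale_means": locale_means,
            # Average of locale averages, so a large locale cannot drown out a small one.
            "overall": mean(values),
            "needs_review": spread >= disagreement_threshold,
        }
    return summary


# Hypothetical ratings: (output id, rater locale, score from 1 to 5).
ratings = [
    ("reply_1", "en-US", 5), ("reply_1", "en-US", 4),
    ("reply_1", "fr-CA", 2), ("reply_1", "fr-CA", 2),
    ("reply_2", "en-US", 4), ("reply_2", "fr-CA", 4),
]

summary = summarize_ratings(ratings)
for output_id, stats in summary.items():
    print(output_id, stats)
```

Averaging per locale first is a deliberate choice here: pooling all scores together would let the locale with the most raters dominate, which is the opposite of what a diverse rater pool is meant to achieve.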

Only a diverse team of individuals is able to capture the nuances and complexities of real-world data and language and be up to the monumental task of validating that data’s accuracy. At TELUS International, we have a global AI Community of more than one million data annotators, linguists, and raters to deliver high-quality, diverse datasets for our clients. Our community members are an extension of our clients’ teams, and their contributions are a critical component driving the ultimate success of their GenAI products and services.

A RESPONSIBLE APPROACH TO GENERATIVE AI

While GenAI models are already showing great promise, companies must weigh the ethical, legal, and commercial risks posed by this emerging technology and determine how to minimize those risks while maximizing the rewards. By keeping humans in the loop and making investments in procuring the right quantity of high-quality and diverse data, developing sound policies and guardrails, implementing model validation and tuning, and prioritizing diversity, we’re taking steps to create a more responsible, representative, and sustainable tomorrow.

Siobhan Hanna is the VP and managing director of TELUS International’s global AI Data Solutions division. TELUS International partners with disruptive brands to power all aspects of their generative AI initiatives.