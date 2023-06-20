
BY SIOBHAN HANNA FOR TELUS INTERNATIONAL and TOBIAS DENGEL

User experience (UX) remains a primary competitive battleground, where design-forward firms consistently outpace laggards in revenue and shareholder return. Given the recent growth of voice user interfaces (such as smart assistants) and the current industry focus on generative AI (GenAI), businesses must understand how to use these technologies effectively to reap the full range of rewards, including faster support times and enhanced customer loyalty.

Yes, the technology is powerful, and the pace of change is thrilling. But to understand where, when, and how to deploy GenAI-enabled voice technology, brands must consider the following:

Which company-wide processes or department-specific tasks will benefit from GenAI implementations?

Are your algorithms currently being trained via high-quality, diverse datasets?

Have you engaged the right engineering, design, and delivery partner to discuss your business’s goals and develop a long-term strategy?

Does your company culture encourage your employees to embrace change and use AI/ML tools to support and supercharge human- and tech-enabled resources?

How will you optimize the user interface? (Note that UI remains a weak point for large language models [LLMs] and similar tools.)

Exploration and early adoption will be critical in distinguishing the winners in today’s highly competitive AI implementation battles. However, the primary differentiator for companies in this crowded space is not generative AI alone. Instead, it’s GenAI paired with voice technology. It’s no accident that ChatGPT’s recent iOS app launch highlighted voice input as its key feature.

GENERATIVE AI + VOICE TECHNOLOGY = A WINNING COMBINATION

Voice is among our most familiar forms of communication. Many begin learning and practicing speech as infants. And yet, for the last century, we’ve used our hands to communicate with machines and devices by typing, tapping, and clicking.

After so many years, why are consumers around the world now driving such increased demand to return to voice-first communication? By 2024, the number of voice-enabled devices worldwide will equal the global human population. By 2030, the global voice assistant market will reach over $14 billion (up from $1.5 billion in 2020). The chatbot market is poised to see similar gains, from this year’s valuation of $5 billion to $15 billion by 2028, with the voice bots segment projected to experience an even higher compound annual growth rate (CAGR).

To gain traction, any technology must solve one or more core human needs. Unlike recent misses like 3D TV, the “voice-first” experience satisfies not just one or two, but five needs, offering an almost unequaled breadth of influence for a new technology:

Speed: Increasing the efficiency of every human-machine interaction, because we speak three times as fast as we type

Safety: Creating a less dangerous world, where machines from cars to jetliners more effectively respond to our inputs

Knowledge: Acquiring critical information when and where you need it by enabling rapid long-tail, multi-word searches

Engagement: Making life more entertaining and enjoyable through access to the metaverse and deeply immersive experiences

Transformation: Generating voice-enabled business models that change industry landscapes, often in “heads-up, hands-free” environments, ranging from medicine and law enforcement to retail, factories, and distribution centers

Recent leaps in AI, including GenAI models like GPT and other LLMs, natural language processing (NLP), and automatic speech recognition (ASR), are also contributing to voice technology’s growing popularity and use. The underlying datasets and algorithms powering machine learning technologies are constantly evolving and improving, especially as more users interact with them, further training the models. This exponential evolution will result in ever-faster breakthroughs.

Through increased accuracy, efficiency, and pattern recognition, such AI tools will enable the next generation of voice-enabled search, navigation, data analysis, translation services, chatbots, virtual assistants, and more.

OVERCOMING USER EXPERIENCE OBSTACLES

As voice technology evolves, new platforms, devices, industries, and integrations are simultaneously emerging. Unfortunately, many retain a critical UX design flaw: they offer voice-only conversations separate from screens and other digital experiences and remain a standalone, self-contained call-and-response system. Apple’s virtual assistant Siri is one example of this. But Siri co-founder and voice tech pioneer Adam Cheyer always envisioned Siri as a “multimodal” experience, one that provides text, graphic, voice, and haptic responses delivered concurrently with user input.

“The perfect interface will be when you can mix direct manipulation with conversational interaction so seamlessly that you don’t even think about it,” Cheyer said. “After all, that’s what humans do in other contexts every day.”

The core idea behind Cheyer’s argument is simple. Humans speak three times faster than we type, and we read twice as quickly as we listen. We don’t want back-and-forth voice dialogue with our devices. Speaking a command and reading an immediate response is far more efficient.

Many voice tools now employ these multimodal interfaces, using voice alongside screens, keyboards, sounds, hand gestures, eye movements, and haptics to connect humans and machines with increased speed. Similarly, the value of generative AI like LLMs will become even more apparent when they graduate from the current text-only, call-and-response interface to a multimodal experience.
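The arithmetic behind the speak-then-read argument can be sketched in a few lines of Python. This is a rough illustration only: the two ratios come from the text above, while the base rates (40 words per minute for typing, 150 for listening) and the sample message lengths are assumptions chosen for the example, not measured figures.

```python
# Illustrative only: base rates are assumptions, not figures from the article.
TYPING_WPM = 40                   # assumed typing speed
SPEAKING_WPM = 3 * TYPING_WPM     # article: we speak three times faster than we type
LISTENING_WPM = 150               # assumed pace of spoken machine output
READING_WPM = 2 * LISTENING_WPM   # article: we read twice as quickly as we listen


def interaction_seconds(input_words, output_words, input_wpm, output_wpm):
    """Seconds needed to give a request and take in the response."""
    minutes = input_words / input_wpm + output_words / output_wpm
    return minutes * 60


def compare_modalities(input_words=12, output_words=60):
    """Return seconds per interaction for three input/output pairings."""
    return {
        "type + read": interaction_seconds(
            input_words, output_words, TYPING_WPM, READING_WPM),
        "speak + listen": interaction_seconds(
            input_words, output_words, SPEAKING_WPM, LISTENING_WPM),
        "speak + read": interaction_seconds(
            input_words, output_words, SPEAKING_WPM, READING_WPM),
    }


if __name__ == "__main__":
    for pairing, seconds in compare_modalities().items():
        print(f"{pairing}: about {seconds:.0f} seconds")
```

Under these assumed rates, a 12-word request with a 60-word response takes about 30 seconds when typed and read, about 30 seconds when spoken and listened to, and about 18 seconds when spoken and read. The exact numbers shift with the assumptions, but pairing the fastest input with the fastest output comes out ahead whenever the two ratios hold.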

When integrated with the latest generative AI algorithms, multimodal voice technology can alter how we live and how companies do business, drastically increasing speed, convenience, accessibility, and productivity across industries and functions.

IDENTIFYING AI + VOICE USE CASES

Before the recent proliferation and adoption of LLMs, my firm conducted a nationwide survey to determine which voice use cases resonated most with users. We asked respondents to rate 15 “VoiceCases” on usefulness and efficiency. Based on this research and given the recent AI advancements, multimodal voice is poised to become the preferred interaction model for the following four use cases, which currently occur primarily via screens:

Specific Search: Voice input paired with generative AI can drastically increase efficiency when locating an available item or information, including media, directions, weather, FAQs, or inventory management.

Composition and Logging: As most of us have already experienced, GPT and other LLMs can significantly facilitate content development and data entry, whether writing emails, completing forms, composing lists, or managing more complex paperwork.

Coaching and Instruction: For skills requiring guidance, such as driving a vehicle, piloting an aircraft, performing medical procedures, and other “heads-up, hands-free” tasks, the combination of multimodal voice and GenAI aids speed, accuracy, and safety.

Data Analysis: With ever-larger data quantities at our disposal, combining voice and AI allows us to query databases using natural language, translating these prompts into a query language such as structured query language (SQL), then responding in plain language.

INTEGRATING ENTERPRISE GENERATIVE AI

The content required to train your AI can come from many places, including in-person user interviews, customer support transcripts, customer-facing knowledge bases, blogs, and other high-quality data streams. But enterprise-level benefits of this technology won’t happen simply by leveraging free, public versions of LLMs, which can use your potentially sensitive or confidential data to train their models and involve many other risks.

GPT is a remarkable tool, but companies must consider an enhanced security posture to accomplish even simple, enterprise-specific tasks (e.g., allowing a user to change their address using natural language). This may include integrating the tech into an existing CRM, authenticating users, training the model on company-specific data, implementing and testing safeguards, and building customer- and employee-facing interfaces. AI and voice tech only unlock UX wins with a well-considered strategy and experienced implementation team in place.
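To make the Data Analysis use case and the safeguards mentioned above more concrete, here is a minimal Python sketch of the flow: a plain-language question is translated into SQL, checked by a simple safeguard, run against a database, and answered in plain language. Everything named here is hypothetical. `translate_to_sql` is a canned stand-in for whatever LLM a team would actually call, the `orders` table is invented demo data, and a real deployment would add speech recognition in front and user authentication around it.

```python
import sqlite3


def translate_to_sql(question: str) -> str:
    """Hypothetical stand-in for an LLM call that turns a question into SQL."""
    canned = {
        "how many orders shipped to texas?":
            "SELECT COUNT(*) FROM orders WHERE state = 'TX'",
    }
    return canned[question.strip().lower()]


def is_read_only(sql: str) -> bool:
    """Assumed safeguard: accept a single SELECT statement and nothing else."""
    statement = sql.strip().rstrip(";")
    return statement.lower().startswith("select") and ";" not in statement


def answer(question: str, conn: sqlite3.Connection) -> str:
    """Translate the question, check the SQL, run it, and reply in plain language."""
    sql = translate_to_sql(question)
    if not is_read_only(sql):
        raise ValueError("Generated SQL rejected by safeguard")
    (count,) = conn.execute(sql).fetchone()
    return f"{count} orders matched your question."


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO orders (state) VALUES (?)", [("TX",), ("TX",), ("CA",)]
    )
    return conn


if __name__ == "__main__":
    print(answer("How many orders shipped to Texas?", demo_connection()))
```

The read-only check is deliberately crude. It shows where a safeguard sits in the flow rather than what a production-grade one would look like.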
WHAT’S ON THE HORIZON FOR GENERATIVE AI-ENABLED VOICE TECH?

While voice may seem like a “UX enhancement” in today’s conventional applications, it will quickly become a requirement in next-generation, AI-enabled software. To be successful, brands must make the interface (the aural, visual, and tactile elements users engage with to participate in their business process) as streamlined, easy, and enjoyable as possible. This means combining the most immediate input (voice) and output (screens) with the natural language capabilities of GenAI.