Conversations: They reflect one of our most basic human instincts. Learning to talk is one of the first skills we acquire as infants, and verbally communicating with others quickly becomes a fundamental part of daily life. But only recently have we started to converse with technology, asking it questions, making requests, and so much more.
Since communicating with one another is an inherent skill, you might think designers would have an easy time crafting conversational UI. But the truth is, there are dozens of intricacies that come with designing for voice, particularly since it’s still early days.
Our interdisciplinary team has gotten its hands dirty exploring the possibilities and pushing the boundaries of designing for voice UI over the past year, including designing and developing a proof-of-concept voice app for Gatorade Gx, and creating a personalized voice assistant for a woman living with multiple sclerosis for the upcoming BBC Two series The Big Life Fix. The learnings that came out of these use cases were numerous, but here are the top takeaways:
Voice should only be one piece of the pie
When you add a voice UI component to your offering, you can shorten and simplify interactions, deliver faster results, provide a personalized service, and make your product more accessible. From a brand perspective, voice UI is a great way to convey brand personality and deepen relationships with customers.
One important lesson, however, is that a voice app is not going to replace other touch points of interaction with a service. A voice-based version of your app most likely won’t serve everyone. Take the Grubhub Skill on Alexa, for example: It only works if a user already has an active account and a previous order history. When using the voice app, customers can’t start an order from scratch; they must repeat an order. So the Skill is only relevant for frequent users who are trying to save time–and that’s okay. Meeting more targeted, personal user needs is exactly why companies invest in creating different digital touch points, and it’s important to remember that your voice app should only be one slice of the pie you’re offering.
There’s no need to reinvent the wheel–it already exists
Designing for voice UI doesn’t mean creating a brand-new design process or guiding principles. Classic frameworks such as Nielsen’s 10 usability heuristics for interface design still apply to voice UI. For example, visibility of system status–the principle of keeping users informed about what a system is doing and why–is an important VUI principle. The only difference is that the feedback may not be at all visual.
Also, similar to other emerging tech such as VR, voice interface design is nothing new. VUI systems were popular in the early 2000s, and there are plenty of learnings still relevant today, like how key it is to use conversational markers (“got it,” “almost there”) judiciously so users don’t get annoyed. As long as we keep pace with recent technological advancements, designers can continue to evolve their approach and be mindful of how these elements might change a user’s expectations.
Dozens of VUI design guidelines are published online, including Amazon’s guidelines and other best practices, alongside books such as O’Reilly’s Designing Voice User Interfaces. Adopting these instead of trying to create your own will save your team time, effort, and resources.
Keep the brand voice consistent
When you’ve spent so much time on the brand voice for your company or product, you of course want to let it shine as much as possible in your new experience. There are a few ways to consistently apply brand values to the VUI experience: the way your Skills use filler words such as “um” and “got it,” the approach the app takes in handling errors, and the manner in which it welcomes the user. Pay attention to even the smallest nuances; they all reflect your voice.
A voice app can also be personalized by using a prerecorded voice that may already be associated with your offering (Alex Trebek on the Jeopardy Skill, for example). This ensures it’s not just the stock Alexa tone speaking for you, but a language and tone that feels natural to your brand. Not everyone can afford to invest crazy efforts in applying an actual human actor’s voice to their offering, as the AI-enabled personal trainer Vi did when it crafted its product’s personality. But finding a way to embed your brand through voice is key if you want people to start emotionally connecting to the experience.
Another way is to project the brand personality onto what the user has to say. Looking back at our Grubhub example: Its voice app prompts users to tell Grubhub “I’m hungry.” This is a subtle way the company has injected some of its playful personality into the voice product, unifying the experiences of its mobile and voice apps.
Minimize errors, and keep interactions brief
As a general guideline, a voice interaction should be brief, similar to the way we try to limit the number of clicks required to complete a task on a visual interface while providing the right amount of visual feedback. The same ideals apply to voice UI. However, too much of that feedback and confirmation can make the system sound dumb, while not enough can lead to errors. There are two avenues for providing feedback. We can choose to repeat the request, then go ahead and process it, or require a confirmation from the user first.
For our Gatorade Gx voice app experiment, we found that one way to avoid redundancy was to create an array of confirmation phrases, and randomly select one each time the user makes the request. For example, if a user says “running,” the app may confirm it by saying, “Running sounds like fun,” or “This is a good day for running.”
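As a rough illustration of that idea, here is a minimal Python sketch that picks a confirmation phrase at random. It is not the code from the Gx experiment: the function name, the template list, and the third phrase are assumptions made for the example.

```python
import random

# Hypothetical templates. The first two echo the examples in the text;
# the third is invented for illustration.
CONFIRMATION_TEMPLATES = [
    "{activity} sounds like fun.",
    "This is a good day for {activity}.",
    "Got it, {activity}.",
]


def confirm_activity(activity, rng=random):
    """Return a randomly chosen confirmation phrase for the given activity."""
    template = rng.choice(CONFIRMATION_TEMPLATES)
    text = template.format(activity=activity)
    # Capitalize only the first character so a sentence-initial slot reads naturally.
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    print(confirm_activity("running"))
```

Passing in a seeded `random.Random` instance as `rng` makes the choice repeatable, which helps when testing dialogue flows.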
According to Amazon’s guidelines, confirmations should be used selectively and reserved for actions with significant consequences (like a large transaction). Simply put, to evaluate the need for a confirmation, ask “what can go wrong?” and “what is the outcome?” The more time an error takes to recover from, the more value the user will glean from a confirmation.
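One way to turn this rule of thumb into logic is sketched below in Python. Everything here is an assumption for illustration: the `Action` fields, the 60-second threshold, and the labels "implicit" (repeat the request and proceed) and "explicit" (wait for the user to confirm) are not drawn from Amazon’s guidelines.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    recovery_seconds: float  # assumed estimate of how long undoing a mistake takes
    irreversible: bool = False


# Hypothetical cutoff: above this recovery time, ask before acting.
EXPLICIT_THRESHOLD_SECONDS = 60


def confirmation_style(action):
    """Return 'explicit' when an error would be costly, otherwise 'implicit'."""
    if action.irreversible or action.recovery_seconds >= EXPLICIT_THRESHOLD_SECONDS:
        return "explicit"
    return "implicit"


if __name__ == "__main__":
    print(confirmation_style(Action("log a workout", recovery_seconds=5)))
    print(confirmation_style(Action("place a large order", recovery_seconds=600)))
```

In practice the threshold would be tuned per product; the point is only that the cost of recovering from an error drives the choice.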
Provide context, and keep it personal
Context means being aware of the circumstances or setting in which the conversation is taking place, as well as things that have happened in the past. Without a visual reference, it is important to provide users with the context of their requests to make sure they are aware of the meaning being applied to their dialogue, in addition to making the interaction feel more personal.
We discovered there are a few simple ways to establish context for a conversation: If the user is logged in to the service already, the system can recall past, already personalized interactions with ease. But even without an account, non-personal variables such as time of day, weather conditions, or even location can help in establishing context.
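To make this concrete, the following Python sketch builds a greeting from the time of day, and from the user’s name when one is known. The function name, the hour boundaries, and the wording are all assumptions chosen for the example.

```python
from datetime import datetime


def greeting(now, user_name=None):
    """Build a greeting from the time of day and, if available, the user's name."""
    hour = now.hour
    if 5 <= hour < 12:
        part_of_day = "morning"
    elif 12 <= hour < 18:
        part_of_day = "afternoon"
    else:
        part_of_day = "evening"

    if user_name:
        # A logged-in user gets a personalized greeting.
        return f"Good {part_of_day}, {user_name}. Welcome back."
    # Without an account, time of day alone still sets some context.
    return f"Good {part_of_day}."


if __name__ == "__main__":
    print(greeting(datetime.now()))
```

Weather or location could be folded in the same way, as extra optional arguments.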
At any point during the conversation, it is also important to provide users with information on potential actions they can take. This can be achieved by using “help” and “universals.” With the lack of a visual menu, universals are a good reinforcement of what the app can do. On our Gx voice app, for example, we added welcome and help functions to provide an overview of the options to the user, making it easier for them to navigate through the possibilities available to them.
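A simple way to picture universals is as a small set of commands checked before any other handling. The Python sketch below assumes a hypothetical option list and hypothetical command words ("help," "repeat," "stop"); it does not reflect how the Gx voice app was built.

```python
# Hypothetical list of things the app can do.
OPTIONS = ["log a workout", "get a hydration tip"]


def options_summary():
    """Describe the available options in one spoken sentence."""
    return "You can " + " or ".join(OPTIONS) + "."


def respond_to_universal(utterance, last_prompt):
    """Handle commands that should work at any point in the conversation.

    Returns the spoken response, or None if the utterance is not a universal
    and should be passed to the normal request handling.
    """
    text = utterance.strip().lower()
    if text == "help":
        return options_summary()
    if text == "repeat":
        return last_prompt
    if text in ("stop", "cancel"):
        return "Okay, goodbye."
    return None


if __name__ == "__main__":
    print(respond_to_universal("help", "What did you do today?"))
```

The same summary sentence can double as the welcome message, so new users hear the options without having to ask.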
We’re just getting started
Voice apps are still in their infancy, and while we as humans may be natural experts at conversing with one another, we must recognize that this is not yet the case when it comes to engaging in spoken dialogue with bots.
By maintaining an open mind while simultaneously implementing both proven design principles and new, VUI-centric ones, designers can develop cutting-edge guidelines and standards that go beyond today’s digital ecosystem. This approach will help us lay the foundation for the future of screenless experiences, something we’re excited to continue experimenting with.
Efrat Weidberg is an interaction design lead at Smart Design.