Twenty-two years ago, my son Nolan was born with a congenital birth defect called arthrogryposis multiplex congenita. It restricts his ability to move his hands, wrists, elbows, and shoulders, making it difficult for him to do a wide range of things most of us take for granted.
When you are gifted a child with a disability, your parental instincts immediately kick in. You ask what your child will need to have a career, be independent, and be happy and successful in life. When he was about a year old, an old family friend told me, “Adam, technology has advanced so much, Nolan will be okay. Fifty years ago, it would have been much harder. Now there are things like Velcro.”
While this was reassuring in some ways, it still left the question of life and a career wide open. Being a software developer, my initial instinct was to work on ways that I could teach Nolan to code. It’s a career that requires a sharp mind, which Nolan has, and not a lot of physical movement. Also, if he didn’t want to become a coder, a computer would still likely be his best way to find a good job.
However, most of us use keyboards to interact with a computer, and Nolan can’t type efficiently. So we needed to find a way he could quickly convert speech to text, something that has continuously fueled my interest in the technology, as well as my own personal research to improve it.
Back then, the best software available was Dragon NaturallySpeaking, a speech recognition program that translated voice to text on Microsoft Windows machines. While the tool was very helpful, it was still underpowered, and because it was tied to a desktop, it offered little mobility.
Thankfully, times have changed, and things have come a long way. Voice recognition has become mainstream in the past few years, and the quality of the interaction has progressed by leaps and bounds. Anyone with a smartphone today has access to speech-to-text and even functional voice interaction. In addition, devices such as the Amazon Echo and Google Home provide easy voice interactivity in a home environment. While these are not turnkey solutions for software developers, they are impressive.
A number of brands, such as Zappos, Target, and L’Occitane, have started exploring ways to make their products and services more accessible. My recent work in the field has focused on voice commerce with the Echo Show, a device that features a screen along with Amazon’s Alexa voice assistant. The project is part of a pilot program with Tommy Hilfiger’s Tommy Adaptive brand, which creates inclusive designs meant to make getting dressed easier. The problem, of course, is that we need to make shopping for this clothing just as easy. Due to their disabilities, many people cannot interact with traditional e-commerce interfaces, and we’ve been exploring ways to address that.
To understand the challenge, you have to know a little about how commerce systems are built. There are two major types today, which we can call “headless” and “traditional.” Traditional e-commerce platforms tend to have been built 10 to 15 years ago. Their interfaces are tied quite tightly to the middleware layer that accesses the core commerce functionality. In layman’s terms, this creates a rigid system that makes it difficult to add a non-traditional interface, such as accessible voice.
The second is an API-led, or headless, approach. Here, the user experience is decoupled from the core commerce functionality, allowing you to plug in an experience layer on top. Again, in layman’s terms, this does not make creating voice interaction easy, but it does make it less complicated, lower risk, and less expensive. Traditional platforms can discourage companies from investing in accessible solutions because their starter frameworks have not kept up with consumer experience needs. Headless ones are much more amenable, since the experience layer can evolve in isolation.
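To make the decoupling concrete, here is a minimal sketch in Python. In a headless setup, the commerce backend returns plain product data, and each experience layer (web, mobile, voice) renders that same data in its own way. The product names, fields, and rendering functions below are hypothetical illustrations, not any particular platform's API.

```python
# In a headless architecture, the commerce core returns data; separate
# experience layers decide how to present it. The two renderers below
# consume the same hypothetical product records.

def render_for_screen(products):
    """Render search results as lines of text for a visual interface."""
    return "\n".join(f"{p['name']} - ${p['price']:.2f}" for p in products)

def render_for_voice(products):
    """Render the same results as a single spoken sentence."""
    if not products:
        return "I couldn't find any matching items."
    names = ", ".join(p["name"] for p in products[:3])
    return f"I found {len(products)} items. The top matches are: {names}."

# Example data a headless commerce API might return (hypothetical).
results = [
    {"name": "Adaptive magnetic-button shirt", "price": 39.99},
    {"name": "Easy-zip hoodie", "price": 54.50},
]

print(render_for_screen(results))
print(render_for_voice(results))
```

The point is that adding an accessible voice channel means writing one more renderer, not rebuilding the platform.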
Luckily, the world is going headless. At a recent IBM Watson Commerce webinar, for example, the company showed that it was rapidly moving toward the new API-first economy, with technical design patterns that focus more on the customer experience layer. It’s encouraging to see how this decoupled approach combines with improved voice recognition to empower those who have trouble with traditional interfaces.
Like all work in this field, however, my team’s is still nascent. We’re only just beginning to learn how to use new voice capabilities to increase accessibility. That said, we do have some interesting learnings:
Have a backup plan for voice
Natural language processing, while gaining in sophistication, is far from perfect. It’s best to have two options for interacting via voice. For example, you should always assign numbers to actions as a backup in case the device cannot recognize responses or the user’s ability to articulate is a limiting factor.
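A sketch of the numbered-backup idea: every spoken option also gets a numeric alias, so a user can say "two" (or tap 2 on a screen) when the device cannot recognize the option's name, or when speaking the full phrase is difficult. The option names here are hypothetical.

```python
# Each action has both a name and a number, giving users two ways to
# answer. The actions below are illustrative examples.
OPTIONS = {
    1: "add to cart",
    2: "hear more details",
    3: "next item",
}

def resolve_choice(utterance):
    """Match a user utterance against option numbers or option names."""
    text = utterance.strip().lower()
    # First, try the number, spoken ("two") or typed ("2").
    words_to_digits = {"one": 1, "two": 2, "three": 3}
    number = words_to_digits.get(text)
    if number is None and text.isdigit():
        number = int(text)
    if number in OPTIONS:
        return OPTIONS[number]
    # Otherwise, fall back to matching the option name itself.
    for action in OPTIONS.values():
        if text == action:
            return action
    return None  # unrecognized: re-prompt the user

print(resolve_choice("two"))          # hear more details
print(resolve_choice("add to cart"))  # add to cart
```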
Leverage as many modes of communication as you can
For example, on the Echo Show, we can show options visually. We also use voice response and add text prompts to the screen to be able to address the widest possible audience.
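One way to structure this is to build every response with speech, on-screen text, and numbered choices at once, so no single channel is the only path forward. The structure below is an illustrative sketch, not an actual Echo Show or Alexa payload.

```python
# Build one response that carries the same options across three modes:
# spoken audio, text prompts on the display, and numbered choices.
# The payload shape is hypothetical.

def build_response(prompt, options):
    numbered = [f"{i}. {opt}" for i, opt in enumerate(options, start=1)]
    return {
        "speech": f"{prompt} You can say: {', or '.join(options)}.",
        "screen_text": [prompt] + numbered,  # shown on the device display
    }

resp = build_response("What would you like to do?",
                      ["add to cart", "see similar items"])
print(resp["speech"])
```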
Remember that NLP interfaces are not turnkey
Users need to set up the technology themselves, and that can be a barrier. Many systems require something other than voice activation to get up and running. Additionally, the user may need to provide information about the existing Wi-Fi environment in the physical location where the tools will be used. Solving these onboarding problems is a major challenge for NLP developers.
Nonetheless, these new capabilities make me incredibly optimistic for people like Nolan. Natural language processing is not an answer in itself nor a plug-in solution. People like me have to bring it from a theoretically interesting technology to one that is working to its full potential. That is not going to be easy, but thanks to the interest of different brands, we’re already making progress.
Adam Wolf is chief technology officer, Americas, at Possible.