For Amazon, The Future Of Alexa Is About The End Of The Smartphone Era

With touch-screen devices like the Echo Show and Echo Spot, Amazon is trying to upend a decade of smartphones, notifications, and apps.

The Echo Spot, which ships in December. [Photo: courtesy of Amazon]

Picture a hypothetical future in which Amazon’s Alexa renders your smartphone unnecessary.


You tap the “good morning” button on your bedside Echo Spot, and it displays the day’s forecast as the blinds open and the bedroom lights fade on. Downstairs, the Echo Show in the kitchen displays a reminder about your doctor’s appointment at noon, and alerts you to a single urgent email: You’ll have to join a conference call as soon as you get into the office.

In the car, you press the “commute” button on the Alexa-enabled dashboard, which queues up your favorite music while steering you around a traffic jam, and after work, Alexa relays another reminder to your car, telling you to grab milk on the way back. At home, you swipe through a list of restaurants on the Echo Show to make a reservation for the weekend, check in on your folks via video chat, and finally sit down to relax. With one more button press, the next episode of your favorite show starts playing on the Fire TV in the living room.

Far-fetched as it seems, this is the kind of scenario Amazon wants to create as its Alexa assistant migrates from smart speakers to touch screens. Instead of making users wade through a sea of apps to get things done, the goal is to create a new kind of computing that’s simpler and less distracting. And while voice will be the foundation, there may be plenty of times where you use Alexa without speaking at all.


“The tenet that we believe in is that voice is a simplifier,” says Miriam Daniel, Amazon’s head of product management for Alexa. “And once you design voice actions, then go back to a touch action, you’ve actually removed a lot of friction because you’ve started at a different baseline.”

If Amazon is successful, Alexa could siphon precious time and attention away from smartphones, in the same way that smartphones have made the PC less relevant over the past decade. But even Amazon acknowledges that it still has a lot of work to do as it tries to invent a new computing paradigm.

Amazon’s Appstore has never rivaled Google Play’s riches.

Peak App

Amazon has plenty of motivation to upend software conventions, having watched from the sidelines as Apple’s iPhone and Google’s Android revolutionized personal computing. Amazon’s sole attempt at a smartphone, 2014’s Fire Phone, was one of the company’s highest-profile failures, and although Amazon still maintains an app store for its Android-based Fire tablets, the catalog is much smaller than that of the Google Play store on Google-sanctioned Android devices. For Amazon, a shift away from smartphones may be the only way forward.


At the same time, we need a fresh approach to computing. Modern smartphone platforms have become a minefield of distractions, dominated by social media apps whose primary goal is to occupy ever more time. There are already signs that the smartphone-driven “attention economy” has detrimental health effects.

Developers are also looking for ways to break free of the smartphone business, which now has a glut of apps that people aren’t using. According to ComScore, the share of U.S. smartphone owners who downloaded zero new apps per month exceeded 50% in June, and nearly three-quarters of smartphone owners downloaded fewer than three apps in that period. A 2015 study by Forrester found that just a handful of tech giants dominate the time users spend on their phones.

“Peak app has already happened,” says Brian Roemmele, an independent tech consultant who specializes in voice and commerce. “The new iPhone is going to do great, and it’s going to change a whole lot, but it’s not going to induce more and more app downloads. So that economy has shifted.”


Alexa is Amazon’s chance to start fresh on both counts. Instead of trying to suck people in, Echo devices emphasize quick, purposeful interactions. The few in-depth use cases that do exist tend to focus on direct communication with others, for instance, through voice calls and party games.

“We’re trying to get people away from all the personal electronics and create more of a family, communal experience,” Amazon’s Daniel says. “So you’re not just looking down into your individual phones, and you’re actually collaborating with your family members.”

Meanwhile, developers see Alexa as a solution to the app discovery problem, partly because it’s a new platform that isn’t yet overrun with offerings, and partly because users might become interested in third-party skills that they otherwise wouldn’t download in app form.


“Mobile behavior is already set with consumers typically using about five apps a day on a mobile device, making it harder for brands to get users to discover and download new apps on their devices,” says Ambika Nigam, Bloomberg Media’s global head of mobile app products. “However on Alexa, if you’re having a conversation and can organically weave your brand and content into the mix, it can potentially become easier for users to discover your skill.”

These conditions have already turned Amazon’s Echo speaker into a hit product, with an estimated 15 million sales, according to Consumer Intelligence Research Partners. But Alexa’s usefulness has limits with audio alone. That’s why the company is turning to touch screens as its next step.

The Echo Show introduced touch features to Alexa, but Amazon doesn’t want to overdo it. [Photo: courtesy of Amazon]

Avoiding “Temptation”

Amazon started planning for screen-equipped devices like the Echo Show and Echo Spot in early 2015, Daniel says. But since then, the company has spent a lot of time debating how to handle Alexa’s visual side. Ultimately Amazon decided to avoid many of the conventions–Daniel calls them “temptations”–of modern smartphones, such as an app launcher, a web viewer, and complex menu structures.


“Very often, you’re tempted to do something by touch, or fall into the older paradigm, and we often had to catch ourselves and not do that,” Daniel says.

Still, the Echo Show doesn’t strictly depend on voice. For instance, you can tap on the screen to read lightweight news summaries, view upcoming calendar appointments, and place video calls to trusted contacts. Daniel refers to these actions as “shortcuts,” and suggests that Alexa could offer more of them over time.

“I would say there’s no reason not to give them more rich content, or access to more sources of information, so that’s not necessarily something we thought of as, ‘We should limit the user,'” she says. “We just wanted to get the first experience out there.”


Adding more shortcuts to a device like the Echo Show isn’t a simple task, since scrolling through a long list of potential actions is no better than picking from a wall of app icons. For Amazon, the trick is to use context to figure out what types of actions to offer.

Daniel gives a hypothetical use case: If you have an Alexa-enabled alarm clock, it might display a “goodnight” button around bedtime, which would turn off all the lights, lock the doors, and adjust the thermostat with one touch. It’s the same action that you might trigger with a voice command–especially now that Alexa supports smart home routines–but tapping the screen might take less time and wouldn’t disturb your sleeping spouse.

It’s not hard to imagine other examples, such as music suggestions in the car, things to watch on TV after work, or household items you might need to restock. (This is Amazon, after all.) Because Alexa already understands how to perform these actions through voice, offering touch screen shortcuts is trivial. It’s just a matter of picking the right actions at the right time.


“Yes, we’ve used voice as an interface, but we’re making Alexa smarter about you, your personal preferences, the things you like to do, like to hear, all of that,” Daniel says. “So it’s possible that . . . you might not start off with voice, but you’re still interacting with Alexa, and you get the richness of Alexa interaction all through display and touch.”

Pitching A Paradigm Shift

Meanwhile, Amazon is trying to encourage more visual elements inside third-party Alexa skills. That way, users can accomplish more of the things that might otherwise require a smartphone, such as booking restaurant reservations, following recipes, checking stocks, and looking up travel information. But while Alexa now offers 25,000 third-party skills, Amazon won’t say how many of them include visual features such as videos, illustrations, and on-screen buttons.

Paul Cutsinger, Amazon’s developer evangelist for Alexa, has been trying to drum up touch screen support since the Echo Show launched in June. Approaching developers can be tricky, he says, since Amazon is trying to push an entirely new interaction model. There’s a lot of temptation to fall back on old habits, such as displaying a visual list of search results with no spoken-word description to go with it.


“The first thing that seems to happen a lot of times is people will start to gravitate back toward building a very graphical-first experience, so I try to talk through how this is different,” he says.

For now, Amazon is keeping developers on a tight leash, allowing them to use just a few cookie-cutter layouts for images and text while emphasizing best practices for voice. But setting boundaries for developers isn’t as easy as setting limits for itself. Several developers say Amazon’s current rules prevent them from accomplishing their goals.
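To get a sense of what those cookie-cutter layouts look like in practice, here is a rough sketch in Python of the kind of JSON payload a skill sends back when it pairs speech with one of Amazon’s stock screen templates. The intent, title, and text are hypothetical; the `Display.RenderTemplate` directive and `BodyTemplate1` layout are drawn from the Alexa Skills Kit’s display interface as it existed at the Echo Show’s launch, and the exact fields may differ from a production skill.

```python
def build_response(speech_text, card_title, card_body):
    """Return a JSON-serializable Alexa skill response that pairs
    spoken output with one of Amazon's fixed display templates."""
    return {
        "version": "1.0",
        "response": {
            # What Alexa speaks aloud: the voice-first part of the skill.
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # The on-screen layout, restricted to Amazon's stock templates.
            "directives": [{
                "type": "Display.RenderTemplate",
                "template": {
                    "type": "BodyTemplate1",
                    "title": card_title,
                    "textContent": {
                        "primaryText": {"type": "PlainText", "text": card_body},
                    },
                },
            }],
            "shouldEndSession": True,
        },
    }

# Hypothetical usage: a briefing skill answering a single request.
payload = build_response(
    "Here is today's top story.",
    "Daily Briefing",
    "Markets closed higher after a quiet session.",
)
```

The notable constraint is the `template.type` field: a developer picks from a short menu of predefined layouts rather than composing an arbitrary interface, which is exactly the boundary the developers quoted below are pushing against.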

“I think it’s a good framework, and like all frameworks, it’s a great accelerator. It allows you to get started,” says Terren Peterson, a developer whom Amazon has recognized as an “Alexa Champion” for his sample skills and tutorials. “The problem is, any of us who’ve ever built mobile apps and websites and the like, we’re used to wanting to have full control on the visual experience.”


On the Echo Show, for instance, Peterson has considered a skill for music lessons, allowing users to play along with sheet music as it scrolls across the screen. That’s not possible today because Alexa doesn’t support custom animations.

Huy Nguyen, a software engineer at ChefSteps, has also hit roadblocks while trying to build Echo Show support for the Joule sous vide cooker. If a user asks to see a medium-rare steak, for instance, ChefSteps can’t show spoken or written instructions alongside a video, because Alexa only allows videos to play at full screen.

“You could always add the voiceover into the video itself, but that takes production time, and you have to have a video editor actually do that,” Nguyen says.


Even basic navigation poses a challenge as Amazon tries to avoid smartphone concepts such as a back button and main menus for skills. This restriction makes sense for promoting voice-first interactions, but it also prevents basic actions such as going back to a previous selection screen.

Bloomberg’s Ambika Nigam says the company is still figuring out how to deal with this limitation. But for now, Bloomberg’s Alexa skill doesn’t support touch controls at all.

“If it’s voice first, you don’t want to rely on putting buttons out there, because that really makes it more video- and touch-first,” she says. “The last thing you want is people going up to this thing, and then continuing to press it, because then it’s just as good as an iPad or your phone.”


Perhaps the biggest dilemma Amazon faces is how to deal with notifications. Today, Alexa only allows alerts from its own store and a small number of third-party skills, all on an opt-in basis. But as the list of supported notifications gets longer, managing the flow could become overwhelming, just as it is on smartphones. In the long run, Alexa notifications will have to be smarter and more contextual.

Cutsinger says Amazon is moving slowly as it figures out the right approach.

“A lot of times, I talk to developers, and they’re like, ‘I want notifications! I’ve got all these ideas!'” he says. “I usually come at it from the angle of . . . how do we help users manage this and control this, so the random skill doesn’t wake you up in the middle of the night? That is our big consideration, and that’s why we’re being very thoughtful about the capabilities we roll out and when they roll out.”

Amazon envisions a world where people aren’t quite so fixated on their phones. [Photo: Flickr user Andri Koolme]

Alexa Versus The Smartphone World

Cutsinger cautions that it’s early days for the ecosystem, and it’s going to take time for Amazon and developers to figure out how things should work. Still, if Amazon can get over those hurdles and build an ecosystem of touch screen skills, it’ll end up in a vastly different place than some of its rivals.

With Siri, for instance, Apple is trying to graft voice controls onto existing smartphone apps. This is an inherently painstaking process, as it requires Apple to retroactively add support for every conceivable action users might want to perform with their apps. At the moment, Siri only supports voice commands for a handful of specific domains–such as ride sharing, workouts, and to-do lists–and app makers are unable to customize the types of actions they offer.

Meanwhile, Samsung plans to offer developer tools that will allow Android developers to map Bixby voice commands onto existing actions within their apps. The idea is that users will be able to seamlessly switch between touch and voice on smartphones like the Galaxy Note8. That sounds a bit like what Amazon’s trying to accomplish, except that it will require developers to put a lot of extra work into their existing Android apps. (Also, Samsung hasn’t released its developer tools for Bixby yet.)

By comparison, Amazon is wiping the slate clean, and creating a new system in which every possible action already has a voice command to go with it. The next step is to go back and accommodate touch as well. (Google may be pursuing a similar plan, but for now the Google Assistant is built in only at the device level, on smart speakers such as Google Home and on Android phones.)

“We don’t want to replicate the multilevel, multilayer touch actions that apps do. We want to be able to mimic voice by touch. It’s a different paradigm,” Amazon’s Miriam Daniel says. “If you design for touch and you voice-enable, it’s very onerous. If you design for voice, which is very simple, and you touch-enable, you actually simplify touch as well.”

That strategy is reminiscent of how Apple approached the iPhone a decade ago. By designing a new operating system around touch, Apple created a new breed of apps that were simpler than desktop software. The ecosystem was then able to scale up to tablets with the iPad, which Apple is now pitching as a viable PC replacement for many people.

The PC is still around, of course, and will be for many years despite the smartphone revolution. Likewise, smartphones won’t go away, even if Amazon’s wildly ambitious plan proves successful.

But if there was always an Alexa-powered display nearby, and it promised to be faster and simpler at getting things done, maybe you wouldn’t take out your smartphone as often and get so easily sucked in by an array of notifications, badges, and shiny little icons. Little by little, smartphone apps could lose their grip over your attention. For a moment, at least, the real world would prevail–and so would Amazon.