A few months ago, Amazon quietly revealed a big piece of its strategy for Alexa. Beyond the Amazon Echo and other connected speakers, the company is looking to put its virtual assistant onto more devices that have screens. To that end, it announced new developer tools that would make those devices possible.
Now, we’re about to see the result of those efforts, starting with a connected intercom system called Nucleus. Although Nucleus has offered Alexa voice commands since the hardware launched last year, none of Alexa’s responses–from weather forecasts to product listings on Amazon.com–currently appear in visual form on the intercom’s 8-inch display. That should change within a month or two, as Nucleus becomes the first non-Amazon device to support Alexa’s display capabilities.
“It’s awesome,” says Morley Ivers, Nucleus’s cofounder and president. “This actually takes voice platforms to the next level, and allows you to have a much richer and more interactive experience.”
By enabling third-party gadget makers such as Nucleus to provide a sophisticated Alexa experience, Amazon has a shot at spreading its assistant into more parts of more consumers’ lives than it could ever do on its own. Still, rethinking Alexa with both voice and visuals in mind isn’t without its potential pitfalls.
Amazon has never hidden the fact that it wants Alexa to be everywhere, and not just on its own devices such as the Echo speaker. In mid-2015, the company announced Alexa Voice Service, a set of tools for device makers who want to add the voice-driven assistant to their products. And over the last year, some of those third-party devices have started to trickle out. Nucleus, which launched in August 2016, was one of them.
For Amazon’s strategy of ubiquity, an intercom such as Nucleus had an obvious appeal: Customers may buy multiple units and put them in the parts of the house where they spend the most time, increasing the odds that Alexa is within earshot. Ivers describes the basic video intercom functionality as a “Trojan horse” for connected services, of which Alexa is one.
“Our average customer is putting 2.3 devices into their home on day one,” he says. “As a result, they’re getting placed in these strategically important locations.”
Nucleus is pricey for an Alexa device–individual units list for $250, and a two-pack costs $400–and the company notes that it’s seen $4.5 million in revenue to date, which translates to just 20,000 units or so. But sales are growing by 42% month over month, and 80% of Nucleus sales are outside New York City and San Francisco, suggesting the product has some appeal beyond the tech bubble.
More importantly, Amazon believes in the concept, having led Nucleus’s $5.6 million Series A funding round. The two companies have worked closely together to improve Alexa, Ivers says, with Nucleus helping Amazon develop hardware and protocols that could bring more Alexa devices to market. Nucleus is also involved in a program to reduce false detection of the “Alexa” wake word through cloud-based analysis.
“It’s definitely been the case that this has been a real partnership,” Ivers says.
The current Alexa experience on Nucleus feels a bit bare. Although Nucleus supports almost all the same functions as an Amazon Echo and offers a handy Alexa skill for launching intercom calls by voice, all Alexa responses are delivered solely through audio. When you talk to Alexa, the device doesn’t respond with a visual readout of weather, sports scores, calendar appointments, or Amazon.com listings. That can be frustrating, given that Nucleus’s speakers are weaker and tinnier than those of an Echo.
On Amazon’s own Fire tablets and TV devices, Alexa can already display information on-screen. But until recently, other hardware makers haven’t been able to include those same capabilities. That’s about to change, as Amazon releases new developer tools called “Display Cards.” These allow third-party devices with screens to support Alexa in the same way as Amazon’s own Fire devices.
Ivers says Nucleus will be adding Display Card support within a month or two, right after it rolls out some other general product updates, such as improvements to screen brightness and audio. “We’ve got the green light from Amazon, and we’re just scheduling our timing with our upgrade builds,” he says.
Other products might not be far behind. During CES, Huawei announced that it would integrate Alexa with its Mate 9 smartphone, LG revealed a refrigerator with Alexa built in, and Ford said it would be the first automaker with an in-car Alexa experience. Display Cards could also benefit wearable devices like the iMCO Watch smartwatch, which, like Nucleus, only responds to commands by voice at the moment.
Amazon may even light the way with some new hardware of its own, just as it did with the Echo. In November, unnamed sources told Bloomberg’s Mark Gurman that a premium Alexa product was on the way, with a 7-inch touchscreen and better speakers than the Echo.
For Amazon, expanding Alexa to more screens could introduce new challenges. Users, for instance, may start expecting all Alexa features to have a visual component, along with touchscreen buttons or other ways to control Amazon’s assistant without voice. It’s not hard to imagine asking Alexa for a product on Amazon, and then using swipes and taps to adjust the item’s quantity or view alternatives.
Developers of third-party Alexa skills might also start seeking more control over what they can do with Display Cards. Today, those cards are limited to plain text and a single image, and for many developers, that’s not enough. On Amazon’s developer forums, they’ve been asking for things like web links, richer formatting, and video, and those requests may become more urgent as Alexa appears on more screens.
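To give a sense of how constrained that format is: a skill’s JSON response can attach a card alongside the spoken reply, and the card carries little more than a title, plain text, and one image. The sketch below builds such a response as a Python dictionary, loosely following the Alexa Skills Kit response shape; the weather text and image URL are placeholders for illustration, not real data.

```python
# Sketch of an Alexa skill response carrying a "Standard" card, loosely
# following the Alexa Skills Kit JSON response format. Note how little a
# card can hold: a title, plain text, and a single image -- no links,
# rich formatting, or video, which is what developers are asking for.
import json


def build_response(speech_text, card_title, card_text, image_url=None):
    """Return a skill response dict with spoken output and a display card."""
    card = {"type": "Standard", "title": card_title, "text": card_text}
    if image_url:
        # One image only, supplied at two sizes; both point at the
        # same placeholder asset here.
        card["image"] = {
            "smallImageUrl": image_url,
            "largeImageUrl": image_url,
        }
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "card": card,
            "shouldEndSession": True,
        },
    }


resp = build_response(
    "Today in Seattle: 62 degrees and cloudy.",   # hypothetical forecast
    "Seattle Weather",
    "62\u00b0F, cloudy, 40% chance of rain",
    "https://example.com/weather.png",            # placeholder URL
)
print(json.dumps(resp, indent=2))
```

On a screenless Echo the card only surfaces in the companion app; on a screened device like Nucleus, it is this same small payload that would be rendered on the 8-inch display.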
For Amazon, this leads to an existential question: Should Alexa remain a voice-driven assistant, or should new uses emerge that are only possible with a display?
At least for now, Amazon positions Display Cards as “complementary” to the voice experience. And Ivers seems to agree. While he’d like to see Display Cards expand to present new kinds of information, such as maps, package tracking details, and messages from people in users’ communities, he still believes in the power of voice commands.
“Voice is the quickest way to summon information,” he says, “but visualizing it is the fastest way to absorb it.”