Some of us have husbands, or wives. Some of us have partners. Some of us have roommates. But most of us have an AI these days, whether it’s Siri on our iPhone, or Alexa piping in from an Echo in the kitchen.
How do you communicate with your AI as intimately as you do with your wife, partner, or roommate? It’s a question that companies like Google, Apple, and Amazon have been wrestling with for the last few years. But it’s Facebook’s Mark Zuckerberg who has the best answer so far: by text.
At least some of the time.
In an exclusive article yesterday, Fast Company’s Daniel Terdiman gave us a tour of the home automation AI that Facebook founder Mark Zuckerberg calls “Jarvis,” whose creation he set for himself as his 2016 New Year’s resolution.
The parameters of what we’ll call “Jarvis”–with deference to Marvel Entertainment!–are pretty straightforward. As the Facebook founder told Fast Company:
It’s not a production system that’s ready to go to other people. But if I couldn’t build a system that can at least do what [Echo and Home can], I probably would have been pretty disappointed in myself.
Those systems, Google Home and Amazon Alexa, let users control anything compatible in their homes–their lights, thermostats, security systems, entertainment centers, and more–by voice. Which is all pretty standard: Apple’s trying to do the same thing with Siri and HomeKit.
But Zuckerberg’s key observation isn’t that people want to be able to control their homes by voice. It’s that they also want to be able to control their homes by text. As Terdiman wrote:
Speaking to Jarvis and having it talk back makes sense for playing music. (In the demo I got, Jarvis speaks in a garden-variety synthesized female voice not unlike that of Siri or Alexa; Zuck is on the verge of getting a well-known person to provide custom vocals.) But Zuckerberg found that in many other cases, text was more desirable, especially when there were other people around.
“If I’m letting someone in at the gate . . . that’s not relevant to the people around me,” [Zuckerberg] says, “so I’d much rather just text it.”
Right now, companies assume that voice will eventually be the OS of home automation. They envision AI as a kind of invisible roommate-servant: someone we nag out loud when we’re a little too cold, when we want the buzzer pressed, or when we want different music. And when we want to command them, we speak our requests out loud, into the air, and wait for them, with technological servility, to hear and understand us.
That’s fair, but it’s only half the picture. Zuckerberg’s key insight is that there will be times when we won’t want to ostentatiously pronounce our commands. There will be moments of quieter command. Moments in which you are conversing with your dinner guests, and your doorbell rings. How would you prefer to hear it? As an earsplitting din that disrupts everyone, followed by the calamity of bodies flowing toward the door? Or as an insistent buzz on your phone, which you can answer with text-like efficiency?
Self-evidently the latter. The truth is, taking an AI into our home is a partnership, just like any other. And we don’t always communicate with our partners by voice. If I am sitting on the couch with my wife, yes, we talk to each other, but especially in the presence of others, we’ll text each other, because text is the invisible UI in our household. It’s the hidden layer we use to tell each other to turn down the heat, to check the door, to change the music, or–yes!–sometimes to send each other sweet nothings when others are in the room.
That’s what Zuck has built himself. His Jarvis isn’t the smartest AI on the block, by any means. Terdiman’s article shows multiple examples of Zuckerberg’s AI failing to understand the Facebook founder’s voice commands. But that would be true of Siri, Alexa, or Google, too. What Zuck recognized is that there are times it’s simply weird, or inappropriate, to bark out your commands into the air.
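None of Jarvis’s code is public, but the core idea of a text command line for the home is easy to sketch. Here is a minimal, hypothetical dispatcher in Python: every device name, command pattern, and reply below is an invention for illustration, not Zuckerberg’s actual system.

```python
# Hypothetical sketch of a text-command dispatcher for home automation.
# All device names and command patterns here are invented; Jarvis's
# real implementation is not public.

import re


def make_dispatcher():
    """Return a handler function and the home state it mutates."""
    state = {"lights": "off", "gate": "closed", "music": None}

    def handle(message):
        """Map one short text message onto a change in home state."""
        text = message.lower().strip()
        if re.search(r"\blights?\b.*\bon\b", text):
            state["lights"] = "on"
            return "Lights on."
        if re.search(r"\blights?\b.*\boff\b", text):
            state["lights"] = "off"
            return "Lights off."
        if "gate" in text and ("open" in text or "let" in text):
            state["gate"] = "open"
            return "Gate opened."
        match = re.search(r"play (.+)", text)
        if match:
            state["music"] = match.group(1)
            return f"Playing {state['music']}."
        return "Sorry, I didn't understand that."

    return handle, state


handle, state = make_dispatcher()
print(handle("Turn the lights on"))           # → Lights on.
print(handle("Let my friend in at the gate"))  # → Gate opened.
print(handle("play some jazz"))               # → Playing some jazz.
```

The point of the sketch is how little natural-language understanding a text interface needs to be useful: a handful of keyword patterns covers the “let someone in at the gate” cases Zuckerberg describes, precisely because texters already compress their requests into machine-friendly phrasing.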
Obviously, this all dovetails with Facebook Messenger’s chatbot aspirations, announced in 2015. But Zuckerberg has wise words on why a voice-only command line doesn’t yet work with AIs:
If you train a machine learning system on data from Google of people speaking to a search engine, it will perform relatively worse on Facebook at understanding people talking to real people. In the case of Jarvis, training an AI that you’ll talk to at close range is also different from training a system you’ll talk to from all the way across the room, like Echo. These systems are more specialized than it appears, and that implies we are further off from having general systems than it might seem.
In other words, people talk to each other using a different syntax from the one we use to speak to AIs. It’s a chicken-and-egg problem: until we start to speak to AIs like people, AIs won’t have enough data to understand how to talk to people like people.
In the meantime, text messages–especially in situations of ambiguity–make the most sense. We’ve all spent the better part of a decade communicating with Google’s text-like search interface. We know how to text to machines, and the very medium of text trains us to enter into a more machine-like syntax. But talking to machines? Spend a few minutes faltering with Alexa, and Zuck seems more right by the second. We’re just not ready yet. And because of that, AIs aren’t either.