Interfaces and hardware shaped the last decade of smartphone innovation, but artificial intelligence will shape the next decade. Nowhere is that more obvious than in Google’s new Pixel 3 smartphones, announced today at an event in New York. The new devices use AI to do everything from answering your phone for you to taking clear photos in the dark of night.
With all respect to the industrial design, the Pixel 3’s hardware update is fairly typical. The phone is getting faster guts, the dreaded front notch, wireless charging capabilities, a lol-worthy “not pink” millennial pink option, and 5.5-inch and 6.3-inch sizes that start at $799. Let’s be honest: It’s a Pixel stuffed with everything you’d expect a smartphone to contain in 2018.
Instead, to differentiate itself in the market, Google is leveraging its greatest asset: industry-leading AI.
“One of the most exciting stories we have this year is how much machine learning and AI we put into the product,” says Seang Chau, vice president of Pixel software. “I think it’s one of the things that allows Google to differentiate itself.”
The Pixel 3 is loaded with user-friendly AI superpowers, and, crucially, it’s not running all that AI from the cloud, but locally, right on the device itself. That means the company can pull off more advanced features in real time, with less power consumption and more security. It’s the key to what makes the phone’s software different.
Companies like Apple utilize on-device AI with less fanfare. Most recently, Apple began using AI to spot you in iOS’s portrait mode, blurring the background of the image. It also uses AI to suggest the app you open next, building shortcuts into the sea of apps on your phone. But during my hourlong tour of the Pixel 3’s AI, it became clear that Google is going further than Apple. How? Google was already ahead of Apple in terms of cloud computing (case in point: Apple’s iCloud is built upon Google’s cloud). And now it’s shrinking much of that into the form of your phone.
The initiative to move AI to the phone itself started in a big way last year, before the announcement of the Pixel 2. Google developers were able to use machine learning to shrink its massive song-matching algorithm in a way that allowed it to “hear” any of its 70,000 songs, akin to Shazam, with a feature called Now Playing. The AI was tiny on your phone and consumed almost no power, turning a standalone app like Shazam into a clunky bit of obsolescence. Instead, you could just look down at your Pixel, and see the song you were wondering about on the lock screen.
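The idea is easier to see in miniature. Here’s a toy sketch of how an on-device fingerprint index can identify a song without ever touching the network–the hashing scheme, the tiny index, and the vote threshold are all invented for illustration, not Google’s actual model:

```python
# Toy sketch of on-device song matching via audio fingerprints.
# Real systems hash spectrogram peaks; here we pretend each song is
# summarized by a handful of integer hashes. Everything below is
# illustrative, not Google's actual Now Playing implementation.

from collections import Counter

# A tiny on-device index: fingerprint hash -> song title.
# In practice the Pixel shipped a compressed index of ~70,000 songs.
INDEX = {
    101: "Song A", 102: "Song A", 103: "Song A",
    201: "Song B", 202: "Song B",
    301: "Song C",
}

def identify(clip_hashes):
    """Vote for the song whose fingerprints best match the clip."""
    votes = Counter(INDEX[h] for h in clip_hashes if h in INDEX)
    if not votes:
        return None
    song, count = votes.most_common(1)[0]
    # Require a majority of the clip's hashes to agree before answering.
    return song if count >= len(clip_hashes) / 2 else None

print(identify([101, 103, 999]))  # -> Song A
print(identify([999, 888]))       # -> None
```

Because the lookup is a local dictionary hit rather than a server round-trip, it can run continuously in the background at almost no power cost–which is the property Google exploited.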
Now Google is using the road map behind Now Playing to do the same for all sorts of new features. Take the new Call Screen tool. When anyone calls your Pixel 3, you can tap a button to have the Google Assistant answer that call and screen it on your behalf. The assistant reads a stock script and asks the caller to identify themselves. Meanwhile, the software transcribes the response with on-device speech-to-text, presenting the information to you much like a text message. If you like, you can keep pressing for more info by tapping various pre-canned options. You can even reply that you’ll call back later, or just report the call as spam and block the number forever.
Call Screen is a perfect example of the benefits of running AI on device versus in the cloud. Whereas existing visual voicemail allows companies like Verizon to transcribe your voicemail messages for you, that process happens on a delay that you, the user, have no real control over. With the AI in your hands, though, the assistant becomes software that works on your schedule–in real time–to deal with spammers.
Similarly, features from Google Lens–Google’s cloud-based image analyzing service–will now run on the Pixel 3. That means if you photograph a business card, Lens can see that there’s a phone number, or address–which can be called, or opened in Google Maps, respectively, with buttons that appear on screen.
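Conceptually, the step after on-device text recognition looks something like the sketch below–the regex and action names are assumptions made for illustration, and real Lens uses learned detectors rather than hand-written rules:

```python
# Sketch of a Lens-style step after on-device OCR: scan the recognized
# text for actionable entities and map each one to a button. The
# patterns and action names here are illustrative, not Google's.

import re

PHONE = re.compile(r"\+?\(?\d[\d\s().-]{7,}\d")

def suggest_actions(ocr_text):
    """Return (entity, action) pairs found in OCR'd text."""
    actions = []
    for match in PHONE.finditer(ocr_text):
        actions.append((match.group(), "call"))
    for line in ocr_text.splitlines():
        # Crude address heuristic: street number, words, street suffix.
        if re.match(r"\d+\s+\w+.*(St|Ave|Rd|Blvd)", line.strip()):
            actions.append((line.strip(), "open_in_maps"))
    return actions

card = "Jane Doe\n123 Main St\n(555) 867-5309"
print(suggest_actions(card))
# -> [('(555) 867-5309', 'call'), ('123 Main St', 'open_in_maps')]
```

The interesting design question, as Chau notes below, isn’t the detection itself but when the UI should act on it.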
It’s neat to watch happen in real time, but designing exactly how the UI reacts in these moments is tricky.
“Our general philosophy is we want to make sure technology is kept out of the way of the user so it’s not something they have to think about. Mostly, we’re not in your face about it,” says Chau. “With [Lens] suggestions, we wait until a QR code or phone number is X% of the screen before we recommend it. Even if we see the business card, we don’t recommend anything until we think it’s clear that’s what you want to do.”
Indeed, most of the artificial intelligence Google is introducing is within the Pixel’s camera itself, where much of the time, a user can either ignore its smarts entirely, or benefit from the effects while being none the wiser that they exist.
Top Shot is a new camera feature that aims to ensure you get everyone smiling, eyes open and in frame, every time. Essentially, your camera grabs frames before and after you tap the shutter button–frames taken at a lower resolution than you’d want. With AI, Top Shot not only analyzes your shots for all those aesthetic things we want in casual photography, but actually combines image data from the lousy high-resolution image you took with the content of the better low-resolution photos it grabbed as a backup. Software merges the two frames into one HDR image. The camera’s AI reconstructs a moment that it technically missed.
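A stripped-down version of the frame-selection half of that pipeline might look like this–the frame attributes and scoring weights are invented for illustration; the real Top Shot scores frames with an on-device neural network:

```python
# Toy sketch of Top Shot-style frame selection: score every buffered
# frame on simple attributes and keep the best one. The attributes and
# weights below are made up; Google's scoring model is learned.

def score(frame):
    """Higher is better: open eyes and smiles outweigh sharpness."""
    return (2.0 * frame["eyes_open"]
            + 2.0 * frame["smiling"]
            + 1.0 * frame["sharpness"])

def top_shot(frames):
    return max(frames, key=score)

burst = [
    {"id": "shutter", "eyes_open": 0.1, "smiling": 0.9, "sharpness": 0.9},
    {"id": "before",  "eyes_open": 0.9, "smiling": 0.8, "sharpness": 0.6},
    {"id": "after",   "eyes_open": 0.5, "smiling": 0.4, "sharpness": 0.7},
]
print(top_shot(burst)["id"])  # -> before: the blinking shutter frame loses
```

The merge step–folding the sharp shutter frame’s detail into the winning low-resolution frame–is the harder half, and it’s where the HDR-style fusion described above comes in.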
Similar image magic happens while zooming–and in low light. The Pixel 3 only has one camera on its back, and it lacks optical zoom. That means zooming would typically be done digitally, by simply enlarging the pixels in a blurry way. The Pixel 3, however, recognizes that you’re zoomed in and cross-analyzes the frame with your subtle, shifting hand movements. Each movement actually provides more pixel data to the sensor, and all these pixels are combined in a way that Google claims allows you to zoom 2x into an image without degrading your picture.
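A one-dimensional toy version shows why the shifting helps, assuming perfectly known half-pixel shifts–real hand shake is messier and the shifts have to be estimated, which is most of the actual work:

```python
# 1-D sketch of multi-frame super-resolution: each handheld "frame"
# samples the same scene at a slightly shifted position, so combining
# frames recovers detail no single frame contains. Purely illustrative
# of the idea behind the Pixel's zoom, not Google's algorithm.

import numpy as np

scene = np.arange(16, dtype=float)  # the "true" fine detail

def capture(scene, offset, factor=2):
    """A low-res frame: every `factor`-th sample, starting at `offset`."""
    return scene[offset::factor]

# Two frames, shifted by half a (low-res) pixel relative to each other.
frames = [capture(scene, 0), capture(scene, 1)]

# Place each frame's samples back at the positions it observed.
recovered = np.empty_like(scene)
recovered[0::2] = frames[0]
recovered[1::2] = frames[1]

assert np.array_equal(recovered, scene)  # the shifts filled in the gaps
```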
Likewise, the camera features a Night Sight mode that operates in a similar manner. When you photograph something in the dark, it stacks several photos, combining the brightest bits of each into one image that simulates a long exposure.
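The core of that trick–averaging several noisy short exposures so the noise cancels and the result can be brightened–can be sketched in a few lines. The scene and noise levels below are made up, and real Night Sight also aligns the frames and weights the merge:

```python
# Sketch of exposure stacking: averaging N short, noisy exposures
# suppresses random sensor noise by roughly sqrt(N), so the merged
# frame can be brightened cleanly, simulating a long exposure.
# Illustrative only; not Google's Night Sight pipeline.

import numpy as np

rng = np.random.default_rng(0)
true_scene = np.full((8, 8), 10.0)  # a dim, flat scene

# Fifteen short exposures, each swamped by sensor noise.
frames = [true_scene + rng.normal(0, 4, true_scene.shape) for _ in range(15)]
stacked = np.mean(frames, axis=0)

single_noise = np.std(frames[0] - true_scene)
stacked_noise = np.std(stacked - true_scene)
print(f"single-frame noise {single_noise:.2f}, stacked noise {stacked_noise:.2f}")
```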
Formerly, image processing of this magnitude lived in Google Photos, online, where Google uses all sorts of AI to build a feed of your photos you might like, much like Facebook does. Thus far, though, this feed is asynchronous rather than real time. That means that while you’re sleeping at night, Google Photos will use AI to do things like combine many photos of your kids into adorable GIFs.
On the Pixel 3, Google is moving these image enhancements into real-time territory. To do so, the Pixel team is borrowing and shrinking software technology from the Photos team–using a similar shrink-the-AI workflow to the way it got Now Playing running on the smartphone. The AI models behind these photo enhancements are trained in the cloud–which takes enormous processing power–but when complete, they can live on your device as software tools that are perfect for doing one job perfectly, like brightening a photo.
Where processing happens shouldn’t matter to users in theory, but practically, it makes all the difference. Most of the Pixel’s new camera tricks would be impossible if they lived in the cloud, because you couldn’t get the real-time feedback on screen that you need. You couldn’t possibly upload photos as fast as your phone takes them, let alone wait for them to be processed and re-downloaded. Google’s new Pixel AR features, for instance, will allow you to add Instagram-like stickers to your videos. But with AI, objects in the scene are identified in real time, reacting to context–a phone in the frame brings up a chat cartoon that says “call me!” Or you can bring in Marvel characters, like Iron Man, to pose for selfies with you, smiling or shrugging in concert.
“This doesn’t mean there won’t be great cloud use cases as well. But there’s always going to be latency, power, and data considerations when we’re talking about cloud services,” says Chau. “We believe there are use cases that it makes sense to run low latency, real time [AI] because it brings out a better user experience.”
Of course, there is a pretty big catch to running AI locally. It means you’re often collecting and processing tons of extra data on your phone–a device that’s inherently less secure than Google’s own servers (in theory, at least, given the recent security breach at Google+). Google assures me it’s not seeing data like the songs playing around you in Now Playing. Similarly, that selfie with Iron Man will never be seen by Google, unless you back up your photos to Google’s servers. Local AI is a promising development for user privacy. But that doesn’t matter if the contents of your phone can be hacked by malware or other means–if, in theory, a hacker could hop into your phone and see everything the AI has seen.
“The more we do on the device, the more we’re going to have to protect what’s there,” says Chau. Google updated the Pixel 3’s hardware with what looks to be an industry first–a security chip called Titan M that stores your passwords in a way so protected that not even your smartphone’s CPU can see the data. The chip can also create the same two-factor login options that Google’s Titan Security Keys use–meaning the phone will be able to securely unlock all sorts of websites, and potentially even Internet of Things hardware, in your life.
In a world where we’re increasingly dependent on corporations like Google keeping us secure–and those corporations are increasingly dependent on tracking our every move to be served up to advertisers–local AI is an enticing alternative. I’m not so naive as to think that this technology will allow me to use Android without being tracked, but by moving the AI closer to us, Google is putting a little more distance between our phones and its servers. Strangely, localized AI could help retain some aspects of personal privacy without us chucking our phones and moving to caves. At minimum, it should help with those Iron Man selfies.