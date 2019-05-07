Ten years ago, YouTube began to auto-caption every video uploaded to the service. With 20 hours of video uploaded every minute at the time, it was a speech-recognition task of unprecedented scale. Artificial intelligence has progressed considerably since then: Starting this year, the new version of Android, dubbed Android Q, will caption anything on your phone, including podcasts and videos from Facebook and Twitter. And it doesn't need servers; it just needs your phone.

Live Caption, as the feature is called, doesn't use the cloud. No data leaves your phone, and it can even work in Airplane Mode. Much like Google's music identification feature (which can recognize 70,000 songs) and Night Sight photography (which can basically see in the dark), the technology relies on shrunk-down machine-learning models that run right on your device.

Even though almost every service allows creators to caption their videos manually, doing so can be laborious. As a result, many videos aren't captioned at all. Similarly, podcasts are rarely transcribed, and the personal videos friends share via text never come with closed captions. With Live Caption, a world of otherwise inaccessible content will be made available to the deaf and hard-of-hearing community.

The project was born out of Google’s Creative Lab, which invited KR Liu, an advocate for the deaf and hard of hearing, to the office. “We didn’t have an idea. We brought her in, said let’s talk about the community, and workshopped things,” says Robert Wong, VP of Google Creative Lab. The lab has since dubbed this wider initiative Start with One. “You start with one person, don’t even try to solve their problem, but get with them, design with them,” Wong explains. “It’s not user testing. It’s more like, ‘You have a different take on the world, a different experience. What’s tough in your life? How do we solve that?’ It’s designing with, not designing for.”

What Wong describes is almost a textbook definition of inclusive design: the practice of bringing in people who are considered edge users of a product to spearhead its design and development. Early on, the Lab landed on a big idea: "We were thinking, if YouTube could caption every video, why couldn't we do that for every piece of content on your phone?" says Nicole Bleuel, team lead on the project with the Creative Lab. Captioning would be wonderful for the deaf community. It would also be handy for anyone using their phone somewhere they can't play sound.

Of course, there were reasons why Google couldn't easily caption every piece of content inside Android. The Pixel already has features like Call Screen, which uses on-device AI to transcribe what a caller is saying, but captioning everything on the device required the Android team to rework some fundamental parts of Android's audio architecture.

Beyond that, there were big questions about what closed captioning on a phone would even look like. On television, where it began in the 1970s, closed captioning is pretty straightforward. There's only one continuous video stream, and it takes up the whole screen, so placing captions near the bottom generally works. On mobile phones, every app interface is a little bit different. Where could these captions float without getting in the way?

At first, the team mocked up something akin to Chat Heads, the floating-bubble interface from Facebook Messenger, a pattern that is also used in some Android functions. It was a floating button that you could enable in the settings and tap whenever you needed to convert audio to text. The team shared the idea with designers who were deaf and hard of hearing, and they were remarkably receptive to it. "Even though I don't consider this to be an accessibility feature, I'd rather start by building it for the people who need and want it the most," says Bleuel. "That's how you get to the point to make something universally useful and accessible."