Google’s Voice Search, which launched on cellphones in 2008 and was added to the desktop in June, seems like such a simple proposition. You speak your query into your phone (or computer), and, ta-da, the system pops out an answer.
But teaching Google’s voice bot to understand what users are saying isn’t simple at all. And if you’re trying to get it to speak all the languages in the world, it’s even more complicated.
Enter Linne Ha, Google’s Voice Hunter. Her official title is “International Program Manager, Google Voice Search,” but Ha spends her days crisscrossing the globe, gathering the voice samples needed to train the voice bot, the way a lepidopterist might go hunting for rare butterflies.
Typically, a company would solve this problem by licensing samples from firms that specialize in assembling speech databases. But that wasn’t going to work for Google.
Many of the standard lexicons simply don’t include the kind of words people use in search queries. (And Google has found, interestingly enough, that the words people use in voice searches are pretty much the same as the ones they use in written ones.)
Plus, Google needed its bot to understand queries spoken in every setting in which someone might use Voice Search (or voice input and Voice Actions, two other Android features that rely on the acoustic models built from the samples). Traditional firms weren’t set up to handle those situations.
So Ha had to get scrappy. Her solution: Tap local Google users around the world and hand them Android smartphones loaded with a specially designed speech-gathering app. Then send them out into their communities to record their family and friends.
“The local experts are people who would be using our products,” Ha tells Fast Company. “We want to make sure that whatever we develop is something they would want to use.”
The program, dubbed “word of mouth,” started last year, when Ha, who previously worked on Google Maps and Google Earth, developed the project. It’s taken her everywhere from Mexico City to Hanoi, Amsterdam, and Jakarta, and kept her on the road for over 230 days a year.
So far, “word of mouth” has collected “millions” of samples, Ha says—Google won’t get more specific—including a whopping 250,000 utterances for each language or dialect.
To make sure Google’s scientists get the range of samples they need, Ha’s local teams have gotten creative about where they do their recordings. In Hong Kong, they jumped on trolleys and subways, since so many people there use their phones during commutes. In Brazil, they went to shopping centers, in Singapore to soccer matches, and in the Netherlands to the beach.
And while Ha has had to tangle with everything from power outages to typhoons, she says it hasn’t been a problem finding locals to participate in the program. In Indonesia, they put out a call for volunteers to show up at a university, and over 900 people turned out.
“People are really proud of their language,” Ha says. “They want to make sure [Voice Search] works properly and that they can use it with their native tongue.”
So far Voice Search works with 27 languages and dialects, which means it has about 273 more to go just to support the 300 languages in the world that have more than a million speakers. At the current rate, it could take Ha another decade to collect all the necessary samples herself. So instead, she’s looking to scale the program by partnering with organizations on the ground, like universities, to do some of the voice hunting on Google’s behalf.
In the meantime, Ha is getting ready for her next trip, which will take the program to Africa for the first time. But before that, Ha has a little vacation planned. Her destination? She’s staying home.
[Top Image: Flickr user jaredpolin]
[Additional Images: Ha in Iceland (top), Voice collections in Jakarta (middle) and Buenos Aires (bottom), courtesy Google]