Microsoft’s new universal translation system is called Monolingual TTS, and though it’s been in development for a while, its demonstration at TechFest 2012 last week generated plenty of interest. It’s not only technically impressive; Microsoft has also made it extremely approachable, and perhaps more immediately useful than similar services from Google. Instead of speaking in a synthetic voice, as Apple’s Siri does, it uses a digital recreation of the user’s voice and a realistic rendering of their face.
This high-tech aspect means it takes a little longer to get the system used to a new user: Microsoft notes that it took an hour to learn the voice of an English-speaking trainer for a Mandarin-English experiment. The ultimate goal there was to have the app speak passable Mandarin in real time as the user spoke in their native English. It’s also clever enough to listen to responses in, say, Mandarin, and speak them back to the user–enabling close to real-time communication even if neither party speaks the other’s tongue (with a possible repertoire of 26 languages to choose from). And where it’s necessary to mix languages, with place names appropriately pronounced among translated phrases, it knows to keep the words separate.
This real-time translation aspect is impressive enough, and it echoes what’s being done elsewhere with advanced language algorithms. But Microsoft’s team has realized that one of the most important aspects of speaking to someone in a foreign language is the non-verbal part of the communication. To this end, the system animates a 3-D scanned avatar of the user in time with the output speech, matching head movements and lip-syncing to the output sounds to give the illusion that the user is actually speaking the chosen language.
Google, for its part, has been developing its Translate app far beyond the powerful, yet very staid and sometimes amusing, machine text translation it has offered for some time. The Android version of the app can manage 63 languages, but just 17 of them work in speech mode, and it seems more directly designed so that you can say something and then hold up your Android phone for the other party to read–in their language. Google has just augmented the app to enable handwriting recognition, which is perhaps more useful than you may think–scribbling in characters you don’t understand in, say, Chinese is much simpler than searching through character lists.
Handy though this is, in the long run it would seem that tech like Microsoft’s system could be the “friendliest” one. Machine translation isn’t always reliable, and it’s easy to take mis-translated responses out of context; a correctly animated, believable face could help with this, particularly in a remote conversation that happens in real time over the Internet.
[Image: Flickr user Allan Foster]