The constantly buzzing Apple rumor mill says Apple is going to embrace Nuance’s voice-recognition tech to power future versions of iOS. An as-yet-unused North Carolina data center will play a starring role by hosting the recognition engine in the cloud.
Last week rumors began to swirl suggesting that Apple and leading voice-recognition firm Nuance were entering into some sort of partnership to bring smarter voice recognition and control to a future edition of iOS for iPhones and iPads. Now those rumors are crystallizing into something that sounds a lot more like believable fact, and they explain what Apple may be using its huge, largely unused North Carolina data center for. It’s all about speech.
The partnership is due to be revealed officially at the Worldwide Developers Conference in June, and ahead of the news Apple is said to be already using the data center to run experiments with Nuance’s technology. Mobile users will be able to upload snippets of sound and get text or commands sent back down to their iDevices–hosting it this way keeps the data out of competitors’ hands, and lets Apple build on the core tech with its own systems. It’s almost exactly how Google runs its mobile voice-recognition systems, which tap Google’s cloud-computing power for far swifter processing and access to a larger voice-sample database than would be possible in an app running on the device itself.
Why’s this key, though? It all comes down to Nuance’s most recent iPhone app, Dragon Dictate. It’s built on technology acquired through a complex series of acquisitions in the run-up to 2005 (when the company was rebranded as Nuance from its earlier incarnation as Scansoft)–the same technology that powered Dragon NaturallySpeaking for Windows and Dragon Dictate on Macs, among the leading speech-recognition solutions for either platform.
The iPhone app may be the pinnacle of Nuance’s implementation, because it relies on a network-centric system that wouldn’t have been possible before. The idea is that the app is free to use, and when you send it a snippet of speech to decode, the audio gets whisked away to Nuance’s servers to be matched against its recognition models–and Nuance keeps everyone’s anonymized voice files, using them to refine those models in a separate, offline process. The app works well, and requires no “training” to learn your unique voice patterns in the way traditional voice-recognition software (including Nuance’s own) does.
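Neither Apple nor Nuance publishes an API for any of this, but the network-centric loop the article describes–device uploads a snippet, the cloud transcribes it and quietly retains the anonymized sample to refine its models later–can be sketched in a few lines. Every name here (`CloudRecognizer`, `dictate`, the stand-in transcriber) is hypothetical:

```python
# Hypothetical sketch of a network-centric dictation flow.
# All class/function names are illustrative, not a real Apple or Nuance API.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CloudRecognizer:
    """Server side: transcribes audio snippets and retains anonymized
    samples so recognition models can be refined in a separate process."""
    transcribe: Callable[[bytes], str]
    retained_samples: List[bytes] = field(default_factory=list)

    def handle_upload(self, audio: bytes) -> str:
        self.retained_samples.append(audio)  # anonymized corpus for later model tuning
        return self.transcribe(audio)

def dictate(audio: bytes, server: CloudRecognizer) -> str:
    """Client side: the device just uploads a snippet and receives text.
    No per-user 'training' happens on the device itself."""
    return server.handle_upload(audio)

# Demo with a trivial stand-in transcriber.
server = CloudRecognizer(transcribe=lambda audio: audio.decode().upper())
print(dictate(b"hello world", server))   # the 'transcript' sent back down
print(len(server.retained_samples))      # samples kept server-side
```

The point of the design is in the last two lines: the client stays thin, while every request also grows the server-side sample corpus–which is why hosting it in Apple’s own data center, rather than on the device or a partner’s servers, matters.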
In some respects, Android’s voice-recognition tech is cleverer than Apple’s, because it can leverage Google’s vast data warehouses, and one way this expresses itself is in voice control throughout Android. Nobody’s particularly raved about the facility (except us, when it first arrived), possibly because its implementation isn’t flawless, and the various flavors of Android and handset-specific UI overlays can obscure some of Android’s core powers. But we know Apple’s moving to match or surpass Google’s offerings–and a clever, “it just works” voice-recognition system integrated deeply into the iPhone and iPad could really boost the devices’ attractiveness to consumers. It could even increase the appeal to enterprise users looking for fast and efficient speech-to-text systems. With a flight of fancy, you can even imagine Apple integrating voice control with its rumored new navigation systems–which it could then promote with a PR campaign highlighting how easy it is to use, and how much safer it is than jabbing at a touchscreen while driving.
If Apple is indeed hoping to use Nuance’s tech, integrating a speech-recognition database into its North Carolina data center makes perfect sense. Better yet, it has us pondering the upcoming cloud-based revamp of MobileMe–which is also likely to rely on Apple’s server farm. Could voice control and voice recognition get even more useful if MobileMe itself became voice-centric?