Making a PC generate sounds that resemble human speech is relatively simple. Making a machine sound convincingly human is much trickier. Yet IBM claims to have coded a synthetic voice that is more similar to a human voice than any created before. Its level of human mimicry is so high that it copies our verbal tics, umming, erring, and even sighing. If it needs your attention, it can gently cough, or shush you when you're interrupting.
The newly patented tech has been dubbed "generating paralinguistic phenomena via markup in text-to-speech syntheses," and it's designed for use in automated telephone systems, devices like GPS units, and possibly even cellphones. Its sophistication lies in the "paralinguistic" part of its name: those gentle little quirks that make a human voice unique. As Andy Aaron of IBM's speech research team says, "These sounds can be incredibly subtle, even unnoticeable, but have a profound psychological effect." The system can pause for effect and modulate its speech in response to a situation. It can also learn new affectations and then place them in the correct part of a sentence.
Synthetic voices have been intelligible for decades (check out the video, which shows the first-ever computer singing, along with a classic clip of HAL). But these speaking devices are obviously not human, which may incline you to distrust them or dislike listening to them.
As our devices get ever more sophisticated, there's significant scope for IBM's sort of speech technology. Having a GPS unit gently cough and point out that you've just missed a turn, rather than issue a stern "Recalculating" message, would be a big improvement. An automated telephone system that sounded less like a sterile machine wouldn't hurt either.
The true test of IBM's new voice is whether it's better than the most famous computer voice of all: HAL 9000 from the 1968 movie 2001: A Space Odyssey. Although HAL's voice was performed by an actor, Stanley Kubrick and Arthur C. Clarke saw fit to make the computer's speech sound unmistakably synthetic. HAL speaks in an unwavering, calm tone at a very measured pace, and never "ums," "ers," coughs, or hums. IBM is being careful about this too: it doesn't want its voice to cross the uncanny valley into totally mimicking a person. "We are almost at the point where the voice is indistinguishable from a human, but that is not our goal," says Andy Aaron. "We don't want to fool anybody."
[via The Telegraph]