When a startup called Beyond Verbal began mapping the patterns and frequency of voices expressing various emotions, they noticed something odd: Love and hate looked almost identical. Yet despite the apparent similarities, our human minds can easily grasp the difference, and now machines are learning how to do it, too. In May, the startup revealed a technology that it says can identify hundreds of emotions in real time based on a voice alone.
The key to algorithmically deciphering the difference between love and hate? "When you hate someone you really push the air and speak out," says Beyond Verbal CEO Yuval Mor, demonstrating the huffy tone of hatred. Then he switches to a kinder, cooing quality: "When you love someone, this is the ‘pull’ kind of method." While to human ears the two tones are obviously different, on paper, their sound patterns look quite similar—except for these "push" and "pull" qualities, which the company built into its algorithm.
Beyond Verbal’s technology is based on research that two PhD students, Yoram Levanon and Lan Lossos, began in 1995. At the time, they were investigating emotion’s role in decision-making. That developed into gigs as advertising consultants, and eventually, after realizing the connection between intonation and emotion, they started a business that improved call centers by identifying caller emotions. That company went into liquidation, but others saw opportunities for the researchers' data-driven emotion identifier beyond the narrow scope of call centers.
If, say, Apple’s Siri understood how you were feeling in addition to what you were saying, it could pull up not just any playlist, but one that matched your mood. Politicians could use the technology to practice enhancing qualities such as leadership in their voices while giving speeches. People with Asperger's syndrome, who often have communication difficulties, could use it to understand verbal cues that extend beyond literal words. It could even help air traffic controllers identify when pilots were under stress.
Since a tiny startup couldn’t pursue all of these options at once, the researchers helped start a new company, Beyond Verbal, that focuses on making their technology available for developers to use in their own apps. The company has not yet announced its first customers.
During a demonstration in Fast Company’s offices, VP of Marketing & Strategic Accounts Dan Emodi played a video of President Obama talking about Mitt Romney during the 2012 presidential election. Beyond Verbal’s test app, Moodies, identified emotions in his voice like "provocation" and "cynicism." While listening to Princess Diana talk about her troubled marriage in a 1995 interview with the BBC, it noticed "love," "grief," and "feeling of loss." You can try it on your own bit of monologue here.
The app is not attempting to understand what you say, but how you say it. Researchers based the technology on an analysis of more than 1,000 speeches and questionnaires from several thousand study participants who listened to sound clips and chose corresponding emotions. Beyond Verbal has tested it by asking about 60,000 people online in 26 countries whether the machine had correctly paired sound clips and emotions. That research is continuing, says Emodi, and humans agree with the machine’s analysis 75% to 80% of the time, with agreement slightly lower in countries such as Vietnam, where tonal languages are spoken.
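For readers curious how an agreement figure like that is tallied, here is a minimal Python sketch. It assumes each survey response boils down to a country and a yes/no answer on whether the machine's pairing of clip and emotion was correct. The function name `agreement_rates`, the country labels, and the sample responses are all hypothetical and made up for illustration; this is not Beyond Verbal's data or method.

```python
from collections import defaultdict


def agreement_rates(responses):
    """Return the share of 'agree' answers per country.

    responses: iterable of (country, agreed) pairs, where agreed is True
    when the respondent judged the machine's clip-emotion pairing correct.
    (Hypothetical input format, assumed for this sketch.)
    """
    totals = defaultdict(int)
    agreed_counts = defaultdict(int)
    for country, agreed in responses:
        totals[country] += 1
        if agreed:
            agreed_counts[country] += 1
    # Agreement rate = agreeing responses / all responses, per country.
    return {country: agreed_counts[country] / totals[country] for country in totals}


# Made-up responses for illustration only.
sample = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False),
]
rates = agreement_rates(sample)
print(rates)  # {'A': 0.75, 'B': 0.5}
```

Splitting the tally by country is what would let a lower rate in one place, such as a country where a tonal language is spoken, show up separately from the overall figure.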
Is a machine that understands emotions creepy? A bit. Will politicians use it to practice their false apologies until they sound genuine? Probably. But, argues Mor, you could just as easily use the technology to practice sounding confident as a leader or loving as a parent, and through practice affecting these qualities, a la the "fake it till you make it" method, you may actually achieve them. In any case, you’ll be more in tune with the importance of how you say things. "It’s not that you are necessarily [just] talking differently," Mor says. "You’re listening. You’re listening to yourself."