When Louis-Philippe Morency was teaching his first course in analyzing human behavior at the University of Southern California, he was scrounging to find good videos of people talking to each other and expressing their opinions. Then he had a lightbulb moment: YouTube.
Suddenly "millions of examples" of individuals opining on topics ranging from beauty creams to baseball were available to analyze—for free. Forget stilted focus-group sessions. "What is really amazing is all these people are talking straight to the camera with limited background noise and describing how they feel and what they like," he explains.
For Morency, whose work as a scientist at USC’s Institute for Creative Technologies (ICT) focuses on teaching computers to identify and understand the ways people convey emotion, YouTube’s treasure trove represented more than a way to serve up data sets to his students. It has the potential to advance the growing industry of opinion mining beyond the hunt for insight amidst text-only Amazon product reviews and Facebook status updates. "We are taking this field one step further by focusing on online videos, which provide verbal and non-verbal communication clues beyond just words," says Morency.
To do this, Morency and his colleagues combed through hundreds of YouTube videos featuring full-on diatribes, such as those from 20-something misanthrope Kingsley trashing everything from small birds to aggressive drivers. A proof-of-concept data set of about 50 videos was then fed into a computer program Morency developed that overlays graphics on the speaker to home in on language, speech patterns, and facial expressions to determine the type of opinion being shared.
This small sample trumps studying text alone (emoticons notwithstanding), says Morency, because it offers clues into the less obvious nuances of communication. Beyond obvious squeals of delight, Morency finds that people look at the camera more when sharing a positive view and their voices become higher pitched. They start to use a lot more pauses when they are neutral. And "a genuine smile is in the eyes," he adds. There are also plenty of examples where an expression (either elated or upset) is perfect, but if it dissolves too quickly, Morency cautions, it may not be authentic.
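As a rough illustration of how cues like these might be combined, here is a toy sketch in Python. It is not Morency's program: the field names, the voting rule, and every threshold are assumptions made up for this example, and a real system would learn such weights from labeled video rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class ClipCues:
    """Hypothetical per-clip measurements; names and units are assumptions."""
    gaze_at_camera: float     # fraction of frames spent looking at the camera, 0..1
    pitch_shift: float        # relative rise in pitch vs. the speaker's baseline (0.10 = 10% higher)
    pauses_per_minute: float  # how often the speaker pauses
    smile_seconds: float      # duration of the longest smile in the clip


def guess_polarity(cues: ClipCues) -> str:
    """Toy rule-based guess: returns 'positive', 'neutral', or 'unclear'."""
    # Illustrative thresholds only; none of these numbers come from the study.
    positive_votes = 0
    if cues.gaze_at_camera >= 0.6:   # more eye contact with the camera
        positive_votes += 1
    if cues.pitch_shift >= 0.05:     # voice becomes higher pitched
        positive_votes += 1
    if cues.smile_seconds >= 1.0:    # a smile that dissolves too quickly does not count
        positive_votes += 1

    if positive_votes >= 2:
        return "positive"
    if cues.pauses_per_minute >= 12:  # frequent pauses are read as neutral
        return "neutral"
    return "unclear"
```

Under these made-up thresholds, a clip with steady eye contact, a raised voice, and a lasting smile, such as `ClipCues(0.8, 0.10, 4, 2.0)`, would be labeled positive, while a pause-heavy clip with none of those cues would be labeled neutral.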
Imagining the implications of detecting faux emotions, Fast Company put Morency through his paces with the newsreel clip of "Sweet" Brown. Her dramatic retelling of a narrow escape from an apartment fire not only prompted chuckles from the cameraman and a slew of views (now 3.7 million and counting), it was subsequently turned into an Internet meme with the help of autotune. But was Brown really running for her life sans shoes?
Morency confesses he’s no expert on deception, but from where he sits, she’s genuine, and an accomplished (or perhaps intuitive) storyteller. "She keeps a strong gaze at the beginning to be sure to get the attention of the interviewer. She then emphasizes the important cues of her story using head shakes and eyebrow raises as 'beat' gestures, synchronized with her words. And she transforms this relatively sad story in a fun story by smiling strategically half-way through." He says he’d need to see her speaking in real life to make a better assessment, but according to another report on KFOR, his impressions are correct: She’s all that.
After taking his model on the road to the International Conference on Multimodal Interaction in Spain, Morency expanded the dataset to include close to 500 videos and will submit results from this larger sample for publication later this year.
Far from residing within academia or remaining a fun party trick, Morency’s findings have caught the attention of top brass at YouTube as well as Google. Morency is currently writing a grant proposal to collaborate with the search engine giant, but maintains he’s "really about the research and creating the technology." After that, he says, the companies can work with his team to "make an impact with implementation."
Though the research component sets his science-oriented heart aflutter, Morency sees practical applications running the gamut from offering psychologists new tools to interpret behavior to offering businesses and consumers better ways to crowdsource. "Every time we want to go try a restaurant or movie, we go online and somehow trust the crowd," he explains. Adding visual video analytics can further winnow opinions by bringing the most genuine to the top. Likewise, he sees online searches curated not just by keyword but by positive and negative takes.
Corporate beta testing could happen in as little as three years, but ICT is already using the technology in its research prototypes of virtual humans to perfect the way they interact with real people.
The French-Canadian native is also excited to begin studying videos featuring non-English speakers. Starting with Spanish, Morency is looking at how people from different countries express opinions based on their cultural background. "We can contextualize videos and start categorizing them to see the difference in what behaviors come naturally." Cross-cultural leadership training, anyone?
And speaking of leadership and working better together, Morency’s work is also focused on helping students working in groups to recognize the dominant person as well as detect if the students are engaged, confused, or in sync with learning. Measuring those variables could create tutoring systems that adapt to give more advice and positive reinforcement where needed. Far from fostering solitary tapping of keyboards, Morency is convinced that this technology would make homework interactive and encourage classroom collaboration.
"We are training the next generation," says Morency. "Good schools should be able to do this early on in their curriculum. I am really seeing it as a collaboration with different companies to make it happen. I want to innovate—but I also want to have an impact."