The Problem With Sentiment Analysis

Sorting social media chatter into “positive” and “negative” buckets is so 2009.

[Photo: Flickr user Demi-Brooke]

During the 2012 presidential election, USA Today ran a daily feature in which its technology partner, Topsy (since acquired by Apple), provided a sentiment score for each candidate. On May 1st, the first day the feature ran, for instance, Barack Obama scored 34 and Mitt Romney scored 26. On November 7th, the day after the election, Obama scored 85 and Romney scored 57.


And on every day in between, Marc Smith, a sociologist who specializes in the social organization of online communities, cringed. “That is irresponsible,” he says. “That is remarkably poor data. That is borderline criminal, I would argue.”

USA Today did not, he argues, account for the fact that there were two separate groups tweeting: a group that supported Obama and a group that supported Romney. And as a result, what it reported was not actual change in public opinion from one day to the next, but rather which side was the loudest that day. “Imagine candidate 1 supporters and candidate 2 supporters are actually in a town square,” Smith says. “And the reporter visits the square once a day and measures the noise of each crowd, and reports that as an actual change in public opinion.”
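Smith's town-square point can be sketched in a few lines of Python. The numbers here are invented for illustration and do not come from Topsy or USA Today: each camp's opinion is held fixed, and only the number of tweets each camp sends changes from one day to the next.

```python
# Hypothetical, fixed opinions: nobody changes their mind in this sketch.
OBAMA_CAMP_SCORE = 0.8    # assumed average sentiment toward Obama among his supporters
ROMNEY_CAMP_SCORE = -0.6  # assumed average sentiment toward Obama among Romney supporters


def pooled_score(obama_camp_tweets, romney_camp_tweets):
    """Average sentiment toward Obama over all tweets, ignoring who sent them."""
    total = obama_camp_tweets + romney_camp_tweets
    weighted = (OBAMA_CAMP_SCORE * obama_camp_tweets
                + ROMNEY_CAMP_SCORE * romney_camp_tweets)
    return weighted / total


# Same opinions on both days; only the volume from one camp differs.
quiet_day = pooled_score(1000, 1000)
loud_day = pooled_score(3000, 1000)

print(f"quiet day: {quiet_day:.2f}, loud day: {loud_day:.2f}")
```

The pooled score rises on the loud day even though neither camp's opinion moved, which is the distortion Smith describes: the number tracks who was tweeting more, not who changed their mind.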

This style of sentiment analysis has been applied not only to politics, but also to the Super Bowl, American Idol voting, and even war. But while measuring the sentiment in a sample of social media posts was once all most social analytics companies offered, it’s since become apparent that extracting meaningful information from social networks is more complicated than that.

Polarized Crowds
[Graph: NodeXL Graph Gallery]

Smith, for instance, argues the structure of a social network is critical. He helped build an open-source tool called NodeXL that graphs social networks and runs a consultancy that helps brands use it. According to a paper he co-authored and Pew Research published in February, crowds that form around a topic on social media typically break down into six shapes: a “polarized crowd,” in which two groups form around something like a political issue and rarely interact with each other; a “tight crowd,” in which a small group of people interact with each other around something like a conference; a “brand cluster,” in which people talk about something like a brand, but do not talk about it with each other; a “community cluster,” in which multiple small groups form; a “broadcast network,” in which many people connect to something like a media outlet, but not each other; and a “support network,” in which something like a customer service account interacts with many people, but those people don’t connect to each other.

Tight Crowd
[Graph: NodeXL Graph Gallery]

By looking at the shape of a network, he says, you could avoid the USA Today misstep of treating all social media posts the same and instead report on the size, volume, and content of each major cluster over time. A brand might also be more interested in getting people to talk with each other about a new product than in getting people to interact with its customer service account. Without knowing how connections work, there’s no way to know the difference.
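As a rough sketch of what reporting on each cluster could look like, the Python below counts how many connections stay inside a cluster and how many cross between clusters. It is not NodeXL's method: the account names, the edges, and the cluster labels are all hypothetical, and a real analysis would first have to detect the clusters rather than assume them.

```python
from collections import Counter

# Hypothetical reply/mention edges between accounts.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),  # conversation inside cluster A
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),  # conversation inside cluster B
    ("a3", "b1"),                              # the lone exchange across camps
]

# Assumed cluster label for each account (given here, not computed).
cluster = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}


def cluster_report(edges, cluster):
    """Count edges within each cluster and edges that cross between clusters."""
    within = Counter()
    crossing = 0
    for u, v in edges:
        if cluster[u] == cluster[v]:
            within[cluster[u]] += 1
        else:
            crossing += 1
    return {
        "within": dict(within),
        "crossing": crossing,
        "crossing_share": crossing / len(edges),
    }


report = cluster_report(edges, cluster)
print(report)
```

A low share of crossing edges between two dense groups is the pattern the article calls a polarized crowd. The same counts, tracked per cluster over time, would separate one camp getting louder from the two camps actually talking to each other.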

More traditional social media analytics firms are looking more closely at the structure of social networks, too. Sysomos, for instance, uses an open-source tool for graphing social networks called Titan to help pinpoint relevant influencers and clusters of communities for brands. Eventually, it plans to also use this information to better its sentiment analysis. “Somebody might say, ‘I really hate beer, but I love Heineken,’” says the company’s executive vice president of product, Brian Kissel. “You see the word ‘hate’ and the word ‘Heineken’ in the same tweet, do you now infer that they are negative on Heineken? Well, in fact they’re not. Part of that is parsing the text and proximity of words to other words.” People can naturally make those distinctions, but a machine could use that person’s social network–beer lovers or haters–to help “disambiguate” those statements.
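Kissel's Heineken example can be made concrete with a toy comparison. The sketch below is not Sysomos's engine. The word lists and both functions are hypothetical, and the proximity rule is deliberately crude: it simply picks the sentiment word closest to the brand name.

```python
import re

# Tiny, hypothetical sentiment lexicons for illustration only.
NEGATIVE = {"hate", "awful", "terrible"}
POSITIVE = {"love", "great", "like"}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_sentiment(text, target):
    """Bag-of-words rule: any negative word in a tweet naming the target counts against it."""
    words = tokens(text)
    if target.lower() not in words:
        return None
    if NEGATIVE & set(words):
        return "negative"
    if POSITIVE & set(words):
        return "positive"
    return "neutral"


def nearest_word_sentiment(text, target):
    """Proximity rule: use whichever sentiment word sits closest to the target."""
    words = tokens(text)
    if target.lower() not in words:
        return None
    position = words.index(target.lower())
    best = None  # (distance, label) of the closest sentiment word so far
    for i, word in enumerate(words):
        if word in NEGATIVE or word in POSITIVE:
            label = "negative" if word in NEGATIVE else "positive"
            distance = abs(i - position)
            if best is None or distance < best[0]:
                best = (distance, label)
    return best[1] if best else "neutral"


tweet = "I really hate beer, but I love Heineken"
print(naive_sentiment(tweet, "Heineken"))         # keyword co-occurrence
print(nearest_word_sentiment(tweet, "Heineken"))  # word proximity
```

The co-occurrence rule marks the tweet as negative on Heineken because “hate” appears somewhere in it, while the proximity rule reads it as positive because “love” sits right next to the brand. Proximity alone still fails on sarcasm, negation, and slang, which is where the author's network or demographics could supply the missing context.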

[Graph: NodeXL Graph Gallery]

Already, social analytics firms have moved to supplement sentiment analysis with other metrics. Sysomos, General Sentiment, Crimson Hexagon, and Simply Measured all provide some sort of demographic information about who is posting about a topic or help pinpoint who is influencing an online conversation. Paying attention to “who” rather than just “what” can help these firms’ natural language processing engines interpret language–knowing a user is 14 rather than 40, for instance, might help interpret her use of the word “killer”–but it also provides their clients with more context around that analysis. All four firms also look for ways to represent why sentiment might be a certain way. Crimson Hexagon, for instance, provides a “Topic Wheel” to show the most popular conversations happening around a term. General Sentiment lists the most common adjectives that go with a particular term, to add more specificity beyond “negative” or “positive” (it also, like many firms, rates sentiment on a scale).

“I’d rather we be called ‘General Tonality,’ really, because just saying ‘Sentiment’ implies a yes-no/either-or scenario that really doesn’t play out in our data,” Asher Feldman, General Sentiment’s director of strategic analytics and communication, said in an email. He adds by phone, “In the beginning, everyone dived into sentiment analysis as a one-size-fits-all yes/no solution, and they quickly realized that saying yes or no wasn’t going to be adequate.”

Additional information like the most popular words in a conversation or who is influencing it rarely makes it into news reports based on sentiment analysis. And part of the problem with placing tweets in “positive or negative” buckets, as journalists (yours truly included) often ask data analysis firms to do, is that natural language processing is really hard to do without context. There are only 140 characters in a tweet, and many of them make up abbreviations or words that can be used to convey very different sentiments depending on location or age.

Take a recent analysis that sentiment analysis firm BrandWatch did for Newsweek around Gamergate, for instance. In an effort to add data to claims that Gamergate is actually about journalism ethics, Newsweek’s reporter tried to pinpoint whether more negative #gamergate tweets were directed at game developers or journalists.

The problem was that, as Andy Baio later pointed out in a Medium post, about 90% of those tweets couldn’t automatically be placed in either category. When Baio read and sorted a three-day sample by hand, he found that about 90% of tweets did take a clear positive or negative stance.

Of course, classifying social media posts by hand isn’t practical at scale (though some firms do manually classify samples to make their algorithms better). But a simple thumbs up or down as “sentiment” is worse than meaningless–it’s simply not true.


About the author

Sarah Kessler is a senior writer at Fast Company, where she writes about the on-demand/gig/sharing "economies" and the future of work.