Twitter is largely a public forum, and most users know this. If you tweet about something you want to keep private, you’ve failed. But what many people don’t realize is that simply using Twitter at all can reveal a lot more than intended.
Data scientists are becoming adept at making predictions about people based on very subtle clues on public social media, such as language used, friend networks, or topics a user talks about. By looking for patterns in endless streams of Twitter data, experts have developed ways to predict a user’s age, gender, location, personality, political leanings, and whether they are depressed–even when that information isn’t explicitly given.
“It’s not even just what you say and how you say it: your connections to friends and followers also reveal a lot about you, so even if you’re 100% silent there is still a great deal that can be inferred about you,” says Christo Wilson, a computer scientist at Northeastern University who has studied the effects of algorithms used to judge consumers online.
The newest advance on this front, published in the journal PLoS One in September, shows how it’s possible to roughly estimate a Twitter user’s income.
In the study, researchers looked at about 5,000 real Twitter profiles that clearly described the person’s job, whether that job was a tech executive or a coal miner. Based on the U.K. government’s job classifications, they matched a person’s job to its average salary and looked for Twitter use patterns that would help predict each salary. After creating a predictive model, the researchers could then estimate the salary of unknown users with high accuracy–enough to say one user was in the top 5% income bracket and another in the top 20%.
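To make that pipeline concrete, here is a minimal sketch in Python of the general idea: label each profile with the average salary for its stated job, turn the tweets into a number, and fit a model that maps one to the other. It is not the researchers' model. The salary table, the sample tweets, and the function names are all made up for illustration, and a single feature (the share of tweets containing a link) stands in for the many signals a real study would use.

```python
from statistics import mean

# Hypothetical lookup from job title to average annual salary.
# The figures are invented for illustration, not taken from the study.
JOB_SALARY = {"tech executive": 90000, "teacher": 35000, "coal miner": 30000}


def link_share(tweets):
    """Return the fraction of tweets that contain a link."""
    return sum("http" in t for t in tweets) / len(tweets)


def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy "labeled" profiles: a stated job plus a couple of tweets each.
profiles = [
    ("tech executive", ["New report http://a.example", "Slides http://b.example"]),
    ("teacher", ["Read this http://c.example", "long day today"]),
    ("coal miner", ["what a match", "so tired"]),
]

# Step 1: label each profile with the average salary for its job.
ys = [JOB_SALARY[job] for job, _ in profiles]
# Step 2: reduce each profile's tweets to a numeric feature.
xs = [link_share(tweets) for _, tweets in profiles]
# Step 3: fit the predictive model on the labeled profiles.
slope, intercept = fit_line(xs, ys)


def predict_salary(tweets):
    """Estimate a salary for a user whose job is unknown."""
    return slope * link_share(tweets) + intercept
```

Calling predict_salary on the tweets of a user who never stated a job returns an estimate based only on how often that user posts links. With three invented profiles the number is meaningless, but the shape of the process is the same: known jobs supply the labels, and tweeting habits supply the prediction.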
The team did the study because they’re interested in making more demographic data available for social science research. But these same kinds of predictions could be useful to marketers, data brokers, employers, and government surveillance agencies.
“Everything people post is public, and they should be aware of how much information companies can find out about them,” says Daniel Preotiuc-Pietro, a researcher at the University of Pennsylvania and the lead author of the paper.
A report by the Federal Trade Commission last year showed that data brokers–the firms that collect detailed and often troubling profiles of individuals to sell to marketers–already include social media use and followings as part of their analysis. It’s not a far step for them to mine Twitter more deeply for more detailed predictions: “We don’t know what companies are doing, but what I’d expect is that if they aren’t doing this already, they will be doing it soon,” Preotiuc-Pietro says.
Researchers have been doing linguistic analysis for decades–based on letters, memos, and now emails–but the sheer volume of social media has made this practice much easier to develop and deploy widely. In the PLoS paper, the researchers found that higher-income users tend to express more “anger” and “fear” in their tweets, but less emotion overall. Lower-income users, on the other hand, curse more and are more optimistic. They also include fewer links in their tweets–their 140-character musings are more personal.
Does this mean that being rich causes you to be angrier? Of course, this study doesn’t prove that–it only shows that expressing anger specifically on Twitter and earning more money go together. Income isn’t the only trait that can be inferred this way: a public tool developed by James Pennebaker, a University of Texas psychologist, will allow you to analyze your own personality based on Twitter (Preotiuc-Pietro says he is working on a similar tool that will predict anyone’s income). Just put your Twitter handle into the AnalyzeWords site, and it’ll spit out something like my results seen above.
Why does this matter? Aside from caring about your privacy for the sake of it, knowing details like age or income makes discrimination easier, whether you’re applying for a job or simply shopping online.
“There are lots of uses for this data, but one of the big ones is market analysis: who is tweeting about my company/product, what is the sentiment of the conversation, and what are the demographics of these people? From there you start moving into influence strategies (how can we influence the conversation?) and segmentation (dividing people up by their attributes),” Wilson said in an email. He notes that this is likely the rationale behind Twitter’s purchase last year of Gnip, a company that supplies raw feeds of social media data to other companies.
And as algorithms make more and more automated decisions, the results will become less transparent to the public. Am I highly arrogant, as my Twitter personality seems to project? I’d like to think I’m not, but if some software out in the world is judging me that way, I’ll never know.