How Facebook Turned The Social Graph Into A Hacker Alarm System

The same technology that helps Facebook track friendships and likes is powering a multi-company effort to stop spammers and scammers.

[Photo: Flickr user Thomas Leth-Olsen]

If there’s one thing Facebook is good at, it’s getting people to share their data and using it to track the relationships between people, places, and things. Knowing which of its 1.39 billion monthly users are friends with each other or have friends in common helps Facebook learn, for instance, what kind of updates and ads are going to keep them clicking and tapping through their news feeds.


And as one of the highest-profile online properties in the world, Facebook also needs to keep track of which of those users are trying to scam, spam, or hack one another, sabotage or slow the site, or use it as a conduit for viruses and spyware, and what tools, servers, and sites they’re using to do so.

To do that, the company realized that the social graph—the approach that helps it keep track of which users went to the same college or share an interest in Kanye West—can also help it more rapidly identify dubious postings and users, and determine which shady messages pushing malware came from the same ring of hackers.

Earlier this year, Facebook announced it had invited other web companies, including Tumblr, Twitter, Bitly, and Dropbox, to share and swap information about common threats to their systems and networks inside an industry-wide database, using the same Facebook Graph API system they use to post and access data from the Facebook network itself. “The notion of using a graph database is somewhat a bit of a lazy decision by me, in that we use similar technology or the same technology at Facebook to represent the entire social graph,” says Mark Hammell, manager of the network’s threat infrastructure team.

The initiative, called ThreatExchange, originated during a heavy spam-driven malware attack on Facebook last year, when Hammell decided to contact other companies to see if they were experiencing similar attacks. Already, the approach has shed light on security threats that Facebook had not previously discovered, Hammell says, including a “polymorphic malware family”—different versions of the same malware that had spread to numerous computers—that it found “pretty much immediately after we turned that technology on.”

Facebook would not disclose details or statistics about cyberthreats. But in an April Securities and Exchange Commission filing, the company said that fraudulent or “undesirable” accounts may have represented “less than 2%” of its monthly active users, including both outright spammers and other terms-of-service violations, like business pages incorrectly registered as individual users. These accounts, it noted, tended to originate in “developing markets such as India and Turkey.” In 2013, security researchers estimated that Facebook spammers were making around $200 million per year, based on prices found in spammer forums.


Conrad Rushing, director of engineering at Tumblr, which helped Facebook build ThreatExchange, says its utility “was obvious from the very beginning.”

“The thing that has been really excellent about it has been that the turnaround time of new concerted action—our counterstrategy to new spammer strategies—has taken far less time, at least half the time,” he says.

A malware entry from ThreatExchange, as viewed in a web UI, that one of its contributors submitted to the project. [Image: Facebook]

Spammers and hackers often try to take advantage of multiple social networking sites and infrastructure providers to spread spam and trick users into downloading malware, says Rushing.

“All of the different social media groups were being used against one another as points on a long chain that a spammer would use,” he says. “A Facebook page would be created pointed to a Tumblr page pointing to another social media page eventually ending up as some sort of fraud or abuse or other bad action.”

ThreatExchange evolved from an internal system called ThreatData, which Facebook announced last year had helped the company quickly spot and quash malware and run automated analyses of where malicious content was coming from, and who it was affecting, using its existing data-crunching infrastructure.


A graph, the term programmers and mathematicians use for a representation of the nodes and connections in any kind of network, can store relationships between viruses, the Facebook accounts used to disseminate them, the Internet domains used to circulate them, and the like just as easily as it can store the connections between users.

Example submissions by Facebook and Company A to ThreatExchange. Company B then connects two malware samples previously submitted by Facebook and Company A to a common IP address.

But the same type of clustering algorithms that suggest two users should be friends, or that a user who likes Kill Bill might also like Pulp Fiction, can also help detect when two security threats have a common cause.
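To make the idea concrete, here is a minimal sketch of one simple way to group threats that share infrastructure. It uses connected components, a basic graph technique chosen for illustration; the article does not say which clustering algorithms Facebook actually runs. All sample names, domains, and IP addresses are invented.

```python
from collections import defaultdict

# Hypothetical indicators and links, invented for illustration only.
edges = [
    ("malware:sample_a", "domain:bad-one.example"),
    ("domain:bad-one.example", "ip:203.0.113.7"),
    ("malware:sample_b", "domain:bad-two.example"),
    ("domain:bad-two.example", "ip:203.0.113.7"),
    ("malware:sample_c", "domain:unrelated.example"),
]

def build_graph(edge_list):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result

threat_clusters = clusters(build_graph(edges))
for cluster in threat_clusters:
    print(sorted(cluster))
```

In this toy data, the first two samples land in one cluster because their domains resolve to the same IP address, while the third sample stays on its own.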

“You have a piece of malware talking to a domain—that’s two nodes with an edge between them. That domain’s hosted on an IP [address]—maybe that IP has hosted other domains,” says Hammell, describing the detective work involved in identifying threats. Graphs are also a familiar enough family of data structure that security experts outside the company can quickly grasp how the system works, he says. “The interest and engagement in ThreatExchange from a range of industries has far surpassed our expectations.” (Other companies can apply to join the network through an online form.)
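The kind of pivot Hammell describes, from a malware sample to a domain to an IP address and on to other domains, can be sketched in a few lines. This is an illustration under stated assumptions, not ThreatExchange code: the dictionaries, the `related_domains` helper, and every domain and address below are hypothetical.

```python
# Hypothetical edge data, invented for illustration.
# malware sample -> domains it talks to
communicates_with = {
    "malware:sample_a": {"bad-one.example"},
}

# domain -> IP addresses it has been hosted on
hosted_on = {
    "bad-one.example": {"203.0.113.7"},
    "bad-two.example": {"203.0.113.7"},
    "unrelated.example": {"198.51.100.20"},
}

def related_domains(malware_id):
    """Find other domains that share an IP with any domain the sample contacts."""
    known = communicates_with.get(malware_id, set())
    ips = set()
    for domain in known:
        ips |= hosted_on.get(domain, set())
    return {
        domain
        for domain, domain_ips in hosted_on.items()
        if domain_ips & ips and domain not in known
    }

print(related_domains("malware:sample_a"))
```

Each hop follows one edge in the graph, so a single known sample can surface infrastructure an analyst had not yet connected to it.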

Across the web, attacks by hackers and spammers are on the rise. A survey by PricewaterhouseCoopers conducted last year found that the number of detected information security incidents has grown 66% year over year since 2009. By 2019, breaches will cost global industry some $2 trillion, Juniper Research said in a report this month.

“One Person’s Trash Is Another’s Treasure”

While sometime competitors like Tumblr and Facebook have been willing to work together to help keep their users safe, until recently, the Internet giants’ security teams didn’t have a good way to share data beyond tried-and-true but labor-intensive means of interoffice communication, like email and shared spreadsheets, says Rushing of Tumblr. “That was really the genesis of the idea for having a more formal system for exchanging this kind of [data].”


Cooperation, even between long-time competitors, has become an emergent theme in cybersecurity efforts. Earlier this month, Intel Security partnered with a California-based company called AlienVault to offer a similar real-time collaborative threat assessment and prevention service for small- and medium-size businesses.

At a cybersecurity summit in January, President Obama proposed a similar data-sharing program that would connect industry with government agencies to stem cyberattacks. “This has to be a shared mission,” Obama said. He suggested that sharing threat information in such a system could come with “liability protections so that… they’re not vulnerable to future lawsuits,” although Congress has failed to pass similar legislation in the past four years. The CEOs of Facebook and Google, who have sought to distance themselves from government surveillance initiatives, declined to attend the summit.

Hammell says that ThreatExchange includes privacy safeguards, so member companies can decide what data they want to keep private and what data they want to share with other companies or industry groups. And some internal data from various networks and organizations won’t be uploaded into the system at all, since it includes too much private information about users. “We’ll be continually making enhancements to the privacy code to support more types of sharing models that different organizations want to support,” he says.
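As a rough sketch of what such sharing controls might look like, the example below attaches a visibility setting to each submitted record and filters what a given member can see. The `SharedIndicator` class, the three visibility levels, and the company names are assumptions made for illustration; the article does not describe how ThreatExchange's privacy code is actually structured.

```python
from dataclasses import dataclass

@dataclass
class SharedIndicator:
    owner: str                         # organization that submitted the record
    value: str                         # e.g. a domain or IP address
    visibility: str = "owner_only"     # hypothetical levels: owner_only, allowlist, all_members
    allowlist: frozenset = frozenset() # members allowed to see the record under "allowlist"

def visible_to(member, indicator):
    """Decide whether a member may see a record under this sketch's rules."""
    if member == indicator.owner:
        return True
    if indicator.visibility == "all_members":
        return True
    if indicator.visibility == "allowlist":
        return member in indicator.allowlist
    return False

# Invented records for illustration.
records = [
    SharedIndicator("CompanyA", "bad-one.example", "all_members"),
    SharedIndicator("CompanyA", "203.0.113.7", "allowlist", frozenset({"CompanyB"})),
    SharedIndicator("CompanyA", "internal-note"),
]

def view_for(member):
    """Return the record values a member is allowed to see."""
    return [r.value for r in records if visible_to(member, r)]

print(view_for("CompanyB"))
print(view_for("CompanyC"))
```

Defaulting new records to the most restrictive level means nothing is shared unless the submitter chooses to share it.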

The network keeps track of which organization reported a particular site, link, or user as malicious or benign, and the confidence it indicated in that assessment. In the original version of ThreatExchange, Hammell says, Facebook tried to merge all of these opinions together to provide a unified view of what the system thinks of particular threats.

But that proved difficult, since different organizations might have a different view of, for instance, what kind of content constitutes spam. “One person’s trash is another person’s treasure,” he says. “If we try to make that condensed decision, we’ll probably just end up annoying a lot of people ’cause it’s us Facebook making that decision instead of letting everybody decide for themselves, which we think is probably the better way to go.”
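One way to picture the approach of leaving the decision to each member: store every organization's opinion separately, with its reporter and confidence, and let each consumer apply its own policy. The sketch below assumes a hypothetical `Opinion` record, a 0 to 100 confidence scale, and made-up organization names; none of these come from ThreatExchange itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    reporter: str    # organization that submitted the assessment
    indicator: str   # the site, link, or account being assessed
    status: str      # "malicious" or "benign"
    confidence: int  # hypothetical 0-100 scale

# Invented, conflicting opinions about the same indicator.
opinions = [
    Opinion("OrgA", "promo.example", "malicious", 60),
    Opinion("OrgB", "promo.example", "benign", 80),
]

def is_blocked(indicator, all_opinions, trusted, min_confidence):
    """Apply one consumer's policy: block if a trusted reporter is confident enough."""
    return any(
        o.indicator == indicator
        and o.status == "malicious"
        and o.reporter in trusted
        and o.confidence >= min_confidence
        for o in all_opinions
    )

# Two consumers read the same data and reach different conclusions.
print(is_blocked("promo.example", opinions, trusted={"OrgA"}, min_confidence=50))
print(is_blocked("promo.example", opinions, trusted={"OrgB"}, min_confidence=50))
```

Because nothing is merged, one member can treat the link as spam while another leaves it alone, and both are working from the same shared records.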

A global map where one shade reflects the combined volume of both malicious and victimized IP addresses in one view. The inset pie chart breaks out U.S. IP addresses by ISP. Maps like this, which an analyst can build in under a minute, are used by Facebook’s security teams to illustrate where threats originate. [Image: Facebook]

Ultimately, says Rushing, there’s an ongoing “arms race” between digital criminals and security experts who want to keep their networks safe.

But while different hackers and spammers remain in competition with one another for user eyeballs or for control of vulnerable computers, industry giants are increasingly banding together to keep their users safe, he says. Staving off hackers faster doesn’t just protect users; it also reinforces the trust that keeps them coming back.

“All of us in the industry are heavily incentivized to give all of our users the best experience we possibly can,” he says.

About the author

Steven Melendez is an independent journalist living in New Orleans.