Facebook’s AI for detecting hate speech is facing its biggest challenge yet

Advancements in AI have dramatically improved the company’s ability to identify written hate speech. But when it comes to rooting out hateful images, videos, and memes, Facebook’s AI has a long way to go.


The single most amazing thing about Facebook is how vast it is. But while more than two and a half billion people find value in the service, this scale is also Facebook’s biggest downfall. Controlling what happens in that vast digital space is nearly impossible, especially for a company that historically hasn’t been very responsible about managing the possible harms implicit in its technology. Only in 2017—13 years into its history—did Facebook seriously begin facing up to the fact that its platform could be used to deliver toxic speech, propaganda, and misinformation directly to the brains of millions of people.


Various flavors of toxic stuff can be found all over Facebook, from bullying and child trafficking to the rumors, hate, and fakery that helped Donald Trump become president in 2016. In the past few years, Facebook has invested heavily in measures to control this kind of toxic content. It has mainly outsourced its content moderation to a small army of reviewers in contract shops around the world. But content moderators can’t begin to weed through all the harmful content, and the traffickers of such stuff are constantly evolving new ways of evading them.

That’s why Facebook is betting on AI that can detect hate and misinformation on its platform. This AI resides in hundreds of servers in the company’s data centers. Complex neural networks trained to recognize toxic user content are called upon whenever a new post appears anywhere on Facebook, and asked to determine if the content violates any of the company’s community guidelines. Some look for hate speech, some look for misinformation, others look for bullying or nudity, and so on. While much of the inappropriate content is sent to human moderators for further action, some of it can be analyzed and then removed by AI alone.

The company has made significant progress: In the second quarter of this year, Facebook reports that it took down 104.6 million pieces of content (excluding spam) that violated its community standards. It removed 22.5 million pieces of hate speech alone from Facebook in the second quarter, compared to 9.6 million in the first quarter, and compared to just 2.5 million hate posts two years ago.


When announcing these improvements to the press earlier this week, VP of integrity Guy Rosen credited them to the company’s AI detection tools. Thanks to some major advances in natural language processing over the past few years, these algorithms are better equipped to detect toxic speech in written form than ever before.

“We’re getting to the point where most of our systems are probably close to, as good [as], or possibly better than an untrained person in that domain,” says Mike Schroepfer, Facebook’s CTO. “My goal is to get us to as good as experts . . . We’ll get there as fast as we can.”

But the future of Facebook is images and multimedia. Hateful and dangerous messages may lie in the midst of videos or encoded in memes. And so far, the breakthroughs the company has seen in its natural language AI have not transferred over to similar progress in its computer vision AI’s ability to detect such content. I spoke to Schroepfer and two of Facebook’s top AI technologists about their recent successes and about the hard computer vision challenges that lie ahead if Facebook ever hopes to finally address the colossal amount of toxic content on its platform.


Neural networks at play

Facebook’s success with detecting hate speech over the past two years stems from some dramatic inroads made by the AI research community.

Most artificial intelligence models in the past have been trained using variants of “supervised” learning. Using this approach, computer scientists feed a neural network samples of the content it’ll soon be asked to analyze and classify, such as images, text, or audio. The scientists also affix labels that describe the contents of each sample. The neural net then processes the training data and gradually adjusts the mathematical weights in each of its layers until it can arrive at the descriptions contained in the labels on its own.

Supervised learning usually involves relatively small sets of labeled training data, and the data contains fairly specific examples of items the model might encounter while doing its intended task. The downside of that specificity is that the model can be thrown off by content it encounters that wasn’t in the training data. For example, a natural language model might fail to flag a racial slur it hasn’t seen in training, and fail to understand it from its context.


These models might learn specific tasks better, the thinking has gone, if they were “pretrained” with some base-level understanding of the world, giving them something like a human being’s common sense. That’s what unsupervised, or self-supervised, learning in AI is all about. It’s a training approach in which an AI system is fed copious amounts of unlabeled training data—such as whole books, hours of video, or big piles of images. Part of the training data is hidden (a few words in a sentence, or one portion of an image), and the system learns by gradually tuning its parameters to guess the masked content with greater and greater probability, based on its understanding of the adjacent words or image elements. This technique is called self-supervised learning because the system is using the hidden parts of the training data—not human-applied labels—as a signal to guide the adjustment of its parameters.
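
To make the masking idea concrete, here is a minimal sketch of how hidden words can serve as their own training targets. It is an illustration only, not Facebook's or Google's pipeline: the `mask_tokens` function, the `[MASK]` placeholder, and the 15% masking rate are assumptions chosen for the example.

```python
import random

MASK = "[MASK]"  # illustrative placeholder token (an assumption, not from the article)

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens and return (masked_tokens, targets).

    `targets` maps each hidden position to the original token. That original
    token, not a human-applied label, is the signal a model would learn from.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_hidden = max(1, round(len(tokens) * mask_rate))
    hidden_positions = sorted(rng.sample(range(len(tokens)), n_hidden))
    masked = list(tokens)
    targets = {}
    for pos in hidden_positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets

sentence = "the juice will fall downward from the pitcher".split()
masked, targets = mask_tokens(sentence)
print(masked)   # the sentence with one word replaced by [MASK]
print(targets)  # the position and identity of the hidden word
```

A real system would then ask a neural network to predict each entry in `targets` from the surrounding words and adjust its parameters whenever the guess is wrong.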

After training this way, sometimes for days or even weeks, the system begins to “represent” words or images inside a large, multidimensional structure where similar things are situated closer together and dissimilar things are located farther apart. In doing so, the system begins to learn how different things in the world behave and how they relate to each other.

It’s the same approach humans use when we’re babies. We crawl around and sense things, and we gradually gain an understanding of the world. Then, when it’s time to learn how to pour grape juice into a cup, we don’t need to be taught that the juice will fall downward from the pitcher and not just hang in the air.


One area of AI, natural language processing, has found particular success using self-supervised learning. In 2018, Google researchers created a natural language model called BERT (Bidirectional Encoder Representations from Transformers), which they trained with massive amounts of text (11,038 books and 2.5 billion words from English-language Wikipedia entries), with parts of the text hidden. During training, the system gradually tuned itself to accurately fill in the blanks with higher probability. In doing so, it gradually structured the data into something like a huge word cloud, where words with similar meanings and contexts are situated closer together, and words that share little common meaning or context (like “fish” and “aerospace”) are situated farther apart.
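
The notion of words sitting "closer together" can be sketched with a standard similarity measure. The vectors below are hand-made toy values, not output from BERT or any real model, and the four-dimensional size is an assumption made purely for readability.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings invented for illustration; real models learn these values.
toy_embeddings = {
    "fish": [0.9, 0.1, 0.0, 0.2],
    "salmon": [0.8, 0.2, 0.1, 0.3],
    "aerospace": [0.0, 0.9, 0.8, 0.1],
}

near = cosine_similarity(toy_embeddings["fish"], toy_embeddings["salmon"])
far = cosine_similarity(toy_embeddings["fish"], toy_embeddings["aerospace"])
print(near > far)  # related words score higher than unrelated ones
```

In a trained model the same comparison would run over learned vectors with hundreds of dimensions, but the principle is the same: related words score higher.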

In 2019, Facebook built upon BERT’s approach with its RoBERTa model, which uses even more training data. “The system has to represent the meaning of the words that it sees, the structure of the sentence, the context,” says Facebook chief AI scientist Yann LeCun, a legend in the AI field who’s been tackling some of its biggest challenges since the 1980s. “As a result, it kind of learns what language is all about, which is weird, because it doesn’t know anything about the physical reality of the world. It doesn’t have vision, it doesn’t have hearing, it doesn’t have anything.” But even if a computer doesn’t learn about the world directly, self-supervised learning can help it learn something about the meanings of words, and the relationships between meanings, while processing the data.


Thanks to this new approach, Facebook’s natural language models blew past existing records in 2019, including several industry standard benchmark tests. (Natural language systems from Microsoft, Google, and OpenAI also showed big performance improvements that same year.) Those natural language models have now been put to work in Facebook’s data centers, resulting in huge leaps in the proportion of harmful speech that the company proactively detects and deletes before users see it. In its most recent Community Standards Enforcement Report released this week, Facebook said its AI systems were immediately detecting 95% of hate posts, up from the 88% reported in April, and up from the 52% the company reported last summer.

Putting research into practice

Since ill-intentioned Facebook users are constantly trying to devise new forms of toxic speech to make it past the Facebook censors, it’s necessary to continually train the AI models with newly captured content. And new subjects—such as the pandemic—spawn new types of harmful content that Facebook’s models must learn to detect. “We didn’t have anything about COVID in December of last year,” explains CTO Schroepfer. “If people start posting COVID misinformation, that may change day to day. Or people might adapt hate speech and use coded language or a terrible new meme.”

Some of the training data comes from Facebook’s content moderators, who capture and label new kinds of toxic or misleading speech they’re seeing on the social network. The human reviewers can also use the AI tools to search and destroy known types of harmful content on the network. “Our expert human reviewers can say, ‘Ah, there’s a new coded language being used here—find me all the posts in the system that look a lot like this post we just found that clearly violates our standards and get rid of them,'” Schroepfer says. “So you have this ability to augment our people with power tools.”


Some of the training data provided by the reviewers are examples of false positives, where Facebook mistakenly took down a piece of content that didn’t violate community guidelines. The models learn from that too.

Facebook is also willing and able to throw large hardware investments at the problem. “With a bigger data set and a bigger, more powerful network, I can do a much better job of catching all the nuances and subtleties of these things,” says Schroepfer.

Facebook’s AI is far from perfect, as Schroepfer acknowledges. According to leaked July 2019 documents obtained by NBC News, the company’s systems flagged and proactively removed a higher proportion of hate speech posts targeting white people than was reported by users. The same AI systems, NBC reports, took down a lower proportion of hate speech targeting marginalized groups including Black, Jewish, and transgender users than was reported by users, “indicating that these attacks were considered to be offensive but Facebook’s automated tools weren’t detecting them.”


At that time, according to Facebook records, users were still reporting almost half (48%) of all hate speech found on Facebook. The NBC report quotes current and former Facebook employees saying that when confronted with this data, Facebook’s executive team ordered Facebook employees to stop that line of inquiry and desist from sharing it with others in the company. When I asked about this report, Facebook chose not to comment.

The leaked document in the NBC story reflects the performance of older AI detection tools that hadn’t yet been pretrained using RoBERTa. Still, concern about biases in Facebook’s content moderation algorithms has continued in academic and civil rights communities. Facebook recently completed an independent civil rights audit of its practices, performed by civil rights attorneys Laura Murphy, Megan Cacace, and a supporting team at the civil rights law firm Relman Colfax. Murphy writes in the team’s report: “. . . civil rights advocates contend that not only do Facebook’s policies not go far enough in capturing hateful and harmful content, they also assert that Facebook unevenly enforces or fails to enforce its own policies against prohibited content. Thus harmful content is left on the platform for too long.” The auditors write that these criticisms are especially acute with regard to content targeting African Americans, Jews, and Muslims.

The image problem

While Facebook’s algorithmic content moderation has dramatically improved since 2019, more challenges lie ahead. RoBERTa was designed to pretrain natural language AI tools, which scan only the textual content on Facebook. RoBERTa’s self-supervised learning approach hasn’t worked nearly so well in the pretraining of computer vision AI models used to detect toxic images.


Researchers at Facebook and Google are currently working to pretrain image classifiers using an approach called “contrastive learning,” which LeCun was instrumental in developing in the aughts. This involves two neural networks (“Siamese networks”) working together to decide if millions of pairs of images are similar or different. A similar pair might be two different pictures of the same person, or two images of the same object, but one image is rotated or distorted. A dissimilar pair might be a picture of a pig and a picture of a shoe.
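
One common way to express this pairwise training signal is a margin-based contrastive loss. The sketch below is an assumption-laden illustration: the article does not say which loss Facebook or Google uses, the two-number embeddings are invented, and in a real Siamese setup both images would first pass through the same network to produce those embeddings.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Margin-based pairwise contrastive loss.

    Similar pairs are penalized by their squared distance, pulling them
    together. Dissimilar pairs are penalized only while they sit closer
    than `margin`, pushing them apart.
    """
    d = euclidean_distance(emb_a, emb_b)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Invented embeddings standing in for network outputs.
pig_photo = [0.2, 0.9]
pig_rotated = [0.25, 0.85]
shoe_photo = [0.9, 0.1]

print(contrastive_loss(pig_photo, pig_rotated, similar=True))   # small: already close
print(contrastive_loss(pig_photo, shoe_photo, similar=False))   # zero: already far apart
```

Training would repeat this over millions of pairs, nudging the network's weights to shrink the loss each time.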

In the same way that BERT and RoBERTa structure words in a multidimensional space reminiscent of a word cloud, the two neural networks work together to arrange image attributes (such as object shape and background color) in a theoretical space, where similar images are closer together and dissimilar images are farther apart. LeCun says the research has yielded image classifiers accurate enough to exceed current image-recognition benchmarks, but he still doubts that the approach will ultimately produce image classifiers that are good at detecting harmful imagery, and efficient enough to run at a large scale. That’s because the theoretical space the networks must create to accommodate numeric representations of every possible aspect of every pixel of an image is just too large.

With text content, BERT and RoBERTa can represent words as values within a theoretical space that has hundreds of dimensions. The models did this word by word for the roughly 30,000 English words in their training data.


But visual data is different, and more complex. An image is expressed pixel by pixel in a grid, and each pixel has a number of values associated with it. Each pixel can have side-to-side, up-and-down, and backward-and-forward coordinates, depending on its location in the image. Each pixel also has a red, green, and blue color value. Video adds an additional dimension because the pixels change as they go backward and forward in time. Since the number of possible combinations of these attributes is almost endless, the theoretical space in which they’re contained has millions of dimensions. As the number of training images gets larger, the theoretical space needed to map them grows immensely large.
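
A quick back-of-the-envelope count shows how fast the numbers grow. The 224-by-224 image size and the 30-frame clip are assumptions picked for illustration; the article gives no specific dimensions.

```python
def raw_value_count(width, height, channels=3, frames=1):
    """Count the raw numbers needed to describe an image or video clip.

    Each pixel carries one value per color channel (red, green, blue),
    and video repeats the whole grid once per frame.
    """
    return width * height * channels * frames

image_values = raw_value_count(224, 224)             # one small still image
video_values = raw_value_count(224, 224, frames=30)  # a 30-frame clip of it

print(image_values)  # 150528
print(video_values)  # 4515840
```

Even a small still image needs about 150,000 numbers, and a short clip of the same size needs several million.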

Mapping all these dimensions in a theoretical space, and then predicting the values for the pixels of training images’ hidden portions, is a very heavy lift and requires massive computing power, even for small images, LeCun explains.

‘The ticket to the future’

Despite the difficulty of applying recent AI breakthroughs to visual material, Facebook has a good reason to hope LeCun can build a computer vision system that does work. Facebook user content is more visual than ever, and it continues to move in that direction. Hate and disinformation on the platform are also increasingly visual. Some of the most toxic or harmful user content comes in the form of images and videos (such as deepfakes).


LeCun is now working on a different, and hopefully more efficient, approach to pretraining image classifiers. Like natural language processing models, computer vision models do far better at specific recognition tasks if they’re first pretrained with massive piles of unlabeled image data, then allowed to structure that data in a way that helps them form a basic knowledge of how the visual world works.

LeCun’s current research focuses on ways to reduce the number of image attributes the AI must focus on to just the really important ones—ones representing the subject of an image or video, for example, and not the background. He explains that the vast majority of images that can be numerically expressed within the huge theoretical space are just random combinations of attributes that don’t depict anything found in the natural world. Only a small sliver of all the possibilities represent things the model might encounter in images posted on Facebook.

LeCun is looking for ways to train the model to infer the meaningful content within a photo or video, then focus only on the area in the theoretical vector space needed to express that image. For instance, a model might infer from a piece of video that an object—say, a human face—could be represented pixel by pixel using only a small subset of all the possible positions and color states of all pixels within the frame. It might zero in on just 50 pixels that, in their various states, can represent all possible movements of all the surfaces of the face. With a smaller group of variables to work with, the model may have a much better chance of predicting which way a face in a video might move (in the same way a natural language model can predict the blanks in a sentence).


But it’s teaching the model how to make those inferences that presents the biggest barrier to effective pretraining of computer vision systems.

LeCun is betting that it’s a surmountable problem. It could be the final major research push of his storied, decades-long career.

“In my opinion, this is the ticket to the future,” he tells me. “That’s what I’m working on, and whether I’m right or not I can’t tell you, because I don’t have the results to prove it.”

The challenges of a multimodal platform

As visual-first content increases in popularity, the most-shared content on Facebook tends to be mixed media—comprising more than one mode of communication, such as items combining text with a photo or video. “The future is multimodal,” says Facebook’s VP of AI, Jerome Pesenti. “The mixed-media kind of content is becoming the majority of the content on our platform.”

For example, memes, which combine language and imagery, played a big role in the organized disinformation campaigns by both foreign and domestic actors who sought to influence the 2016 presidential election. Many of them were carefully designed to divide and polarize voters, to openly promote Donald Trump, or to convince people not to vote. The memes were often widely shared because they were edgy and controversial, or funny, or both. It’s likely that similar images will play an important role in influence campaigns on Facebook in the weeks before the 2020 election.

But memes’ multimodal nature, the integration of language and imagery, makes them a difficult challenge for Facebook’s AI.

They’re often highly contextual, referential, ironic, and nuanced. They can also be cryptic and encoded so that only members of some specific enclave of internet users can parse their meaning—as the manifesto posted on 8chan by the Christchurch killer exemplified.

Memes often require a bit of parsing and thinking to get at their meaning, even for reasonably savvy netizens, CTO Schroepfer points out. “That’s a good indication that it’s a much harder problem for machines,” he says. “We do have systems that are able to work in this multimodal way . . . it’s an area where we’re pushing forward, and I think we’ll make reasonable progress over the next few years.”

Memes are such a tough problem for AI because the machine must learn from the content of the image and the content of the text at the same time, not separately. “We have a huge amount of examples where the image by itself is not offensive and the text by itself is not offensive—it’s the combination of the two that’s offensive,” Pesenti tells me. “So it’s really important to have a system that learns from both modalities at the same time.”

Facebook has already deployed AI trained using supervised learning to detect toxic multimodal content, but it doesn’t yet know how to efficiently pretrain those models to help them decipher the complexity and nuance of memes. The company says it’s been using AI systems that analyze both images and text to detect harmful multimodal content on Instagram, but said in May that the proactive detection rate of those systems improved from 43.1% in Q4 2019 to only 44.3% in Q1 2020.

“It’s a huge area of research for us,” Pesenti says. “We are also really trying to create a research domain on this and interest other people in this problem.”

Pesenti is referring to the Facebook Hateful Memes Challenge, in which Facebook provides a sample data set of hateful memes to developers and challenges them to build an algorithm that accurately detects hateful memes at scale. The developer with the best multimodal AI gets a cash prize, and Facebook may get some new insight into how to approach the problem. The company took a similar approach to catalyzing the development of systems that can detect deepfake videos with its Deepfake Detection Challenge in 2019.

How big a problem?

Users are entirely dependent on Facebook to self-report the stats showing how the company’s war on toxic content is going. While the company is forthcoming with details about its progress, it’s less transparent about the real extent of the problem that remains.

The company reports the percentage of the content its AI systems detect versus the percentage reported by users. But those two numbers don’t add up to the whole universe of harmful content on the network. They represent only the toxic content Facebook sees.
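
The gap between the reported figure and the full picture can be shown with simple arithmetic. All of the counts below are hypothetical, and the function names are invented for this sketch; Facebook does not publish the number of posts it never finds.

```python
def proactive_rate(found_by_ai, reported_by_users):
    """Share of actioned content that AI found before any user reported it."""
    actioned = found_by_ai + reported_by_users
    if actioned == 0:
        raise ValueError("no actioned content to measure")
    return found_by_ai / actioned

def share_of_all_harmful_content_found(found_by_ai, reported_by_users, never_found):
    """Share of ALL harmful content that was found by either route."""
    total = found_by_ai + reported_by_users + never_found
    return (found_by_ai + reported_by_users) / total

# Hypothetical counts: 950 posts found by AI, 50 reported by users,
# and 1,000 more that nobody ever caught.
print(proactive_rate(950, 50))                            # 0.95
print(share_of_all_harmful_content_found(950, 50, 1000))  # 0.5
```

In this made-up case the headline rate is 95%, yet half of the harmful posts were never found at all, because the undetected posts never enter the first calculation.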

For the rest, Facebook intends to estimate the “prevalence” of undetected toxic content, meaning the number of times its users are likely seeing it in their feeds. The estimate is derived by sampling content views on Facebook and Instagram, measuring the incidences of toxic content within those views, then extrapolating that number to the entire Facebook community. But Facebook has yet to produce prevalence numbers for hate posts and several other categories of harmful content.
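
The sampling approach described here amounts to a proportion estimate followed by a scale-up. The sketch below uses invented numbers (a sample of 10,000 views, 12 of them violating, and a total of 1 million views); how Facebook actually draws and labels its samples is not described in the article.

```python
def estimate_prevalence(sampled_views):
    """Fraction of sampled views that showed violating content.

    `sampled_views` is a list of booleans, True when a reviewer judged
    that the sampled view contained violating content.
    """
    if not sampled_views:
        raise ValueError("sample is empty")
    return sum(sampled_views) / len(sampled_views)

def extrapolate(prevalence, total_views):
    """Scale the sampled fraction up to the full number of views."""
    return prevalence * total_views

# Hypothetical sample: 12 violating views out of 10,000 sampled.
sample = [True] * 12 + [False] * 9_988
prevalence = estimate_prevalence(sample)
estimated_violating_views = extrapolate(prevalence, 1_000_000)

print(prevalence)                        # 0.0012
print(round(estimated_violating_views))  # 1200
```

Because the estimate rests on a sample, its precision depends on the sample size and on how representative the sampled views are.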

Nor does Facebook actively report how many hours toxic posts missed by the AI stayed visible to users, or how many times they were shared, before their eventual removal. In addition, the company does not offer similar estimates for misinformation posts.

Where misinformation is concerned, examples of content that isn’t automatically caught by Facebook’s algorithms aren’t hard to find. In late July, a Breitbart-produced video featured people in lab coats who called themselves “America’s Frontline Doctors” claiming that hydroxychloroquine is “a cure for COVID” and that “you don’t need a mask.” The video got 20 million views on Facebook before being taken down, NBC News’s Brandy Zadrozny reported.

On August 5, Facebook removed a post from Donald Trump’s official page containing a video of him falsely saying that children were “almost immune” to the coronavirus. The video, which violates Facebook’s special coronavirus misinformation rules, was up for about four hours, according to CrowdTangle, during which time it was viewed almost a half million times.

Note that the harmful content in both cases was delivered via video, not text. Facebook’s AI systems were apparently unable to immediately detect those statements and flag the posts for removal. It’s not hard to imagine hate and other types of harmful content being delivered within videos and memes, too. A more advanced AI model, perhaps one pretrained using self-supervised learning and equipped to analyze images and text at the same time, might have been able to identify them. Facebook declined to comment on the Breitbart or Trump videos.

Many of Facebook’s content moderation challenges are immediate and daunting, such as the problems of proactively detecting misinformation, propaganda, and hate. While Facebook benefited greatly from recent advancements in natural language AI when it began building off Google’s BERT, it has limited control over how quickly its own researchers and the wider AI community will push advancements in AI that can detect toxic content in the visual realm.

For all the progress it’s made, Facebook’s content moderation AI is just beginning to impact the company’s effectiveness at detecting violating content in each of the domains covered by its community standards, such as hate, terrorism, and misinformation.

And given Facebook’s scale and the speed at which some use it to spread hate, incite violence, and share lies with millions, Facebook will have to keep running to catch up.


About the author

Fast Company Senior Writer Mark Sullivan covers emerging technology, politics, artificial intelligence, large tech companies, and misinformation. An award-winning San Francisco-based journalist, Sullivan's work has appeared in Wired, Al Jazeera, CNN, ABC News, CNET, and many others.