In 2017, as Facebook was roiled by an array of controversies relating to content on its platforms, from fake news to hate speech and more, it became clear that the company believed part of the solution involved the oldest information-processing device of them all: the human eyeball. It announced that it would hire thousands of additional moderators to scan users’ posts for material that was offensive, illegal, or otherwise questionable. That was an acknowledgement that technology alone couldn’t tamp down social networking’s bad actors. And many pundits have declared that there’s no sign AI will ever be up to the task of identifying and eliminating problematic material without human intervention.
At the second-day keynote during Facebook’s F8 conference today, the company’s CTO, Mike Schroepfer, made the case that AI is already detecting inappropriate content at scale. He also says that researchers at Facebook and elsewhere are currently making headway on techniques that will let software handle more of the work with less human help. Earlier this week, in a conference room at Facebook headquarters, he gave me a peek at the charts he planned to show at the keynote, with bars representing various types of offending items—spam, fake accounts, nudity, violence—and stats showing when Facebook began identifying them with AI, and when AI took on the majority of the process.
By sharing this data, Schroepfer hopes to dispel the impression that Facebook isn’t taking the challenge of cleaning up its platform seriously enough. He is, however, quick to acknowledge that more work remains to be done, and he doesn’t criticize the skeptics. “The hardest thing for me personally is a sense that we don’t care,” he says. “It’s either we don’t care, or we’re not prioritizing it, or ‘It just doesn’t match my own personal experience day to day.’ But people get to feel what they feel, and until we get it right, they’re justified feeling whatever they want.”
Schroepfer is careful not to oversell AI’s promise as a universal solution to content-moderation woes. “I don’t foresee a future anytime soon where we don’t need people involved in the loop, because these are fundamentally people issues, and deciding what’s hate speech and what’s misinformation is a human endeavor,” he says. But he says that AI may increasingly be able to handle the drudgery of moderation. And as anyone who read a February Verge story by Casey Newton knows, spending your days eyeballing horrifying items posted by Facebook’s worst users isn’t merely dreary; it can sear your soul.
“Some of this content is really awful to look at,” says Schroepfer. “So if I can get the worst of the worst off reviewers’ plates, then that’s just less people who are being exposed to this stuff.”
Two jobs in one
Schroepfer estimates that he currently spends roughly half of his time on technologies for dealing with current difficulties on Facebook and half on other tasks, including more fun, futuristic stuff such as Oculus VR. “Some days it’s maybe 70/30 and sometimes it’s maybe 30/70,” he says. But when it comes to research, the line is blurry. The same investigations into machine vision that Facebook conducts to let its Portal video-chat screen intelligently frame shots, for instance, can also help the company moderate videos and more rapidly delete troublesome ones. “I get to live in both worlds, which is exciting,” says Schroepfer.
Then there’s the progress the tech industry is making in teaching AI to identify things—both textual and visual—without extensive assistance. That’s an advance beyond current machine-learning techniques, which require plenty of up-front training from human beings. Such conventional training “has two really big problems,” says Schroepfer. “One, it’s pretty bespoke. If it’s not in the training set, you’re probably not going to find it. And the whole process is pretty slow. You can take weeks to build up a particularly new classifier.” The more that computers can do themselves, the better they’ll be at an array of tasks.
Schroepfer points to a Google research project called BERT. That stands for Bidirectional Encoder Representations from Transformers, which—if you’re not an AI scientist—doesn’t clarify matters in the least. But the basic idea isn’t that hard to comprehend. Instead of training an AI model with bushels of examples selected by humans, you can teach it by feeding it items that look like word puzzles. One such exercise involves sentences with words masked out, with the computer supplying the missing words as if it were a Match Game contestant. Another provides the computer with pairs of sentences, with the challenge being guessing whether the second sentence follows the first in a real piece of text.
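For readers who want to see what such “word puzzles” might look like in practice, here is a minimal sketch in Python. It is not BERT’s actual code or Facebook’s pipeline. The function names, the `[MASK]` placeholder string, and the 50/50 split between real and fake sentence pairs are all assumptions made for illustration. The point is only that both kinds of puzzle can be generated from ordinary text, with the answers coming for free from the text itself rather than from human labelers.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def make_masked_example(sentence, rng):
    """Hide one randomly chosen word; the hidden word is the answer."""
    words = sentence.split()
    i = rng.randrange(len(words))
    answer = words[i]
    masked = words[:i] + [MASK] + words[i + 1:]
    return " ".join(masked), answer


def make_next_sentence_example(sentences, rng):
    """Build a (first, second, is_next) puzzle from an ordered passage.

    Needs at least three sentences so a wrong "second" can be chosen.
    """
    i = rng.randrange(len(sentences) - 1)
    first = sentences[i]
    if rng.random() < 0.5:
        # Real pair: the second sentence truly follows the first.
        return first, sentences[i + 1], True
    # Fake pair: any sentence other than the first or its true successor.
    candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
    return first, rng.choice(candidates), False


if __name__ == "__main__":
    rng = random.Random(0)
    passage = [
        "The company hired more moderators.",
        "They review posts that users report.",
        "Software flags some posts automatically.",
        "People still make the final call.",
    ]
    print(make_masked_example(passage[0], rng))
    print(make_next_sentence_example(passage, rng))
```

A model trained on millions of such examples is graded on how often it recovers the hidden word or correctly calls a pair real or fake. No person has to write the answer key.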
Computer scientists are using similar techniques with video, such as giving an AI model a snippet of footage and a soundtrack, and asking it to predict whether they in fact belong to each other. Facebook scientists have also taught AI models by showing them pairs of video clips and asking them to guess whether the clips are part of one sequence. According to Schroepfer, this training method reduces the amount of manual labor by human beings from 12,000 hours to 80. “It’s a several-orders-of-magnitude reduction,” he says.
It’s easy to shovel these sorts of textual and visual puzzles into an algorithm: “You can take a whole ton of input data and mutate it,” says Schroepfer. The more puzzles the software solves, the smarter it gets at accurately handling future ones, and therefore at understanding content based on contextual cues, a skill with value all over Facebook, from identifying unacceptable ads to intelligently arranging the News Feed. Such techniques fall into a class of “self-supervised” training that Facebook AI honcho Yann LeCun has called one of AI’s principal opportunities over the next decade.
The point of training algorithms more efficiently isn’t to make the process a less important part of how Facebook understands the material on its platforms. Instead, it makes it feasible to do even more training to handle additional types of content, in ways that might have been impossible when humans had to do more handholding. Schroepfer provides an example of self-supervised learning already in use at Facebook: “If we’re training a classifier to figure out whether people are talking about an election, we can train that across multiple languages at one time, so it works better in places like India.”
Ultimately, Schroepfer says, he sees a day when the technology industry’s current gloom—which goes far beyond Facebook’s issues—subsides: “I think we’re at peak pessimism right now about tech, or we’re getting close to it.” For Facebook, he adds, part of the solution will lie in the company getting better at identifying problems before they become problems, so it can deploy its technological and human resources at the earliest possible opportunity.
“The reality is, we can be better at predicting them than we have in the past, because some of them are predictable,” he says. “We can be better to reacting to new things more quickly. So even if we didn’t predict it, we have the operational muscle to deal with it.”