This article is part of Fast Company’s editorial series The New Rules of AI. More than 60 years into the era of artificial intelligence, the world’s largest technology companies are just beginning to crack open what’s possible with AI—and grapple with how it might change our future.
Amazon’s controversial Rekognition computer vision technology is now being used to rid food sites of surprise dick pics.
Well, in one case anyway. The London-based food delivery service Deliveroo has a definite content moderation challenge. It seems that when there’s a problem with a food order, Deliveroo customers often send in a photo of the food with their complaint. And they often photobomb the food using their genitals. Or they arrange the food into the shapes of private parts. Really.
Deliveroo’s employees, it turns out, don’t necessarily want to see stuff like that. So the company uses Rekognition to recognize unsavory photos and blur or delete them before they reach human eyes.
Deliveroo’s issue represents the slightly bizarre edge of an increasingly serious problem. In one way or another, many internet companies are about user-generated content. In recent years, we’ve increasingly seen the dark side of human nature show up in it. Content moderation has become a priority as sites increasingly play host to unsavory material like fake news, violent content, deepfakes, bullying, hate speech, and other toxic user-generated content. If you’re a Facebook, you can develop your own AI or hire an army of content moderators—or both—to deal with this mess. But smaller outfits with fewer resources don’t have that option. That’s where Amazon’s content-moderation service comes in.
The service is part of Amazon Web Services’ Rekognition computer vision service, which has itself been the subject of a lot of bad press relating to Amazon’s apparent willingness to provide facial recognition services to U.S. Immigration and Customs Enforcement. You can find other examples of surveillance-oriented applications on tap at the Rekognition website, like its ability to read license plates at all angles within video, or to track the physical paths taken by people caught on camera.
Perhaps seeking some more positive exposure for its computer vision services, Amazon has begun talking for the first time about the use of Rekognition to police user-generated content for lewd or violent imagery. The Rekognition content moderation service detects unsafe or offensive content within images and videos uploaded to company websites.
And it’s a growth business. “The role of user-generated content is exploding year-over-year as we now share two or three pictures with our friends and family every day on social media,” Amazon’s VP of AI Swami Sivasubramanian tells me. Sivasubramanian says Amazon began offering the content-moderation service at the request of a number of customers back in 2017.
Companies can pay for Rekognition instead of having to hire humans to inspect the images uploaded by users. Like other AWS services, the content moderation service has a pay-as-you-go model and is priced based on how many images are processed by Amazon’s Rekognition neural networks.
Not surprisingly, among the first users of the content moderation service are dating and matchmaking sites, which are challenged to quickly approve selfies uploaded to user profiles. Amazon says the matchmaking sites Coffee Meets Bagel and Shaadi are using it for that purpose, as is the Portuguese site Soul, which helps others create dating sites.
The AI isn’t just looking for nudity. Rekognition’s neural networks have been trained to detect many kinds of questionable content, including suggestive imagery, firearms, violence, and generally disturbing material. (The Rekognition website publishes the full menu of classifications.)
How it works
Like everything else that’s part of AWS, Rekognition’s new content moderation features run in the cloud. A company can tell the service what types of problematic images it wants to detect. Then it feeds its user-generated photos and videos—which, in many cases, may be stored on AWS in the first place—to the service.
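In boto3, Amazon’s Python SDK, that hand-off is a single API call. Here’s a minimal sketch, with the understanding that the bucket name, object key, and confidence threshold are illustrative choices, not values from any real deployment:

```python
def moderate_image(bucket: str, key: str, min_confidence: float = 60.0):
    """Ask Rekognition to scan one S3-hosted image for unsafe content."""
    import boto3  # imported lazily so the pure helper below works without AWS access
    client = boto3.client("rekognition")
    response = client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,  # drop labels the model is less confident about
    )
    return response["ModerationLabels"]


def flag_names(labels, threshold=80.0):
    """Pure helper: names of the labels at or above a confidence threshold."""
    return [label["Name"] for label in labels if label["Confidence"] >= threshold]
```

The `MinConfidence` parameter lets a customer tune how aggressively the service flags content before any of their own business rules run.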
The Amazon deep neural networks process the images to discover their content and flag any of the potentially objectionable image types. The networks output metadata describing the contents of each image, along with a percentage score representing the model’s confidence in each label it has attached.
That metadata goes to a piece of software on the customer side that determines—based on business rules already programmed in—how to deal with the flagged images. The software might automatically delete a given image, allow it, blur out part of it, or send it on to a human being for review.
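Those business rules can be as simple as a pair of confidence thresholds. This is a hypothetical sketch of my own, not Deliveroo’s or Amazon’s actual logic:

```python
def route_image(moderation_labels, delete_at=95.0, review_at=60.0):
    """Decide what to do with an image based on Rekognition's moderation labels.

    Returns one of "delete", "human_review", or "allow". The thresholds are
    illustrative; a real deployment would tune them per label category.
    """
    top = max((label["Confidence"] for label in moderation_labels), default=0.0)
    if top >= delete_at:      # near-certain match: remove automatically
        return "delete"
    if top >= review_at:      # ambiguous: escalate to a human moderator
        return "human_review"
    return "allow"            # nothing flagged with meaningful confidence
```

The middle band is the important design choice: it keeps humans in the loop exactly where the model is least sure, instead of for every upload.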
The deep neural nets that do the image processing have many layers. Each one assesses data points representing various aspects of an image, runs calculations on them, and then sends the result on to another layer of the network. The network first processes top-level information like the basic shapes in the image and whether a person is present.
“Then it just continues to refine more and more, the next layer gets more and more specific and so forth,” explains Sivasubramanian. Gradually, layer by layer, the neural network identifies, with increasing certainty, the content of the images.
AWS VP of AI Matt Wood says his team trains its computer vision models with millions of proprietary and publicly available image sets. He says that Amazon doesn’t use any customer images for this training.
Frame by frame
Some of the biggest Rekognition content moderation customers aren’t using the service to moderate user-generated content. Amazon says media companies with large libraries of digital video need to know the contents of every frame of video. Rekognition’s neural network can process every second of the video, describe it using metadata, and flag potentially harmful imagery.
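For stored video, Rekognition runs moderation asynchronously: a customer starts a job, then polls for timestamped results. A hedged sketch of that flow, with a pure helper for the results (bucket, key, polling interval, and thresholds are hypothetical):

```python
import time


def moderate_video(bucket: str, key: str):
    """Start an asynchronous moderation job on an S3-hosted video and wait for results."""
    import boto3  # imported lazily so flagged_seconds below works without AWS access
    client = boto3.client("rekognition")
    job = client.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    while True:
        result = client.get_content_moderation(JobId=job["JobId"])
        if result["JobStatus"] != "IN_PROGRESS":
            return result.get("ModerationLabels", [])
        time.sleep(5)  # poll until the job finishes


def flagged_seconds(moderation_labels, min_confidence=80.0):
    """Pure helper: distinct video timestamps (in seconds) with confident flags."""
    seconds = {
        label["Timestamp"] // 1000  # Rekognition timestamps are in milliseconds
        for label in moderation_labels
        if label["ModerationLabel"]["Confidence"] >= min_confidence
    }
    return sorted(seconds)
```

A media company could feed `flagged_seconds` output straight to an editor as a list of moments worth reviewing, rather than watching the whole library.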
“One of the things machine learning is really good at is looking inside the video or the images and providing additional context,” Wood tells me. “It might say, ‘this is a video of a woman walking by a lake with a dog,’ or ‘this video has a man who is partially dressed.'” Used this way, the neural network can detect dangerous, toxic, or lewd content in images with a high level of accuracy, he says.
Still, this branch of computer vision science hasn’t hit maturity yet. Scientists are still discovering new ways of optimizing the algorithms in the neural networks to identify images with more accuracy and in more detail. “We’re not at a place of diminishing returns yet,” Wood says.
Sivasubramanian told me that just last month the computer vision team reduced false positives (where images were mistakenly flagged as potentially unsafe or offensive) by up to 68% and false negatives by up to 36%. “We can still improve the accuracy of these APIs,” he said.
Beyond accuracy, customers have been asking for finer detail on image classifications. According to the AWS website, the AWS content moderation service returns only a main category and a secondary category for unsafe images. So the system might categorize an image as containing nudity in the primary category, and as containing sexual activity in the secondary one. A third category might include classifications addressing the type of sexual activity shown.
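That two-level scheme means a customer who only cares about the broad category can roll secondary labels up to their parents using the `ParentName` field. The label names below follow Rekognition’s published taxonomy, but the rollup logic is my own illustration:

```python
def top_level_categories(moderation_labels):
    """Collapse Rekognition's two-level moderation labels to top-level categories.

    A label with an empty ParentName is itself top-level; otherwise its
    parent names the broad category it belongs to.
    """
    return sorted({
        label["ParentName"] or label["Name"]
        for label in moderation_labels
    })
```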
“Right now the machine is very factual and literal—it will tell you ‘this is what is there,'” says Pietro Perona, a computation and neural systems professor at Caltech, and an AWS advisor. “But scientists would like to be able to go beyond that to not only say what is there but what are these people thinking, what is happening. Ultimately that’s where the field wants to go, not just listing what’s in the image like a shopping list.”
And these nuanced distinctions could be important to content moderation. Whether or not an image contains potentially offensive content could depend on the intent of the people in the images.
Even the definition of “unsafe” or “offensive” is a moving target. It can change with time, and differs between geographical regions. And context is everything, Perona explains. Violent imagery provides a good example.
“Violence may be unacceptable in one context, like actual real-life violence in Syria,” Perona says, “but acceptable in another, like within a football match or in a Quentin Tarantino movie.”
As with many AWS services, Amazon isn’t just selling Rekognition’s content-moderation tool to others: It’s also its own customer. The company says that it’s using the service to police the user-generated images and videos posted with user reviews in its marketplace.