Facebook is leveraging billions of Instagram photos and thousands of user-added hashtags to improve the state of the art of image recognition.
Today at F8, its annual developer conference, Facebook announced it has pioneered a new form of artificial intelligence research demonstrating that training image recognition networks on huge collections of public Instagram photographs produces better image recognition results than training on images annotated by hand, the traditional approach.
The company says this approach, known as weakly supervised pre-training, achieved 85.4% accuracy on ImageNet1K, a typical image-recognition benchmark, with a model trained on a billion images. That is the highest-ever score on the benchmark. A smaller model trained on a larger set of 3.5 billion images scored 84.1%. The work beat the previous state-of-the-art model, which relied on supervised, hand-labeled data, by 2%.
The effort is vital to Facebook as it aims to use AI for things like automatically identifying objectionable content or generating much more specific audio captions of photos for visually impaired users, such as “sparrow” instead of “bird,” or “Indian cuisine” instead of “food.”
The AI industry has long leaned on hand-labeling data and training models on tens of millions of images. But increased accuracy relies on scaling those models to billions of images, something that’s essentially not possible if all the labeling is done manually.
Facebook realized it already had a massive public repository of labeled images on its hands: Instagram, which hosts countless billions of photographs. Many of them carry hashtags, often more than one, which increases the amount of useful information attached to each image.
“Since people on Instagram often caption their photos with hashtags for virtually every imaginable thing in the visual world,” the company wrote in a blog post about the work, “we believed they’d be an ideal source of training data for models. It also allowed us to use hashtags as they were always intended: to make images more accessible, based on what people assume others will find relevant.”
But many hashtags are vague or non-specific, which Facebook treated as “incoherent label noise that can confuse deep learning models.” So the company came up with new approaches to handling that noise, including leaning on the fact that many users add multiple hashtags, and thus more context, to the images they post.
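To make the idea of learning from multiple hashtags concrete, here is a minimal sketch of one common way to turn a photo's tags into a training signal: spread the label weight evenly across every recognized hashtag and score the model's guess against that soft target. This is an illustration only, not Facebook's code. The function names, the four-word vocabulary, and the choice of an even split are all assumptions made for the example.

```python
import math

# Hypothetical toy vocabulary; a real system would use thousands of hashtags.
VOCAB = {"bird": 0, "sparrow": 1, "food": 2, "indiancuisine": 3}


def hashtag_targets(hashtags, vocab):
    """Spread one unit of label weight evenly over an image's known hashtags."""
    cleaned = [tag.lstrip("#").lower() for tag in hashtags]
    # dict.fromkeys removes duplicates while keeping order.
    known = [tag for tag in dict.fromkeys(cleaned) if tag in vocab]
    target = [0.0] * len(vocab)
    for tag in known:
        target[vocab[tag]] = 1.0 / len(known)
    return target


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy between the model's prediction and the soft target."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# "#nofilter" is not in the vocabulary, so it is ignored as uninformative noise.
target = hashtag_targets(["#Sparrow", "#bird", "#nofilter"], VOCAB)
loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target)
```

In this toy case the photo's label weight is split between “sparrow” and “bird,” so a model is rewarded for predicting either one, and a tag outside the vocabulary simply contributes nothing.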
Nodding to concerns related to Facebook’s recent privacy and security controversies, Manohar Paluri, who leads Facebook’s Applied Computer Vision team, said the researchers were careful to work only with public Instagram data and are not publishing the data set. Paluri also said the team ensured that if users ever deleted photos, the information they included would be deleted from the model-training process.
Facebook also says it plans on open-sourcing some of its work in this area so that the research community at large can take advantage of it for other high-level tasks.