Earlier this year Facebook created DeepFace, a facial recognition system almost as accurate as the human brain. In a standardized test involving pairs of photographs, a human being would get the answer correct 97.53% of the time; Facebook’s technology scored an impressive 97.25%. Most people thought that was as far as facial recognition breakthroughs would go in 2014. They were wrong.
A few months after Facebook’s breakthrough, the Multimedia Laboratory at the Chinese University of Hong Kong claims to have smashed Facebook’s record by building a recognition system that achieves a massive 99.15% accuracy rate–based on some truly innovative deep learning models.
“This is strong evidence that deep learning is making artificial intelligence possible,” says the university’s Xiaoou Tang, former head of Microsoft Asia’s Visual Computing group. “As a breakthrough, it’s very exciting for us.”
By surpassing human levels of recognition for the first time, Tang’s triumph demonstrates just how far facial recognition technology has come in recent years–and where it might be going next.
“I first became interested in facial recognition when I started to study pattern recognition in 1991,” Tang says. “It was when people were starting to consider the implications and possibilities of this technology for the first time. I was fascinated.”
After obtaining his PhD from MIT, Tang moved back to Hong Kong and established what has become China’s premier facial recognition lab, the CUHK Multimedia Laboratory. With its recent facial recognition breakthrough–named DeepID–the lab has achieved its biggest success to date.
Not only does DeepID break Facebook’s previous record in terms of accuracy, but even more impressively it has done this with just a fraction of the training data. While DeepFace required 7.4 million images for training its system, Tang’s work uses just 200,000 training images–a mere 3% of the dataset available to Facebook.
“Facebook relied heavily on their vast amount of data,” Tang says. “DeepFace uses one deep learning network to do the training. What we did was to train about 200 networks, each focusing on a different point of the face at a different scale. Then we selected the most productive group of networks from all the trained networks, and combined them in an efficient way to achieve a much better result.”
Tang’s group also used a better classifier–something called the Joint Bayesian model–to compare DeepID features extracted from different face images for face verification. The end result was a neural network (essentially a vast artificial brain) that achieved far better results–with around a 70% reduction in errors.
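To make the "train many, keep the best group" idea concrete, here is a minimal Python sketch of one way such a selection could work. It is an illustration under loud assumptions, not the lab's actual procedure: the three toy "networks," their similarity scores, the score averaging, the 0.5 threshold, and the greedy search are all invented for the example, and a plain threshold stands in for the Joint Bayesian classifier.

```python
from typing import Dict, List, Tuple

Scores = List[float]  # one similarity score in [0, 1] per face pair


def combined_accuracy(members: List[Scores], same_person: List[bool]) -> float:
    """Average the members' scores per pair, threshold at 0.5, and score the result."""
    correct = 0
    for i, truth in enumerate(same_person):
        mean_score = sum(scores[i] for scores in members) / len(members)
        if (mean_score > 0.5) == truth:
            correct += 1
    return correct / len(same_person)


def greedy_select(
    candidates: Dict[str, Scores], same_person: List[bool], max_size: int
) -> Tuple[List[str], float]:
    """Grow the group one network at a time; stop when no addition improves accuracy."""
    chosen: List[str] = []
    best_accuracy = 0.0
    while len(chosen) < max_size:
        best_name = None
        for name, scores in candidates.items():
            if name in chosen:
                continue
            trial = [candidates[n] for n in chosen] + [scores]
            trial_accuracy = combined_accuracy(trial, same_person)
            if trial_accuracy > best_accuracy:
                best_accuracy, best_name = trial_accuracy, name
        if best_name is None:  # no remaining network helps
            break
        chosen.append(best_name)
    return chosen, best_accuracy


# Hypothetical validation set: six face pairs, the first three are true matches.
same_person = [True, True, True, False, False, False]

# Hypothetical per-network similarity scores; each network alone gets 4 of 6 right.
toy_networks = {
    "eye_patch": [0.9, 0.8, 0.4, 0.2, 0.3, 0.6],
    "nose_patch": [0.7, 0.4, 0.9, 0.1, 0.6, 0.2],
    "mouth_patch": [0.3, 0.9, 0.8, 0.7, 0.1, 0.3],
}

selected, accuracy = greedy_select(toy_networks, same_person, max_size=3)
print(selected, accuracy)
```

In this toy data, no single network is right on more than four of the six pairs, but two of them make different mistakes, so averaging their scores corrects both. That is the intuition behind combining networks that each look at a different part of the face.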
“When you get to the endgame it becomes much more difficult in terms of error rate reduction,” Tang says. “When you’ve got only 2% of error left to correct, a one-percentage-point improvement means a 50% error rate reduction. If you’ve only got two mistakes to correct, then managing to successfully correct just one is very significant.”
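Tang's arithmetic is easy to check. The short Python sketch below computes the relative reduction in error between two accuracy figures, first for his 2%-to-1% example and then for the DeepFace and DeepID scores quoted above. The function name is made up for illustration.

```python
def relative_error_reduction(old_accuracy_pct: float, new_accuracy_pct: float) -> float:
    """Fraction of the old error rate that the new system eliminates."""
    old_error = 100.0 - old_accuracy_pct
    new_error = 100.0 - new_accuracy_pct
    return (old_error - new_error) / old_error


# Tang's example: 2% error left, improved by one percentage point.
tang_example = relative_error_reduction(98.0, 99.0)

# The accuracy figures quoted in this article: DeepFace 97.25%, DeepID 99.15%.
deepface_to_deepid = relative_error_reduction(97.25, 99.15)

print(f"{tang_example:.0%}")        # 50%
print(f"{deepface_to_deepid:.0%}")  # 69%
```

The second figure, about 69%, matches the "around 70%" reduction in errors cited for DeepID.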
What lifts the work of Xiaoou Tang’s lab beyond purely academic interest is that it focuses on images taken “in the wild,” rather than those taken under laboratory conditions. In other words, this is something that can (and will) have real-world applications.
There’s little doubt that facial recognition is one of several big waves currently breaking in tech. Last week, Amazon CEO Jeff Bezos announced the Fire Phone, which among other features will lock onto users’ faces to carry out a wide range of functions–such as scrolling through news articles, or maneuvering through apps.
Although Amazon hasn’t yet opened up details of its new phone, its facial detection software (named Dynamic Perspective) was reportedly trained by studying a dataset of millions of images of people’s faces. “We got really good at tracking faces, finding heads,” said Bezos.
“At first glance, I thought the Dynamic Perspective and other features were just cute gimmicks and a way for Amazon to get attention in the mobile space,” says Hoyt David Morgan, the cofounder of NITO, an iOS app which uses markerless facial recognition and tracking technology for creating 3-D animated avatars. “But after careful observation, I now see a huge potential in the gaming and entertainment industry. Dynamic Perspective gives users a greater user experience by allowing users to see other aspects of an app without having to swipe or press a button. This is a major step forward for the mobile phone’s interface design.”
Much as Apple redefined mobile interfaces with multitouch, face tracking and other interface elements that respond smartly to the face will help create the next wave of revolutionary UI design.
As something of a guinea pig, Amazon is taking a big chance with its Fire Phone. However, it’s far from the only big tech company thinking along similar lines. Apple and Google are clearly looking at facial recognition for mobile security. Last November, Apple acquired PrimeSense–the company behind the Xbox Kinect. In March this year, Apple was granted a patent related to various forms of biometric passwords, meaning that users may soon have the opportunity to add extra levels of protection to secure files by safeguarding them not just with fingerprint authentication via Touch ID, but also with facial confirmation. Google has long implemented a similar feature as part of Android, with a selfie password system called Face Unlock that uses facial recognition technology to safeguard your smartphone.
“We’re seeing a lot of big companies entering this space,” says Xiaoou Tang. “It’s interesting to see the approach each company takes based on its own business model. Everyone is trying to corner their particular market.”
Amazon, for instance, is interested in opening up the user interface with face tracking–although its main focus would logically be toward building on the kind of object recognition that can help identify potential products in the real world, and link users to the relevant Amazon page. A future area of interest could even be analyzing user sentiment to predict which products to suggest at which times.
On the other end of the spectrum, Tang points to the Chinese-market Meitu 2 smartphone–targeted at female users–whose 13MP front-facing camera is designed for taking selfies. The phone can then “intelligently” remove blemishes from pictures by identifying specific parts of the face. The company has also talked about using facial recognition for photo management in the future.
Tang plans to get involved, too, by making his new facial recognition technology available for free to Android, iOS, and Windows Phone developers in the form of a FreeFace-SDK. In addition to opening up his groundbreaking work to practitioners in every field from advertising to medicine (smartphones can be great at diagnosing diseases!), he also wants to take advantage of user feedback to further improve the accuracy of his algorithm.
“The SDK will provide state-of-the-art capability for face detection, face alignment, and face recognition,” he says. “People can use it to design face-related games, face verification based login function, photo management functions, face photo labeling and search apps, and other applications. The algorithm can only improve if more people use it, which is why we want it freely available to everyone.”
Facial recognition has long since split into various disciplines–from face and eye tracking on the one hand, to facial recognition and even emotional analysis on the other. So why are all of these advances coming together so neatly here in 2014?
“Part of it is about the advancement of hardware,” says Yitzi Kempinski, the creator of Umoove, an iOS face tracking app which launched with an impressive tech demo earlier this year. “This has always been a very processor-heavy area to work in, so as hardware has gotten better, rolling out these technologies has become more possible. Where previously you may have needed a supercomputer for this work, increasingly it is now possible on our home computers and mobile devices.”
This brings us to the second major paradigm shift we’ve seen over the past few years: the prevalence of portable, built-in cameras. Even as recently as the iPhone 3GS–launched back in 2009–leading smartphones didn’t have front-facing cameras. A few years before that, the majority of PCs didn’t come with webcams built in as standard. While surveillance cameras have been in widespread use for decades (in the CCTV-heavy U.K., the average citizen is caught on security camera 300 times per day), it is only as mass-market cameras have become a ubiquitous part of our devices that the consumer sector has really opened up.
“We take a lot more pictures now as a culture,” Kempinski continues. “From being just one thing that we share online, pictures have become something that drives online sharing–which you can see through the rise of services like Snapchat. Increasingly the face is the main data point. We’ve known how to analyze text for years; analyzing faces is something we’re just starting to understand.”
There are, of course, a number of ethical questions that will need to be worked through as this trend continues. Phones that can detect our faces–or even pick up on registered emotions and make intelligent decisions based on this data–could raise a number of concerns around topics like privacy.
“What information is being provided about people that are identified?” says Kelly Gates, professor in communication and science studies at UC San Diego, and author of Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance. “Their name, phone number, home address, employment history, police records, Match.com preferences, Amazon purchases? Will it provide a range of possible facial matches, or just one? There’s a lot of questions we want to ask about the design of these kinds of apps up front, before they become commercially available. They disperse identification and tracking capabilities into infinitely more spaces than current systems.”
After years of false starts, real-world facial recognition systems are finally here–and in some cases passing human levels of recognition in the process.
It’s where we go from here that really matters.