Last March, when we all started wearing masks, phone makers suddenly had a big problem. The facial recognition systems used to authenticate users on their phones no longer worked. The AI models that powered them couldn’t recognize users’ faces because they’d been trained using images of only unmasked faces. The unique identifiers they’d been trained to look for were suddenly hidden.
Phone makers needed to expand their training data to include a wide assortment of images of masked faces, and quickly. But scraping such images from the web comes with privacy issues, and capturing and labeling high numbers of images is cost- and labor-intensive.
Enter Synthesis AI, which has made a business of producing synthetic images of nonexistent people to train AI models.
The San Francisco-based startup needed only a couple of weeks to develop a large set of masked faces, with variations in the type and position of the mask on the face. It then delivered them to its phone-maker clients—which the company says include three of the five largest handset makers in the world—via an application programming interface (API). With the new images, the AI models could be trained to rely more on facial features outside the borders of the mask when recognizing users’ faces.
Phone makers aren’t the only ones facing training data challenges. Developing computer-vision AI models requires a large number of images with attached labels describing what each image contains, so that the machine can learn what it is looking at. But sourcing or building huge sets of these labeled images in an ethical way is difficult. For example, controversial startup Clearview AI, which works with law enforcement agencies across the United States, claims to have scraped billions of images from social networking sites without consent.
That’s where synthetic data, which is data generated by computers rather than recorded from real life, can play a role. Even though it’s generated, the goal is for this data to have the same statistical characteristics as a real data set. Synthetic data has been used by self-driving car makers, and researchers have looked into its use in healthcare.
Synthesis AI starts building fake faces by making a 3D model. From there, it can generate new faces based on combinations of a high number of variables, such as gender, age, ethnicity, hair style and color, facial hair, accessories (glasses), face angle, lighting, and more. “We started forming Synthesis AI by bringing digital effects and CGI from the gaming and movie worlds together with AI models,” says Synthesis AI CEO Yashar Behzadi.
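The key to this approach is that variation comes cheap: each rendered face is just one draw from a grid of parameter combinations, and the chosen parameters double as the ground-truth label. The sketch below illustrates the idea in Python; the parameter names and value lists are hypothetical stand-ins, not Synthesis AI's actual schema or pipeline.

```python
import random

# Hypothetical parameter space for synthetic face generation.
# These names and values are illustrative only.
PARAMS = {
    "age_group": ["18-29", "30-49", "50-69", "70+"],
    "hair_color": ["black", "brown", "blond", "gray", "red"],
    "facial_hair": ["none", "beard", "mustache"],
    "glasses": [True, False],
    "face_angle_deg": [-45, -20, 0, 20, 45],
    "mask": ["none", "surgical", "cloth", "n95"],
}

def sample_face_spec(rng: random.Random) -> dict:
    """Draw one combination of rendering parameters. The same dict
    doubles as the label attached to the rendered image, since the
    generator knows the ground truth by construction."""
    return {name: rng.choice(values) for name, values in PARAMS.items()}

def total_combinations() -> int:
    """Size of the full parameter grid, showing how variety scales."""
    total = 1
    for values in PARAMS.values():
        total *= len(values)
    return total

rng = random.Random(0)
print(sample_face_spec(rng))   # one fully labeled face specification
print(total_combinations())    # 4*5*3*2*5*4 = 2400 distinct combinations
```

Even this toy grid yields 2,400 distinct specifications, and real pipelines draw from continuous ranges (lighting angles, mask positions) rather than short lists, so coverage grows far faster than any manual photo-collection effort could.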
The startup, which was founded in 2019 and recently raised $4.5 million in seed funding, serves these images to clients from the cloud with its FaceAPI. The images are completely fictitious and accompanied by the description labels needed to train AI models. Client companies are charged by the image, Behzadi says.
Behzadi says a lack of diverse training data often leads to AI models performing inconsistently across user groups, including ethnic groups. Such bias is a widespread problem: Studies in 2018 and 2019 showed that many machine learning systems misidentify the faces of people of color far more often than they do white faces. To combat this, Behzadi says, companies can access the Synthesis AI API and order up the images they need to balance their training data sets. (It should be noted that representative training data is not a silver bullet for bias problems, since bias can be inadvertently built into the various computational layers of a neural network itself.)
Today, Synthesis AI’s biggest customers come from the smartphone industry, Behzadi tells me, but the company has also found customers among teleconferencing software makers, the developers of smart assistants (such as Alexa), and emotion-detection AI companies such as Affectiva.
Large AI developers, such as Facebook and Waymo, now have their own departments for creating synthetic training imagery. But Behzadi believes there is a large and growing market for synthetic training data among smaller companies, or companies that don’t see AI development as their main business.