A little over a year ago, an anonymous Reddit user named Deepfakes changed the internet.
In early 2018, they uploaded a machine learning model that could swap one person’s face for another face in any video. Within weeks, low-fi celebrity-swapped porn ran rampant across the web. Reddit soon banned Deepfakes, but the technology had already taken root across the web–and sometimes the quality was more convincing. Everyday people showed that they could do a better job adding Princess Leia’s face to The Force Awakens than the Hollywood special effects studio Industrial Light and Magic did. Deepfakes had suddenly made it possible for anyone to master complex machine learning; you just needed the time to collect enough photographs of a person to train the model. You dragged these images into a folder, and the tool handled the convincing forgery from there. The anonymous user had sparked “a war on what’s real,” as one special effects legend described it to me last year.
Now, roughly 12 months later, Deepfakes is proving prescient. A new wave of companies is looking to cash in on similar technology, leveraging machine learning to do unprecedented things in media–from faking voices, to faking avatars, to faking highly detailed photographs. I spoke with people at three of these companies, each of which is working to develop commercial applications. In addition to figuring out a sustainable business model for their software, each of them must reckon with the power of this still-emerging tech and how to protect society from their own tools, rather than subvert it.
For more than a decade, son-father team Eric and Albert Yang have run a small but successful software firm called Topaz Labs, which builds various stand-alone image-editing tools along with plug-ins for software like Adobe’s After Effects. They spent years developing their algorithms, hiring image enhancement PhD researchers to do laborious, highly specialized work to craft their software–which can pull exceptionally sharp images from video, or add an electrical glow to a moving image.
Then the machine learning revolution happened. Suddenly, they could train neural nets to do the hard work of sharpening images: “One of our biggest surprises . . . was seeing we could just throw away our 10 years of hard work instantly.” So, over the last year, the studio has more or less rebuilt itself around machine learning. “It’s almost like a new startup,” says Eric.
The company’s new AI-powered software suite, which allows everything from noise reduction to converting a simple JPEG to a highly editable RAW image, has been a hit, projected to drive 50% revenue growth this year. Its pièce de résistance is Gigapixel AI, a tool that is essentially the “ENHANCE!” meme in real life: It lets you take a very low-resolution image and make it 600% bigger. Each pixel inside the original image is analyzed and optimized by 2 million separate processes, allowing an iPhone photo to become an 80-inch print, thanks to AI that has been trained on tens of thousands of images to understand how, in most photos, individual pixels generally look around one another. It works so well that one of Topaz Labs’ biggest challenges is convincing customers that the examples on its site are real.
Gigapixel AI is really good at enhancing landscapes and scenery, but it’s actually pretty bad at adding detail to people in photos, because it doesn’t know what a face should look like. Still, the software could be abused in other ways if it can create convincing fake images–something Topaz knows well. Some police authorities have used one of its older, non-AI pieces of software to enhance images of license plates and better identify (and ticket) drivers. Since the company’s AI software actually generates new pixels, they admit it’s theoretically possible for the AI to guess the “wrong” pixels–perhaps to the point that a license plate could have an incorrect digit or letter, if it were ever used this way.
“Machine learning as you know doesn’t know real detail. It recreates detail. Obviously that is problematic,” says Albert. “As the technology improves we have to think about it more. The data right now is not at a level where we’re too concerned someone will be misidentified from a wrong license plate because software fabricated the number.” But it could.
So for now, Topaz is trying to control the use of its software by marketing its technology to photographers, and advising anyone else, like authorities, about its limitations. “We’re not getting into the security enforcement area at this moment,” adds Albert.
Mike Pappas was walking down the hall in his dorm at MIT when he saw a person in his room working on a whiteboard. It was Carter Huffman, a fellow physics undergrad and his future CTO at Modulate. Founded in 2018, with $2 million in funding from audio innovators like the founders of Harmonix (heard of Guitar Hero?), Modulate does something akin to Deepfakes, except with speech: It lets you turn your voice into anyone else’s. On its site, Modulate features a convincing, if a bit garbled, demo of Barack Obama himself pitching the software.
The idea for Modulate was born around 2015. Huffman had come across this new phenomenon called style transfer–which could apply the style of one piece of art to another, making a photo look like a Van Gogh painting. In terms of training machines to generate convincing forgeries, you could think of style transfer as a cousin to Deepfakes. “Carter was reading up and had the idea that you can save audio as an image, a spectrogram, and wondered what if we tried to do image style transfer on this audio,” recounts Pappas. “He set up experiments, and the immediate answer was it sounds like complete garbage.”
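To make the spectrogram idea concrete, here is a minimal Python sketch of turning a sound into a grid of numbers that can be treated like an image. It is not Modulate's code; the function names, frame size, and test tone are all assumptions made for illustration. Each row of the result is a short slice of time, and each column is a frequency band.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Slice audio into overlapping frames and measure the energy per frequency bin.

    Returns a list of frames; each frame is a list of magnitudes, so the whole
    result can be read as a grayscale image (time down, frequency across).
    """
    # A Hann window softens the edges of each frame before analysis.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Plain discrete Fourier transform, kept naive for readability.
        for k in range(frame_size // 2 + 1):
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def make_tone(freq_hz, sample_rate=8000, count=800):
    """Hypothetical test signal: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]


tone = make_tone(1000.0)
image = spectrogram(tone)
# The brightest column in the first row marks the tone's frequency band.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
```

With an 8,000-sample-per-second signal and 64-sample frames, each column covers 125 Hz, so a 1,000 Hz tone lights up column 8. Image style transfer would then operate on a grid like this one instead of on a photograph.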
But three years of tinkering later, Modulate has gotten pretty good. The software works by training a model on many, many samples of someone’s voice. That means that public figures, who’ve recorded hours upon hours of audio, make easier targets for impersonation. In theory, you could use the technology inside Modulate to build a model of politicians, celebrities, or anyone who spends a lot of time talking–and then use the computer to speak in their voice whenever you want.
But Pappas is not interested in Modulate being used for imitating politicians or celebrities. “The Obama voice is on our website because we thought it was important to have a demonstration on how we could match a person or character, and he happens to have a lot of public audio available, so it was easy,” says Pappas. “There are some people who’d say it’d be cool to sound like Barack Obama–for maybe a minute or two.” So the Obama filter isn’t actually available.
Instead, Modulate wants to license its tech to social media and gaming companies, allowing you to have a cool audio avatar on these platforms but nowhere outside them. The voice wouldn’t be trained by you to sound like you, but by the developers to sound like a character, so it would be under relatively strict control. Why sound like a prepubescent tween when you can speak with one of the actual voice actors’ voices in a game like Overwatch?
“The most meaningful immediate application is for consumers going online, designing their online personas. They’re spending money on new skins for their characters, but as soon as they use voice chat . . . they’re breaking the illusion they crafted because they have to use their own voice,” says Pappas. “Give them the freedom to step inside their character entirely.”
That said, Pappas doesn’t deny that Modulate could incorporate celebrity voices. He points out that Fortnite recently had NFL visual skins for purchase, and maybe some players would want to lend their own voices to the game, too. But Modulate has considered how to crack down on fraud early in this regard. It incorporates an audio thumbprint inside of all of its recordings, which, while inaudible to the human ear, can be easily detected when looking at the waveform itself. Such thumbprinting wouldn’t be enough to stop a quick hit of fake news before it went viral. But it could, at least, be used to disprove that someone famous said something controversial.
Still, it’s an imperfect security measure–and an ongoing focus for the company. “There are very sophisticated audio engineers out there. It may be possible for them to one day edit these watermarks,” says Pappas. “That’s why part of our work is finding new ways to make the watermark more deeply ingrained into the audio itself. So we’re doing new machine learning research to make watermarks robust.”
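For a rough sense of how a hidden audio mark can be checked, here is a minimal Python sketch of a textbook approach, the keyed "spread spectrum" watermark. It is not Modulate's method, which the company does not describe in detail; the function names, the key, and the strength value are all assumptions, and the strength is set high enough to make the demo clear-cut, not to be inaudible.

```python
import math
import random


def keyed_sequence(key, length):
    """Pseudo-random pattern of +1/-1 values, reproducible from a secret key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]


def embed(samples, key, strength=0.05):
    """Add a faint copy of the keyed pattern on top of the audio."""
    marks = keyed_sequence(key, len(samples))
    return [s + strength * m for s, m in zip(samples, marks)]


def correlation(samples, key):
    """Average product of the audio and the keyed pattern.

    Unmarked audio averages out near zero; marked audio lands near `strength`.
    """
    marks = keyed_sequence(key, len(samples))
    return sum(s * m for s, m in zip(samples, marks)) / len(samples)


def is_marked(samples, key, strength=0.05):
    """Call the audio marked if the correlation clears half the embed strength."""
    return correlation(samples, key) > strength / 2


# Hypothetical stand-in for a voice recording: 2.5 seconds of a 220 Hz tone.
voice = [math.sin(2 * math.pi * 220.0 * n / 8000) for n in range(20000)]
stamped = embed(voice, key=1234)
```

Checking `is_marked(stamped, key=1234)` finds the mark, while the original `voice`, or the stamped audio tested with the wrong key, does not. A simple scheme like this one is also easy to attack, for example by filtering or re-recording the audio, which is the kind of weakness Pappas says the company is researching ways around.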
Misha Leybovich always dreamed of being an astronaut. He’d go to UC Berkeley for his undergraduate engineering degree, then head to MIT to get a pair of master’s degrees in tech policy and aerospace engineering. Nobody ever told him that most adults never actually became astronauts, he jokes–instead, he landed at the consulting firm McKinsey, only to get the itch to found his own startup.
His platform, Meo, has been three years in the making. Meo’s trick is that it can take 2D video of someone’s face and convert that into a 3D avatar. This avatar could be a dragon, or a kitten, but it can convey your emotional states–your unique micro-expressions, like a subtle smile or scowl–in a way that Apple’s Animoji does not. Having raised $2.5 million in funding, Meo has just gotten good enough that it’s now being shopped around to the video game industry.
Leybovich imagines that Meo will eventually let you Deepfakes yourself for anything from a game to a social network to a beauty app. By capturing hundreds of thousands, even millions of photos of you, Meo can make it look like it’s you who is storming your buddy’s castle, or it can help you see how the new eyeliner would look on your face. “At the end of the day, once there’s a 3D model of you that you create, you can do anything with it,” says Leybovich.
Leybovich admits it’s possible you might abuse the system to create a 3D model from a source other than your own face. “Eventually you’ll be able to impersonate a celebrity, or whatever else,” he says. Maybe you could point your phone to a video of a Twitch streamer playing a game and create an avatar based upon them. Or maybe you don’t go for such a serious forgery. Maybe you just lie a bit, using a theoretical Instagram filter to tweak your appearance–adding a few muscles or a more flattering complexion, or tweaking your age and gender. Leybovich admits that video just feels believable by nature, so there could be real possibility for abuse. An adult might be able to convincingly pose as a child. A jealous ex might pretend to be a new suitor.
“Any good technology [company], I suppose, hopes their work is used for good and not evil,” says Leybovich. “We’re trying to think ahead and build in safeguards to our tech.”
For Meo, the safeguard is potentially brilliant–and with any hope, might become a model for other services to emulate. Inside its SDK, which developers must use to incorporate Meo, the firm is including what they have dubbed a real score and a live score. The real score depicts how much you’ve changed your model from your core face shape. Shrink your nose a bit, and maybe you have a real score of 85%. Turn yourself into an 80-year-old elf, and maybe you have a real score of 25%. The live score depicts how certain Meo is that your avatar is coming from your actual face versus a prerecorded thing. It’s up to the developer to decide how to incorporate these scores into their apps, but it seems likely that they could be posted right on someone’s profile page–basically a click away for a quick fact check.
“It should just be transparent,” says Leybovich. “Can we make the industry adopt a real score and a live score? No, we’re just one company. We’re not regulators. But given that we are leaders right now, if we’re able to be successful, and we beat the drum about it, hopefully we can create a standard and maybe shame everyone else into also doing it.”
Deepfakes sparked a revolution in media manipulation when they created free software from publicly available research. The knowledge about how machine learning could impersonate identities was out there, and they just crafted it into a discernible app. But these new companies demonstrate that commercializing similar media-manipulating tech–or using thousands upon thousands of photos or audio files to train a machine to mimic a person or place–can have some practical benefits. Companies that want to use AI to manipulate images, video, and speech will be held to higher standards than random Redditors (in theory, at least). If nothing else, these companies know that they must earn a profit. To do that, they need to keep their research proprietary–and limit the most obvious avenues of exploitation.
“It’s partially wanting to be a good citizen. I’m sure it’s self-serving as well,” says Leybovich. “At the end of the day, I’m sure Facebook wishes it weren’t pulled in front of Parliament. It’s not good for business or society. It’s not just trying to do right, but ultimately, if people are abusing your stuff, your business is having a problem. It’s better to try to prevent it.”