In Russian novelist Victor Pelevin’s cyberpunk novel, Homo Zapiens, a poet named Babylen Tatarsky is recruited by an old college buddy to be an advertising copywriter in Moscow amid post-Soviet Russia’s economic collapse. With a talent for clever wordplay, Tatarsky quickly climbs the corporate ladder, where he discovers that politicians like then-Russian president Boris Yeltsin and major political events are, in fact, virtual simulations. With the advent of ever-more sophisticated deepfakes, it feels as if something like Pelevin’s vision is slowly coming true.
Within the field of deepfakes, or “synthetic media” as researchers call it, much of the attention has been focused on fake faces potentially wreaking havoc on political reality, as well as other deep learning algorithms that can, for instance, mimic a person’s writing style and voice. But yet another branch of synthetic media technology is fast evolving: full body deepfakes.
In August 2018, University of California, Berkeley researchers released a paper and video titled “Everybody Dance Now,” demonstrating how deep learning algorithms can transfer a professional dancer’s moves onto the bodies of amateurs. While primitive, it showed that machine learning researchers are tackling the more difficult task of creating full body deepfakes. Also in 2018, a team of researchers led by Dr. Björn Ommer of Heidelberg University in Germany published a paper on teaching machines to realistically render human movements. And in April of this year, the Japanese artificial intelligence company Data Grid developed an AI that can automatically generate whole body models of nonexistent persons, identifying practical applications in the fashion and apparel industries.
While it’s clear that full body deepfakes have interesting commercial applications, like deepfake dancing apps or in fields like athletics and biomedical research, malicious use cases are an increasing concern amid today’s polarized political climate riven by disinformation and fake news. For now, full body deepfakes aren’t capable of completely fooling the eye, but like any deep learning technology, advances will be made. It’s only a question of how soon full body deepfakes will become indistinguishable from the real.
Synthesizing entire human bodies
To create deepfakes, computer scientists use Generative Adversarial Networks, or GANs. A GAN pairs two neural networks—a synthesizer, or generative network, and a detector, or discriminative network—in a feedback loop of refinement that produces realistic synthetic images and video. The synthesizer creates an image from a database, while the detector, working from another database, determines whether the synthesizer’s image is accurate and believable.
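That feedback loop can be sketched numerically. The toy example below is an illustrative assumption, not any production deepfake system: it trains a two-parameter generator against a logistic-regression discriminator on a one-dimensional stand-in for “real” data. Scaled up to deep convolutional networks operating on images, the same alternating adversarial updates are what produce synthetic faces and bodies.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator: fake = a * z + b, mapping random noise z to samples.
a, b = 0.1, 0.0
# Discriminator: d(x) = sigmoid(w * x + c), probability that x is real.
w, c = 0.1, 0.0

lr = 0.05
for step in range(2000):
    real = rng.normal(3.0, 1.0, size=32)  # "real" data: N(3, 1)
    z = rng.normal(0.0, 1.0, size=32)
    fake = a * z + b

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0
    # (manual gradients of the binary cross-entropy loss).
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w -= lr * (np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake))
    c -= lr * (np.mean(d_real - 1.0) + np.mean(d_fake))

    # Generator step: push d(fake) toward 1 (non-saturating GAN loss),
    # back-propagating through the discriminator via the chain rule.
    d_fake = sigmoid(w * fake + c)
    a -= lr * np.mean((d_fake - 1.0) * w * z)
    b -= lr * np.mean((d_fake - 1.0) * w)

# After training, the generator's output mean (roughly b) should have
# drifted from 0 toward the real data's mean of 3.
```

Modern synthesizers replace the two scalar models with deep networks and the one-dimensional samples with images, but the alternating update loop is the same.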
The first malicious use of deepfakes appeared on Reddit, where the faces of actresses like Scarlett Johansson were mapped onto the bodies of porn performers. Rachel Thomas of Fast.AI says that 95% of the deepfakes in existence are pornographic material meant to harass certain individuals with fake sexual acts. “Some of these deepfake videos aren’t necessarily using very sophisticated techniques,” says Thomas. But that is starting to change.
Hany Farid, a Dartmouth professor and fake video expert, points to the Chinese deepfake app Zao as illustrative of how quickly the technology has evolved in less than two years.
“The ones that I saw [from Zao] looked really, really good, and got around a lot of the artifacts, like in the movie versions where the face flickered,” says Farid. “It’s improving. Getting this as an app working at scale, downloading to millions of people, is hard. It’s a sign of the maturity of the deepfake technology.”
In case you haven't heard, #ZAO is a Chinese app which completely blew up since Friday. Best application of 'Deepfake'-style AI facial replacement I've ever seen.
Here's an example of me as DiCaprio (generated in under 8 secs from that one photo in the thumbnail) pic.twitter.com/1RpnJJ3wgT
— Allan Xia (@AllanXia) September 1, 2019
“With deepfake images and videos, we’ve essentially democratized CGI technology,” he says. “We’ve taken it out of the hands of Hollywood studios and put it in the hands of YouTube video creators.”
Björn Ommer, professor of computer vision at the Heidelberg University Collaboratory for Image Processing (HCI) & Interdisciplinary Center for Scientific Computing (IWR), leads a team that is researching and developing full body synthetic media. Like most researchers in the field, the group’s overall goal is to teach machines how to understand images and video. Ultimately, Ommer hopes the team gains a better understanding of how human beings understand images.
“We’ve seen synthetic avatars that have been created not just in the gaming industry but a lot of other fields that are creating revenue,” says Ommer. “For my group, in particular, it’s entirely different fields that we are considering, like biomedical research. We want to get a more detailed understanding of human or even animal posture over time, relating to disabilities and the like.”
There are critical differences between the processes of synthesizing faces and entire bodies. Ommer says that more research into face synthesis has been carried out, and there are a few reasons for this. First, any digital camera or smartphone has built-in face detection, technology that can be used for tasks like smile detection or identifying the person the camera is focused on. Such applications can generate revenue, leading to more research. But they have also led to, as Ommer says, “a lot of data set assembly, data curation, and obtaining face images—the substrate upon which deep learning research is built.”
Secondly, and more interesting to Ommer, is that while each human face looks different, there isn’t much variability when the face is compared to an entire human body. “That is why the research on faces has come to a stage where I would say it is creating really decent results compared to entire human bodies with much more variability being there, much more complicated to handle, and much more to learn if you head in that direction,” says Ommer.
Ommer isn’t sure when fully synthesized bodies will be of the quality that he and other researchers want. Looking at the maturation of malicious deepfakes, however, Ommer notes that humans can already be tricked quite easily without fakes created by deep learning, computer vision, or other AI technologies. The slowed-down video of Nancy Pelosi, which made the House Speaker appear drunk, suggests to him that deepfakes with this type of very simple twist are just around the corner and could be taken at face value by certain segments of society.
“But, if you want to make it appealing to larger society, it will take a few more years,” says Ommer, who says full body and other deepfakes will become cheaper and more prevalent. “The research community itself has moved in a direction—and this is very much appreciated by much of the community that is responsible for a lot of this steady progress that we see—where the algorithms are easily available, like on Github and so on. So, you can just download the most recent code from some paper, and then, without much knowledge of what’s under the hood, just apply it.”
Feeling “powerless and paralyzed”
Not every person will be able to create a “blockbuster deepfake.” But, given more time, Ommer says money will no longer be an issue in terms of computational resources, and the software will become much easier to apply. Farid says that with full body deepfakes, malicious creators will be able to move beyond the technology’s typical stationary figure talking directly into the camera, making targets appear to do and say things they never would.
Tom Van de Weghe, an investigative journalist and foreign correspondent for VRT (the Flemish Broadcasting Corporation), worries that journalists, but also human rights activists and dissidents, could have footage of them weaponized by full body deepfakes. As an international correspondent for the China Bureau of Belgian television from 2007 to 2012, Van de Weghe and his team ran afoul of the Chinese government while reporting on an HIV scandal in Henan Province. After getting beaten up while making their way to an airport, Van de Weghe says authorities told Xinhua News Agency, China’s state press agency, that it was a fake story—that he and his team were pushed by some angry farmers.
“That was my first experience with fake news—I felt powerless and paralyzed in a way,” says Van de Weghe. “Once your official story is distorted by the largest press agency in the world, Xinhua News Agency, and it’s been distributed all over the world in their outlets, and you see it printed on the front page of China Daily, you feel like, what the hell?”
This harrowing experience, the explosion of fake news during the 2016 election, and the rise of deepfakes in 2017 inspired Van de Weghe to research synthetic media. In the summer of 2018, he began a research fellowship at Stanford University to study ways of battling the malicious use of deepfakes.
“It’s not the big shots, the big politicians, and the big famous guys who are the most threatened,” says Van de Weghe. “It’s the normal people—people like you, me, female journalists, and sort of marginalized groups that could become or are already becoming the victims of deepfakes.”
Two weeks ago, Dutch news anchor Dionne Stax discovered her face “deepfaked” onto a porn actress’s body, after the video was uploaded to PornHub and distributed on the internet. Although PornHub quickly removed the video, Van de Weghe says that the damage to her reputation had already been done. He also points to China’s AI public broadcasters as proof that the Chinese government has the capability to pull off realistic deepfakes.
“They look very convincing, if you ask me,” says Van de Weghe. “By 2020, there should be 200 million cameras on China’s streets to track people for their social credit program. They have a lot of data, and they can use all of that to manipulate certain things.”
Van de Weghe thinks full body deepfakes could completely alter the impact of events like the protests in Hong Kong in troubling new ways—such content could make it seem as if protesters are acting violently or portray law enforcement’s crackdown in a positive light.
“As a journalist in China, every moment is being filmed,” says Van de Weghe. “To blackmail a journalist or put them in a negative light, they could alter that footage into a deepfake. They can do anything with that footage and manipulate it quite easily. I’m not saying they are doing it, but they’re probably able to do that already. The full body deepfakes are getting better, so the question is, what will the first complete deepfake look like?”
To imagine how a full body deepfake might work, Van de Weghe points to 2018 footage of Jim Acosta, CNN’s chief White House correspondent. In a video clip uploaded by Paul Joseph Watson, an editor at conspiracy theory site Infowars, Acosta seems to aggressively push a White House staffer trying to take his microphone. The original clip, broadcast by C-SPAN, differs markedly from Watson’s. The Infowars editor claimed he didn’t doctor the footage and attributed any differences to “video compression” artifacts. But, as The Independent demonstrated in a side-by-side analysis of the videos in an editing timeline, Watson’s video is missing several frames from the original. A full body deepfake could, like editing video frames, alter the reality of an event.
"He never once touched her."
That is a complete lie. He clearly did.
Is whatever you're paid by CNN really worth making a total fool out of yourself for the world to see? pic.twitter.com/vgDynDQWJf
— Paul Joseph Watson (@PrisonPlanet) November 8, 2018
“[The Acosta video] was something real,” says Van de Weghe. “If the White House is capable of distributing a story like that, just imagine what a less democratic regime like China is capable of with full body deepfakes.”
Deeptrace Labs, founded in 2018, is a cybersecurity company that is building tools based on computer vision and deep learning to analyze and understand videos, particularly those that could be manipulated or synthesized by any sort of AI. Company founder Giorgio Patrini, previously a postdoc researcher on deep learning at the DELTA Lab, University of Amsterdam, says that a few years ago he started investigating how technology could prevent or defend against future misuse of synthetic media.
Patrini believes that malicious deepfakes, made up of a combination of synthetic full bodies, faces, and audio, will soon be used to target journalists and politicians. He pointed to a deepfake porn video that featured Indian journalist Rana Ayyub’s face swapped onto a porn actress’s body, part of a disinformation campaign intended to discredit her investigative reporting after she publicly pushed for justice in the rape and murder of an 8-year-old Kashmiri girl. And in March, Deeptrace Labs looked into a purported deepfake of Gabon President Ali Bongo, who had recently suffered a stroke. While many in the African country thought Bongo’s immobile face, eyes, and body suggested a deepfake—including the Gabon military, which launched an unsuccessful coup based on this belief—Patrini told Mother Jones that he did not believe the video of the president had been synthesized.
“We couldn’t find any reasons to believe it was a deepfake, and I think that was later confirmed that the president is still alive but that he’d had a stroke,” says Patrini. “The main point I want to make here is that it doesn’t matter if a video is a deepfake or not yet—it’s that people know that it can spark doubt in public opinion and potentially violence in some places.”
Recently, Van de Weghe learned that a political party operative approached one of the most popular deepfake creators, requesting a deepfake to damage a certain individual. Such custom, made-to-order deepfakes could become big business.
“There is money to be earned with deepfakes,” says Van de Weghe. “People will order it. So, a government doesn’t have to create a deepfake—they just have to contact a person who is specialized in deepfakes to create one.”
The Wall Street Journal recently reported that a UK energy company CEO was fooled into transferring $243,000 to the account of a Hungarian supplier. The executive said he believed he was talking to his boss, who had seemingly approved the transaction. Now, the CEO believes he was the victim of an audio deepfake scam known as vishing. Farid believes other fraudulent deepfake financial schemes, which might include full body deepfakes, are only a matter of time.
“I could create a deepfake video of Jeff Bezos where he says that Amazon stock is going down,” says Farid. “Think of all of the money that could be made shorting Amazon stock. By the time you rein it in, the damage has already been done. . . . Now imagine a video of a Democratic party nominee saying illegal or insensitive things. You don’t think you can swing the vote of hundreds of thousands of voters the night before an election?”
Farid thinks a combination of social media and deepfake videos, whether of faces or full bodies, could easily wreak havoc. Social media companies are largely unable or unwilling to moderate their platforms and content, so deepfakes can spread like wildfire.
“When you pair the ability to create deepfake content with the ability to distribute and consume it globally, it’s problematic,” he says. “We live in a highly polarized society, for a number of reasons, and people are going to think the worst of the people they disagree with.”
But for Fast.AI’s Thomas, deepfakes are almost unnecessary in the new cyber skirmishes to negatively influence the political process, as governments and industry already struggle with fake information in the written form. She says the risks aren’t just about technology but human factors. Society is polarized, and vast swaths of the United States (and other countries) no longer have shared sources of truth that they can trust.
This mistrust can play into the hands of politically motivated deepfake creators. When a deepfake is debunked, as privacy scholar Danielle Citron noted, it can suggest to those who bought the lie that there is some truth to it. Citron calls this “the liar’s dividend.” Farid thinks advancements in full body deepfake technology will make the overall problem of this type of nefarious deepfakery worse. The technology is evolving fast, spurred by university research like “Everybody Dance Now” and private sector initiatives such as Zao to monetize deepfakes.
“Once you can do full body, it’s not just talking heads anymore: you can simulate people having sex or killing someone,” Farid says. “Is it just around the corner? Probably not. But eventually it’s not unreasonable that in a year or two that people will be able to do full body deepfakes, and it will be incredibly powerful.”
Currently, no consensus approach to rooting out deepfakes exists within the tech industry. A number of different techniques are being researched and tested.
Van de Weghe’s research team, for instance, created a variety of internal challenges that explored different approaches. One team investigated digital watermarking of footage to identify deepfakes. Another team used blockchain technology to establish trust in footage, playing to one of blockchain’s strengths. And yet another team identified deepfakes by using the very same deep learning techniques that created them in the first place.
“Some Stanford dropouts created Sherlock AI, an automatic deepfake detection tool,” says Van de Weghe. “So, they sampled some convolutional models and then they look for anomalies in a video. It’s a procedure being used by other deepfake detectors, like Deeptrace Labs. They use the data sets called FaceForensics++, and then they test it. They’ve got like 97% accuracy and work well with faces.”
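The procedure Van de Weghe describes has two stages: a convolutional model scores individual frames, and an aggregation step flags videos whose frames look anomalous. The sketch below illustrates only the aggregation stage, a minimal example under stated assumptions; the per-frame scores are made-up stand-ins for the output of a real detector (such as one trained on the FaceForensics++ data set), not an actual model.

```python
import numpy as np

def video_fake_probability(frame_scores, top_k=10):
    """Aggregate per-frame fake probabilities into one video-level score.

    Averaging only the top-k most suspicious frames keeps the detector
    sensitive to short manipulated segments inside a mostly real video.
    """
    scores = np.sort(np.asarray(frame_scores))[::-1]  # most suspicious first
    return float(np.mean(scores[:top_k]))

# Hypothetical per-frame scores: a mostly real video containing a short
# manipulated run of 10 frames.
rng = np.random.default_rng(1)
real_frames = rng.uniform(0.0, 0.3, size=290)  # low fake probability
fake_frames = rng.uniform(0.8, 1.0, size=10)   # the manipulated segment
scores = np.concatenate([real_frames, fake_frames])

video_score = video_fake_probability(scores, top_k=10)
flagged = video_score > 0.5  # the short fake segment still trips the flag
```

Averaging across all 300 frames instead would dilute the 10 suspicious ones below the threshold, which is why detectors typically focus on the most anomalous frames rather than the whole video.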
Deeptrace Labs’ API-based monitoring system can see the creation, upload, and sharing of deepfake videos. Since being founded in 2018, the company has found over 14,000 fake videos on the internet. Insights gleaned by Deeptrace Labs’ system can inform the company and its clients about what deepfake creators are making, where the fakes came from, what algorithms they are using, and how accessible these tools are. Patrini says his team found that 95% of deepfakes are face swaps in the fake porn category, with most of them being a narrow subset of celebrities. So far, Deeptrace Labs hasn’t seen any full body synthesis technology being used out in the wild.
“You cannot really summarize a solution for these problems in a single algorithm or idea,” says Patrini. “It’s about building several tools that can tell you different things about synthetic media overall.”
Van de Weghe thinks the next big thing in anti-deepfake technology will be soft biometric signatures. Every person has their own unique facial tics—raised brows, lip movements, hand movements—that function as personal signatures of sorts. Shruti Agarwal, a researcher at UC-Berkeley, used soft biometric models to determine if such facial tics have been artificially created for videos. (Agarwal’s thesis adviser is fake video expert and Dartmouth professor Hany Farid.)
“The basic idea is we can build these soft biometric models of various world leaders, such as 2020 presidential candidates, and then as the videos start to break, for example, we can analyze them and try to determine if we think they are real or not,” Agarwal told Berkeley News in June of this year.
Although Agarwal’s models aren’t foolproof, since people in different circumstances might use different facial tics, Van de Weghe thinks companies could offer soft biometric signatures for identity verification purposes in the future. Such a signature could be something as well-known as eye scans or a full body scan.
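Agarwal’s published approach correlates facial behaviors over time to build a per-person model. The sketch below illustrates just that signature idea with synthetic time series; the “brow,” “lips,” and “head” tracks are hypothetical stand-ins for real face-tracker output, and the distance comparison is a simplification of the actual classifier. Footage whose tics are decoupled, as a face swap can cause, lands farther from the reference signature than a genuine second clip of the same speaker.

```python
import numpy as np

def soft_biometric_signature(tracks):
    """Flatten the pairwise Pearson correlation matrix of facial-feature
    time series (e.g., action-unit intensities) into a signature vector."""
    corr = np.corrcoef(np.asarray(tracks))
    iu = np.triu_indices_from(corr, k=1)  # upper triangle, no diagonal
    return corr[iu]

def signature_distance(sig_a, sig_b):
    """Euclidean distance between signatures; larger means less alike."""
    return float(np.linalg.norm(sig_a - sig_b))

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 400)

# Reference footage: this speaker's brow raises track their lip movement,
# a personal tic the signature should capture.
brow = np.sin(t) + 0.1 * rng.standard_normal(t.size)
lips = np.sin(t) + 0.1 * rng.standard_normal(t.size)
head = np.cos(0.5 * t) + 0.1 * rng.standard_normal(t.size)
reference = soft_biometric_signature([brow, lips, head])

# Suspect footage: the brow no longer tracks the lips, as a face swap
# pasted over someone else's body might cause.
brow2 = rng.standard_normal(t.size)
lips2 = np.sin(t) + 0.1 * rng.standard_normal(t.size)
head2 = np.cos(0.5 * t) + 0.1 * rng.standard_normal(t.size)
suspect = soft_biometric_signature([brow2, lips2, head2])

# A genuine second clip of the same speaker: the tics stay intact.
brow3 = np.sin(t + 0.05) + 0.1 * rng.standard_normal(t.size)
lips3 = np.sin(t + 0.05) + 0.1 * rng.standard_normal(t.size)
head3 = np.cos(0.5 * (t + 0.05)) + 0.1 * rng.standard_normal(t.size)
genuine = soft_biometric_signature([brow3, lips3, head3])

d_suspect = signature_distance(reference, suspect)
d_genuine = signature_distance(reference, genuine)
```

Because the suspect clip breaks the brow-lips correlation that defines this speaker’s signature, its distance from the reference comes out larger than the genuine clip’s.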
“I think that’s the way forward: create bigger data sets in cooperation with academics and big tech companies,” Van de Weghe says. “And we as newsrooms should try and train people and build media literacy about deepfakes.”
Recently, Facebook and Microsoft teamed up with universities to launch the Deepfake Detection Challenge. Another notable effort is the Defense Advanced Research Projects Agency’s (DARPA) goal of tackling deepfakes with semantic forensics, which looks for algorithmic errors that create, for instance, mismatched earrings worn by a person in a deepfake video. And in September 2018, the AI Foundation raised $10 million to create a tool that identifies deepfakes and other malicious content through both machine learning and human moderators.
But, Fast.AI’s Thomas remains skeptical that technology can fully solve the problem of deepfakes, whatever form they might take. She sees value in creating better systems for identifying deepfakes but reiterates that other types of misinformation are already rampant. Thomas says stakeholders should explore the social and psychological factors that play into deepfakes and other misinformation as well, like how the slowed-down Nancy Pelosi video played into the confirmation bias of voters who dislike her.
Why it’s tough to regulate deepfakes
Thomas, Van de Weghe, and Farid all agree that governments will have to step in and regulate deepfake technology because social media platforms, which amplify such incendiary content, are either unable or unwilling to police their own content.
In June, Rep. Adam Schiff (D-CA), chair of the House Intelligence Committee, held the first hearing on the misinformation and disinformation threats posed by deepfakes. In his opening remarks, Schiff made note of how tech companies responded differently to the fake Pelosi video. YouTube immediately deleted the slowed-down video, while Facebook labeled it false and throttled back the speed at which it spread across the platform. These disparate reactions led Schiff to demand social media companies establish policies to remedy the upload and spread of deepfakes.
“In the short-term, promoting disinformation and other toxic, incendiary content is profitable for the major platforms, so we have a total misalignment of incentives,” says Fast.AI’s Thomas. “I don’t think that the platforms should be held liable for content that they host, but I do think they should be held liable for content they actively promote (e.g. YouTube recommended Alex Jones’ videos 16 billion times to people who weren’t even looking for him).”
“And, in general, I think it can be helpful to consider how we’ve [legislatively] dealt with other industries that externalize large costs to society while privately claiming the profits (such as industrial pollution, big tobacco, and fast food/junk food),” Thomas adds.
Deeptrace Labs’ Patrini says regulation of synthetic media could prove complicated. But, he believes some current laws, like those covering defamation, libel, and copyright, could be used to police malicious deepfakes. A blanket law to stop deepfakes would be misguided, says Patrini. Instead, he advocates government support for synthetic media applications that benefit society, while funding research into creating tools to detect deepfakes and encouraging startups and other companies to do the same.
“[Government] can also educate citizens that this technology is already here and that we need to retrain our ears and eyes to not believe everything we see and hear on the internet,” says Patrini. “We need to inoculate people and society instead of repairing things in maybe two years when something very catastrophic or controversial might happen because of misuse of this technology.”
Ommer says computer vision researchers are well aware of the malicious applications of deepfakes. And he sees a role for government to play in creating accountability for how deepfakes are used.
“We all see applications of image understanding and the benefits that it can potentially have,” says Ommer. “A very important part of this is responsibility and who will take a share in this responsibility? Government agencies and so on who have interviewed me obviously see their share in this responsibility. Companies say and probably—in the interest of their stockholders—have to say that they see their responsibility; but, we all know how they have handled this responsibility up until now.”
“It’s a tricky thing,” Ommer says. “Just hoping that this will all go away . . . it won’t.”