Five years ago, Mark Zuckerberg stated in no uncertain terms that Facebook was going to build the metaverse, a digital world that serves as a surrogate for our real one, predicted by science fiction for decades. That's why he bought Oculus and invested heavily in VR. But a question still remains: Will we be able to gather in a room as seemingly real people, rather than cartoony versions of ourselves? Will we connect with true, 3D presence, able to look one another in the face, or even the eyes? And given that we'd actually be wearing bulky VR headsets, which aren't exactly flattering for a photo op, how could that even work?
Now, thanks to new research out of Facebook Reality Labs, we're getting an early taste of just this possibility. And from the looks of it, the system is remarkably convincing at turning VR-wearing human beings into normal-looking people. The lab's new VR system can depict not just you talking, but you puffing your cheeks, sticking out your tongue, and even showing your teeth.
The method required that Facebook construct a more capable, scanning VR headset, fitted with nine cameras that film your face. Three are infrared cameras, which sense light invisible to the eye. The other six are more typical RGB video cameras.
The cameras track your movements and train an AI model to create what Facebook calls a “Codec Avatar.” It’s like a digital copy of you. The good news is, once that model is built, you can get by with a lower-end VR headset that has only three cameras instead of nine. It tracks more limited information, and uses the AI model to infer what you’re doing.
Even with three cameras, though, it's not enough to reconstruct everything and beam it over to a friend in real time the way a webcam does. So the catch is that the headset doesn't depict what it sees 1:1. It takes your face information and filters it through your AI-built avatar to create something that's close to your face, but not a literal depiction that's guaranteed to be perfect in every instance.
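The pipeline described above can be sketched in miniature: sparse readings from a few cameras get compressed into a compact "expression" code, and a model trained during the nine-camera capture session expands that code back into a full face. Everything below is an illustrative assumption in NumPy, not Facebook's actual Codec Avatar architecture; the dimensions, names, and the stand-in linear maps are invented for the sketch.

```python
import numpy as np

# Hypothetical sketch of the capture-then-infer idea from the article.
# The weight matrices stand in for a model learned during the full
# nine-camera scanning session; at runtime, a cheaper three-camera
# headset supplies only sparse observations, and the learned model
# fills in the rest. All names and shapes are illustrative assumptions.

rng = np.random.default_rng(0)

N_CAMERAS = 3          # consumer headset: three face-tracking cameras
FEATURES_PER_CAM = 16  # pretend each camera yields a 16-number summary
LATENT_DIM = 8         # compact expression code
FACE_DIM = 64          # full avatar state the decoder reconstructs

# Stand-ins for weights learned from the nine-camera capture session.
encoder = rng.standard_normal((LATENT_DIM, N_CAMERAS * FEATURES_PER_CAM))
decoder = rng.standard_normal((FACE_DIM, LATENT_DIM))

def infer_face(camera_features: np.ndarray) -> np.ndarray:
    """Map sparse camera readings to a full reconstructed face state."""
    assert camera_features.shape == (N_CAMERAS, FEATURES_PER_CAM)
    latent = encoder @ camera_features.ravel()   # encode: 48 -> 8 numbers
    return decoder @ latent                      # decode: 8 -> 64 numbers

frame = rng.standard_normal((N_CAMERAS, FEATURES_PER_CAM))
face = infer_face(frame)
print(face.shape)  # (64,)
```

The point of the shape mismatch is the article's catch: 48 input numbers cannot determine 64 output numbers on their own, so the decoder necessarily invents the missing detail from what it learned about your face during capture.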
From Facebook's example video, the results are still astounding. Even though the subjects are wearing giant VR headsets, their entire faces are depicted, complete with the unique shape of each mouth. Yes, there's something of a Botoxed look to it all, as if not every muscle is moving quite as intended. But it's a leap beyond anything we've seen so far.
More importantly, though, it's yet another data point for an important trend line. We saw deepfakes use AI to replace one person's face with another's. Then we saw media companies begin to monetize this sort of approach in public software. Next, Apple showed up, debuting a way to fake eye contact in FaceTime with AI. And now, Facebook has demonstrated that cameras coupled with AI can literally depict your entire face . . . or something like it . . . for immersive telepresence.
Of course, while these depictions are convincing, the question becomes just how real they actually are. Is AI doing the equivalent of adding an Instagram filter to your expressions, or is it both inventing and losing key data that depicts your personality and emotions? As this technology scales to more companies, it probably won't be an either/or question of whether a person is fake or not. Reality will become a gradient, likely to vary wildly by platform.
To some extent, we are shaped by platforms already. We alter what we say because of Twitter's quippy character restrictions, or change what we film because of Snapchat's self-destructing videos. But these limitations are all built transparently into the core UI of these apps, so they're expected to affect the way we create and share content. The problem is that, with AI, the limitations are so convincing that we don't even know they're there. And it's important that we don't lose ourselves somewhere inside AI's white lies.