Today, most of us see the internet on some variation of an opaque rectangular screen. Think of them as little digital islands—black mirrors—dwarfed by the real world around them. Then imagine a screen that covers your whole field of vision and is see-through, like a windshield, so that the digital stuff can be convincingly mixed in with the real.
That’s more or less what mixed reality is like. MR is a spatial computing experience in which you can use hand gestures or your voice or your eyes to interact with the virtual content. The early mixed reality experiences of today require an expensive headset device that contains multiple motion sensors, depth sensors, and eye tracking cameras, like the Magic Leap One and the Microsoft HoloLens.
“Augmented reality” may be a more familiar term. AR is a slightly less intense cousin of MR in which digital imagery is layered over the real-world view, but you can’t interact with it in real time as you can in MR. Today AR runs primarily on smartphones, using the rear-facing camera, along with some advanced software, to capture and measure your view of the real world. Apple and Google each have development platforms–ARKit and ARCore, respectively–for the creation of these experiences for iOS and Android devices.
Actually, AR and MR experiences, which enhance our view of the real world, represent just one end of the spatial computing continuum. At the other end is virtual reality (VR), in which the real world is shut out entirely. A VR headset with a “world view” camera on the outside might hit just short of the far end of the spectrum, for example. The various experiences along the continuum are sometimes collectively called “Extended Reality” or XR.
While VR headsets and content are already common (if not exactly mainstream), AR and MR remain mostly in the hype stage, with sexy MR demo videos and prototype after prototype of AR glasses. The truth is, some serious technical, creative, and sociocultural challenges stand between the grand vision of XR and real products appealing enough to, quite literally, put in front of our faces. But big tech companies are investing heavily in the development of various XR technologies, and plenty of smart people believe that some form of it will eventually replace the smartphone screen as our primary personal computing space.
What we really want to know is what we always want to know: How soon? When will the “Magicverse,” as Magic Leap calls it, be as accessible as picking up a svelte pair of XR glasses from my night table and putting them on?
A better question is how quickly the technology is moving toward a more refined experience, and what challenges remain. To find out, I asked experts from various companies that will play big roles in the developing XR ecosystem. (Some responses have been lightly edited for clarity.)
The Magic Leap One mixed reality headset. [Photo: courtesy of Magic Leap]
Mixed Reality’s grand vision
Greg Rinaldi, Director of Developer & Creator Relations at Magic Leap
The Magic Leap One is considered to be one of the two best-performing XR headsets available today (Microsoft’s HoloLens being the other). It’s now available to consumers via the Magic Leap website for $2,300, and AT&T will soon be selling it in its stores.
On how digital layers will hover over our reality:
“We live in a digital world today. There are digital layers now. We can’t see them. We can interact with them through our phones. That can be a social network. There’s this concept of the physical world and then there’s the digital world, and really where we see this technology going, it’s not about the headset. The headset is [just] a mechanism.
“It’s about all those digital layers working together and providing you information. And that information could be a bus route. Or if you’re an engineer it could be about understanding what the sewage system looks like underneath the city. These functional layers are almost like radio stations in a way. You can tune into the transportation layer, which might say here’s where all the buses are, here’s where all the Ubers are, here’s where all the trains are, here’s where all the flying Ubers are!
“But then you might tune into the entertainment one. There will be dozens, hundreds, thousands of the entertainment ones alone. You know we love to be entertained as humans. So you could tune into the Game of Thrones channel and you’re walking down the Embarcadero and there’s a dragon flying and you’re seeing it and I’m seeing it and we’re interacting with it and we’re having this shared experience. You think of Pokémon Go, but much more realistic and much more engaging because now you’re able to see in three dimensions that character happening. That’s what we mean when we talk about ‘the Magicverse.’ It’s all about serving that digital content.”
Mixed Reality goes to work
Greg Sullivan, Communications Director, Microsoft HoloLens
Microsoft first introduced its HoloLens 1 headset in 2015 and shipped a developer edition in 2016. After seeing some adoption in the energy, construction, manufacturing, healthcare, and retail industries, the company unveiled the HoloLens 2 in February. The second HoloLens is also aimed at workplace use, and features a greatly expanded field of view, full eye tracking, expanded gesture controls, and a more comfortable design. It will start shipping later this year at $3,500.
On what comes after HoloLens 2:
“[Customers] did tell us a number of things we could improve on with the HoloLens 2, and the ones we chose to concentrate on were immersion, comfort, and usefulness out of the box.
“Looking forward we will continue to invest in those same three categories. What you’ll see in the next version is even more immersiveness, more comfort, and more applications that have more value. It’s relatively easy to do any one of those things but it’s hard to do all three.
“You can make it more immersive by including more powerful displays, but that’s going to make the device bigger, heavier, hotter, and have a shorter battery life. You can make it more comfortable by making it weigh less, but then you give up computing power. So it’s relatively easy to do any one of those things–the challenge is achieving advancements in all three areas.
“And what we’re hearing from customers is that those are their three key desires, so you can kind of make some assumptions about the improvements we’ll make to HoloLens in the future. We’ll keep investing in those three key areas. We have a team of human factors engineers who will develop new ways of making it more comfortable for the user. For the second HoloLens we invented a whole new display technology that incorporates microelectrical mechanical transfer. We may have to develop new types of displays in the future. We will do more with the natural interaction with the device.
“In general we want to take the digital world away from the flat screen so that you don’t have to put down your tools and you can learn while doing. Mixed reality guides will benefit new categories of workers, people who are using tools or walking around taking care of patients. And by the way those kinds of workers represent the majority of the workers today.”
For Facebook, AR/VR’s killer app is togetherness
Facebook is thought to have some of the best computer vision and spatial computing talent in the industry working on social experiences in VR (like Spaces), and on AR experiences for smartphone apps like Instagram, Messenger, and Facebook. Facebook’s first consumer product, the Portal home speaker/camera, offers some AR experiences–expect the company to add more in the future. The company owns Oculus, which develops VR headsets, but it’s likely exploring options for releasing more of its own AR hardware products in the future.
Rachel Franklin, Head of AR/VR Experiences at Facebook, on why the company is focusing on avatar-based social experiences in VR now:
“When I talk about some of the things we’re exploring [in VR], that would be about synchronous (or real-time) social, versus asynchronous social, which I think is happening a lot right now. There is something really interesting about utilizing all of the tricks at our fingertips, for lack of a better word, because you can be immersed in VR and figure out how you express yourself, how you communicate with other people in a fully formed environment versus an augmented one, and figure out how to pull out really relevant, important pieces that then get brought into your real world.
“The headsets exist right now out in the mass market for VR, so we’ve definitely been playing in that space. But I think both [VR and AR] are super important; I don’t think one trumps the other.”
Ficus Kirkpatrick, Director of Engineering, AR/VR at Facebook, on how Facebook is using AR for social experiences today:
“I want to differentiate between AR on the headset, which essentially no one has yet–it’s not a mass-market consumer technology. [But] there are a great many people doing AR in a social way on smartphones right now. We see people that want to share with each other, they want to self-express in different ways, they want to do storytelling types of things, and AR has been very, very useful for that in products that are on the market today. Like face filters, which is probably the most common use of AR in the market right now, to many [other] use cases you see in consumer technology, like storytelling on Portal. I think it’s going to end up being this multimodal mixed world where some people are going to have headsets, some people are going to have Portals, some people are just going to have their phones, and we need to maximize that feeling of togetherness for any of those participants.”
Rachel Franklin, on using avatars to avoid the uncanny valley, that creepy no-man’s land where a human representation looks almost, but not exactly, real:
“It’s interesting because there are technical challenges but there are also social acceptance challenges. Going back to VR, in the headset you physically have something on your face. Even if we tried to replicate you completely without an avatar we would still show you with something on your face. In terms of the avatars we’re working with today, you can see Spaces is pretty rudimentary, going all the way to [Facebook Reality Labs’s] Yaser Sheikh’s experiments with the (super life-like, but years from market-ready) Codec Avatars, which are trying to sort of leap the uncanny valley.
“And definitely from a technical standpoint, getting faces right, getting movements right, getting emotions right is great–and typically in these experiences where you do have a representation of yourself you steer far away from the uncanny valley for that reason. A lot of the avatar work that isn’t in research is really trying to give you an expression that’s using all of the ability we have, all the digital magic to let you express yourself, but is less on the truly realistic side.”
The smartphone will play a big role in XR’s near future
Aparna Chennapragada, VP of Product, AR & VR, Google
Google has come a long way since releasing its Google Glass smartglasses. Google uses computer vision technology across a number of AR products, including the Google Lens app and its ARCore phone-based augmented reality platform. Google is now testing an in-world AR directions feature in its Maps app.
On how Google thinks about AR now:
“For us the big driver here is how can AR and the camera be another way of assistance and utility. What AR allows us to do is put the answers right where the questions are. I’ll give you the classic walking navigation example where you walk up out of the subway with the question in your mind of which way do I turn. We should put the answers right where the questions are, which means right on the street in front of you.”
On how computer vision will advance in future AR:
“Humans are visual animals–30% of the neurons in the cortex are devoted to visual understanding. For instance if I’m looking at a grocery store shelf of cereals I will look at the whole shelf, look at a bunch of cereals, and say I’m looking for the ones that are gluten-free. As computer vision gets better we should move from a single thing to a whole scene and be able to understand the whole scene, whether that be an entire document or an entire store shelf. The second dimension is not just giving you information but being able to overlay information. For instance in the store shelf case wouldn’t it be great if you could just block out [the rest] and show only the gluten-free cereals?
“And the final dimension is about not just giving you bits of information but getting things done, actually taking action on things. Sometimes these things can be simple, like translation: You see a foreign language in the scene and it automatically translates it for you. To be clear there are some really hard technical problems underlying these things, which is why I tell my engineers that’s job security.”
On good reasons for phone-based AR:
“People have talked about [AR] for a while. Why get excited about this now? For me there are three reasons that are all coming together right now. One is this big jump in deep learning and machine learning techniques that is helping computer vision be that much better at recognizing things. The second one is that we understand a lot more of the world with things like Google Knowledge Graph, which [has] more than 50 billion facts about all sorts of things.
“The third thing is . . . there are 3 billion users with phones, and the camera is the most used tool–it’s the number one thing people are using on their phones to take pictures of all sorts of things. So putting these things together we think we don’t have to wait for eventual other form factors for AR to be powerful. The phone is actually a very powerful computing platform.”
Dr. Andre Wong, Vice President, 3D Sensing, Lumentum
Lumentum makes the VCSEL lasers used in the iPhone’s TrueDepth camera system. It’s likely that at least one of this year’s iPhones will contain a rear-facing (“world-facing”) depth camera that will make ARKit apps much more convincing. Depth cameras will increasingly show up on the backs of Android phones, too.
On phone-based augmented reality:
“From my view AR will probably manifest itself in some more simplistic, doable use case with your phone. Just because people are going to be with their phones all the time. I’ve always personally felt that if people had to wear something all the time, until there’s a thousand choices, it won’t catch on–it’s hard to get people to wear stuff. That was the problem with Google Glass is that it just looked so ridiculous.
“Front-facing 3D sensing has seen its primary application in biometrics. For the front-facing camera the requirements are for a one-meter range. What we’ve been working on with our customers is to take that one meter and increase it to three or four meters [for rear-facing depth cameras].
“Apple’s released ARKit, and Google has released ARCore, and if you’ve tried some of those AR apps, like Houzz or Ikea, they work reasonably well. And they’ve created this platform that’s ready for AR. What’s missing is really good depth map room scanning. Once these world-facing cameras come out, in conjunction with the dual cameras [already on the backs of premium smartphones], then AR gaming and a lot of those AR apps become much more useful.”
Cameras as the next big computing platform
Allison Wood, cofounder and CEO, Camera IQ
Camera IQ helps brands create and distribute augmented reality experiences on social media.
On the advent of computer vision:
“First–looking at how ubiquitous cameras are as a piece of hardware–cameras have shown up in everything: our computers, our phones, our cars, in the devices we put in our homes, in ATMs and stop lights, et cetera. A lot of cameras are attached to computers–in some cases really powerful computers like our phones and cars. So in a lot of ways we are allowing computers to see into our world and to understand it.
“The way we see that happening at scale today is on social media. We see that happening on Facebook and Snapchat and Instagram and TikTok where they’re using computer vision to map dog ears to your face or try on glasses. But the promise of the technology is so much more than that. Through computer vision we can start to understand the world around us and essentially allow the internet and the real world to live in the same time and space.”
On spatial computing and the primacy of the camera:
“The way that happens is essentially through a camera, a computer, and a screen. And so you can think about the camera as a clearinghouse for other data sets including visual metadata, geolocation data, and really any other data set you could imagine. Combining that together can then surface contextually relevant content. Cameras are essentially becoming the primary interface in a multimodal distributed computing paradigm. Some people talk about it as spatial computing. When I say multimodal, I mean that input could be visual, an input could be your voice, an input could be gesture, an input could be touch. And those datasets could come from other computers like the ones we wear on our wrists.
“We are seeing every tech platform, whether it’s Facebook coming out with Portal–which is basically a big-ass camera and screen which already has AR capabilities–or Snapchat’s Spectacles, which aren’t just a cool pair of sunglasses that you can take video on, attempting to own the next-generation hardware. Because if you own the camera and you own the hardware, you own the compute platform.”
5G will get you into a Black Mirror episode (“in a good way”)
Cristiano Amon, President, Qualcomm
Qualcomm wants its mobile processors to power the next generation of AR-enabled smartphones. The chip maker expects that some of those experiences will happen in XR glasses that tether via USB-C to the smartphone. It recently announced that it’ll provide reference designs (powered by the Snapdragon 855 chip) to help phone makers develop that kind of device themselves.
On the short-term effects of fast 5G networks on mixed reality devices and services:
“What we can do right away is, as you have the capability to have high speeds, and we’re talking about multiple gigabit speeds at very low latency. You have unlimited data capacity, unlimited speeds, unlimited storage connected with the cloud. So how you think about content for mixed reality and especially for VR, content will be readily available in-stream. Whether it’s gaming or video, 5G will immediately allow you access to content and you’ll be able to have that stream to your device or download it in no time.
“From that very humble beginning of what 5G can do, the next big thing is what we’ve been doing is you can start leveraging computation capabilities in the edge cloud. And with that you can think about 5G as beyond just broadband connectivity, but as a computer link between your mobile device and the cloud. You have unlimited computing or computing on demand. Today for some games and some VR games you need a state-of-the-art gaming PC. You won’t need that; you’ll be able to do that on your phone.”
On the long-term effects of fast 5G on mixed reality devices and services:
“I will go all the way to the end of the spectrum of what you can get with 5G. I used to joke that you can get into a Black Mirror episode, in a good way. If you think about the capabilities of augmented reality devices combined with the capabilities of 5G and social networking… Let’s say you have a 5G service deployed, you have high speed at very low latency, and all of a sudden all of us are now wearing a companion device, or–depending on how the phone evolves–what may look like sunglasses. That device will have mixed reality capabilities with cameras.
“You walk into a room and immediately are using the camera with facial-recognition AI—and connected with the cloud 100% of the time, you can recognize who they are and go back to their LinkedIn account, their Instagram account, their Facebook account, which will provide information about who they are connected to, and when you have interacted with him, etc. etc. Those things are going to be possible.
“We are already seeing some devices starting to look like eyeglasses. I think 5G will really unlock the potential of augmented reality because it’s going to be a game of scale. I think the more devices you get, it will accelerate, and then you’re going to have a lot more applications. In fact I am comfortable making the statement that 5G is the missing ingredient in mixed reality that will allow AR devices to scale. It will not only solve the content problem but it will solve the form factor problem by offloading much of the compute power to the cloud.”
Forget “magic.” Focus on style and utility
Steve Sinclair, SVP Product & Marketing at Mojo Vision
Mojo is an early-stage AR/VR company based in Saratoga, California. The company has not yet talked about the product it’s building, but it’s funded by some big names, including Khosla Ventures, Fusion Fund, and Dolby Family Ventures. Sinclair is the former iPhone product manager at Apple.
On XR hype vs. reality:
“The AR industry is starting to turn a corner. We’ve been caught up in a hype cycle created by some of the early AR entrants and we’re just beginning to see solutions that solve real consumer and commercial problems.”
On requirements for mainstream adoption:
“The AR hardware platforms that are going to see traction with mass market consumers will not only offer utility, but will also emphasize true mobility and social acceptability. If you can’t put on an AR solution in the morning and wear it all day long, it won’t be successful. The tech has to look and feel normal–essentially it needs to be invisible. And I’m not just talking about the hardware on your face–if it requires big obvious gestures to control it or makes you touch your face or talk out loud to yourself in public, it’s not going to see mass adoption.
“To capture consumers’ imaginations, AR companies are going to have to offer engaging content that isn’t just the same information I can see on my smartphone or smartwatch today… a huge part of that is making sure that information is contextual. It has to be the right information at the right time. The flip side of that is knowing when not to interrupt someone. Augmenting the world at the wrong time will exacerbate the problems of device distraction that we see with smartphones today.”
Kieran Hall, Strategic Partnership Development at Rokid
Rokid, based in China, makes mixed reality glasses intended for workers, but intends to offer them to consumers as the market matures. The company says its new Rokid Glass AR glasses will become generally available later in 2019.
On XR glasses people will actually wear:
“We spend a hell of a lot of time thinking about how the person feels about wearing the device. This device is an assistive technology in processes where there’s a human involved in the chain to accomplish a task. We try to build a device that’s at least as comfortable to wear socially as a pair of sunglasses. They [other AR companies] make military-looking things that make the user look like a robot. We like that additional element of style.
“On a larger scale we’re talking about mass adoption of the technology, especially to consumers, who want it to fit into their lifestyle. Initially, they might focus on a single thing, like this can improve the experience for a minor inconvenience. Then they realize that there’s a whole host of things the technology can improve upon.”
On Magic Leap overshooting the needs of early adopters (somebody had to say it!):
“Magic Leap has a great piece of technology and a very bad product. They’ve played with the expectations of consumers. They released these impressive videos and then released a device that underwhelmed. It wasn’t a considerable step up from the HoloLens. The problem is that they’ve colored the consumer viewpoint on the technology. When we try to introduce a new product consumers will come at it with these lessened expectations.”
On the need for compelling content:
“It will be content studios that create the content that will delight consumers. The defining experience might be a game–a game that plays to the medium and really allows players to engage. The content studios need to make a decision on what people want. Someone is going to have to strike the right balance, someone in the right situation with the right device that people can afford. And they have to have the right scale.”
Stephen Lake, CEO and cofounder of North
North makes Focals, a $600 pair of smartglasses that display information like messages, calendar reminders, weather, and maps and directions, and include Amazon’s Alexa voice assistant.
On style, fit, and comfort:
“Other companies are focused on specific use cases like consumer experiences in the living room, or work experiences in the enterprise, but our real vision and focus is to take elements of our digital world that we get from our phones today and give them to you in everyday smartglasses. So you get the benefits of personal computing without the distractions of mobile devices. For us this is a different trade-off in our product versus other products. We’re not so concerned about different sensors as we are about style and fit and comfort and having the perfect size for you. Especially for people who already have to wear glasses.
“We’re not trying to create experiences where you see unicorns jumping over rainbows, and we’re not about spectacular product demos. What we’re interested in is that all-day everyday product that’s designed to fit into your life today.”
On mixed reality experiences:
“I think there are in the industry a lot of products that compete on specs–on what field of view they have, how many depth planes they address. And those things may be important for some types of users but for us it’s about how do we keep the user engaged in the real world. How do we make a pair of smartglasses that you’re actually going to want to use all day? Words like ‘immersive’ aren’t as important. We’re not trying to teleport you away from reality; we’re about making the real world better. We’re about supporting you in what you’re trying to accomplish.”
The challenges facing XR in the next few years
Timoni West, Research Director at Unity Labs
Unity Labs, part of Unity Technologies, develops mixed reality content creation tools.
On multimodal user control:
“Controllers are still the name of the game over the next two or three years. It still feels really awkward when people interact with digital objects [using old modalities]. Computers can’t actually read our minds. It’s very exciting to think about transmodality in input methods–combining things like eye tracking, voice recognition, hand gestures, fingerbone tracking–then you’re getting somewhere close to magic. You’re getting closer to that feeling of Harry Potter casting a spell. But even then you’re going to have to do a lot of calibration to make it all work together.
“It’s about creating a new way of interacting with computers. It has to feel natural. Some of the current systems still use a [digital] button-press motif for control. But a button press is still a button press. What we need to see is more body-level stuff. They’re going to have to do the research-level work just to build the foundation for some of these modalities over the next three to five years. We’re going to see a lot of explorations into how these inputs work well.”
Agatha Yu, design lead, Valve
Valve designs VR headsets and software. Yu was formerly Lead Product Designer at Facebook’s VR company Oculus, where she steered the reboot of Oculus’s software before leaving in September 2018.
On mixed-reality skill sets:
“The industry as a whole has to change the way we design, and that has required a shift in skills. I do hiring for the team and I see that we need more of a hybrid of design and engineering talent. Because we are so constrained in our control of the technology we need people who are technically savvy and understand the basic performance issues, and also the user’s needs, in order to make trade-offs.”
On XR development vs. mobile app development:
“In app design, it is very flow driven. You can organize the app into stages. You show them stage one, then stage two, then stage three, and so on. In mixed reality we can’t control the user’s movements, so we have to design more probabilistically. That’s where this dovetails with artificial intelligence because we may have 18 different parameters, from the user’s inputs to the environment, and we have to figure out from that what the user wants to do. Designers have to look to the user’s intent, and the environment is always different.”
Celia Hodent, UX strategy consultant
Hodent worked on the mega-hit Fortnite game while director of UX at Epic Games. She’s the author of the book The Gamer’s Brain: How Neuroscience and UX Can Impact Video Game Design.
On VR becoming more social:
“We are seeing more multiplayer games in VR, but there are some serious challenges with that. Humans are social creatures, we need the connectedness, and people want to share social spaces in VR. But we need to take into account our other needs beyond relatedness: competence and autonomy. These three needs are core to our intrinsic motivation. So not only do we need to share a common space in VR to make it more compelling, but how we interact with one another in this space has to be meaningful to express our autonomy (such as to create together) and competence (such as having complementary roles to complete a challenge together), while, of course, being a safe space for all (protected from harassment and other antisocial behaviors).
“If you look at Fortnite it’s not just a game. It’s a social experience where people can meet in a Fortnite space and just hang out. VR is really powerful because you can be completely immersed in a virtual space. I think it’s going to be really explosive. One of the challenges when you’re in a virtual space is that you need to have eye contact with the other people there. And to do that well we need to have eye-tracking sensors inside the headset. You want to be able to see what other people are looking at within a virtual social space. You want to be able to tell when someone is looking us in the eye… We have to overcome these technical challenges to make it a believable experience–it has to not feel uncanny.”
The future of computers is spatial
There’s no question that some serious technical challenges stand in the way of developing a pair of XR glasses that are stylish enough that you’d want to wear them for extended periods in public. The components needed to put enough computing power into a small space on a person’s face–and keep it all cool–just aren’t yet ready for prime time. Tech companies still struggle to deliver graphics at the kind of resolution the human eye is capable of seeing, to extend the field of view to something approaching the eye’s natural field of view, and to create eye tracking that follows the human gaze accurately. From a consumer point of view, the mixed reality experiences I’ve seen so far feel like works in progress.
Plenty of XR content–the experiences–will have to be available, too, including games and other entertainment. And there’s a chicken-and-egg problem: Software developers don’t want to invest big dollars in creating XR experiences until they see that the hardware is selling, but people won’t want the hardware until there are plenty of games and entertainment to play on it.
These issues will be worked out. The days of squinting into a little black rectangle and pecking at apps are probably numbered. Will it be some form of XR glasses that liberate us from that paradigm? Maybe. Or, as Camera IQ’s Allison Wood suggests, XR may show up in a variety of devices.
It may come down to a question of how truly important digital content is to living life productively and enjoyably–and how much control companies give consumers over that content. Will people really want digital layers and holograms to be showing up in their worldview all day? If they’re OK with that in XR glasses then will they be equally comfortable with it in something like AR contact lenses? How about just jacking the digital content right into the optic nerve? How close is too close?
It’ll be consumers who decide these things over time. The people and companies above, and others like them, will have to listen closely to successfully navigate the technological shift over the next decade.