In Zoom’s latest update for PC and Mac, the teleconferencing company is making a subtly radical update to its interface. Now, when you give a thumbs up or raise your hand in a meeting, computer vision will recognize the gesture and display the matching reaction icon on your colleagues’ screens.
Could the company that took video chats mainstream do the same thing with gesture control? Perhaps. Zoom’s small feature could make gestures an everyday part of our lives—something designers and technologists have been teasing for decades, without much mainstream success.
The appeal of gesture control is obvious. Rather than click buttons or tap tiny screens, you simply move your hands or body naturally as you do during conversations and real-world activities. Ideally, gestures could remove the barriers between you and a computer entirely.
Researchers have pursued the idea since at least 1968, when scientist Douglas Engelbart gave the presentation now known as “The Mother of All Demos” and unveiled the first computer mouse. Apple then built the mouse into the core UI of the Macintosh computer that debuted in 1984, cementing its central role in the computing firmament. A mouse is a gesture system, of sorts, tracking your hand movements in two dimensions on a computer screen, much like a pencil on a piece of paper. Given that computer “desktop” interfaces were built upon the metaphor of a real desk filled with paper and folders, matching them with this flat control scheme made perfect sense.
But people don’t live in two-dimensional worlds, so researchers continued pursuing 3D gesture systems. In 1977, University of Illinois scientists created a glove that could track your finger movements, which inspired a pile of strange controllers for 3D realities in the 1980s and ’90s. Minority Report (2002) stoked these embers, as Tom Cruise waved his hands around like magic to control a large holographic screen.
The most successful gesture-based product of all time followed just four years later: the Nintendo Wii. The console handed you a motion-controlled remote to mime sports like bowling or tennis, and went on to sell over 100 million units worldwide. The Wii’s secret was packaging air gestures as an accessible remote control—an object that looked familiar to anyone who owned a TV—through which Nintendo essentially brought the computer mouse into the third dimension. But its usefulness was limited to gaming. And the Wiimote could really only detect relatively broad motions rather than subtle gestures that could include your fingers.
Microsoft pushed the technological needle forward with Kinect, which used infrared light to actually map your body in 3D space, no remotes required. It was an initial hit, selling 10 million units faster than any consumer electronics device in history. But using the Kinect over time revealed its shortcomings: It lagged just a little behind your movements, which could be frustrating. And it raised questions like: How do you hit a button in midair . . . if you don’t feel a button in midair? Microsoft’s solution was that you would hold your hand in one spot for several seconds, which threw cold water on the promise of natural gesture computing. Personally, I found the task far more laborious than using a mouse.
Ultimately, Kinect demonstrated that, for many gestures, you need physical resistance. Imagine having a sword fight without resistance. You swing at a knight. He blocks. But your arms keep going. The experience doesn’t make any sense. Microsoft discontinued the Kinect in 2017, as it wasn’t the boon to Microsoft’s Xbox business that the company had hoped (though the 3D tracking technology inside lives on, miniaturized, in Microsoft’s HoloLens AR headset).
Kinect’s failure didn’t stop other companies from similar efforts. Leap Motion promised a desktop sensor that would replace your mouse with air gestures (it had the same shortcomings as the Kinect, and while it never caught on as more than a novelty, you can still buy the product today), while Google baked its Soli radar into its Pixel phones to let you air swipe past music tracks (which felt more like swatting at flies than queuing up the next banger for your family). Now, here we are in the mighty year of 2022. We still use mice, and we swipe, tap, and flick at touchscreens. The promise of gesturing in the air to control a computer, meanwhile, seemed all but dead.
And then, Zoom showed up.
Zoom seems to have sidestepped the pitfalls of earlier gesture products, finding the sweet spot for gesture controls with its thumbs up and raised hand. Why is this approach better than what we’ve seen? For one, many of us are already making these gestures in Zoom meetings! For whatever work culture reason, I don’t thumbs up anywhere in my life but on Zoom, where the practice is as natural as it can be—so no one needs to learn the gesture, or even know that Zoom is tracking it, for this to work. Furthermore, neither a thumbs up nor a hand raise requires physical resistance to feel good. We’ve always made these gestures in the air.
But most of all, this trick solves a real problem for Zoom. Unless you are squinting at the full Brady Bunch view of participants on any given call, it’s difficult to see how people react to a question. Zoom is digitizing their reactions in a way that can be highlighted for a host, all while replacing a redundant button press for the user (why give a thumbs up and then hit a thumbs up button? It’s silly).
Zoom took teleconferencing mainstream. Who woulda thunk it? The company may do the same thing with gesture controls next.