We’re in a golden age of UX. Why is video chat still stuck in the ’90s?

Today we have cloud storage and augmented reality—all on our phones. So why hasn’t video chat evolved at all?

[Source Image: oleksii arseniuk/iStock]

I got my first glimpse into the “future” of videoconferencing more than 25 years ago, back in the early days of the World Wide Web and hypertext. I still remember looking over a friend’s shoulder as she chatted with a group of friends on CU-SeeMe. Each participant in her chat was represented by a grainy black-and-white webcam feed in a grid of boxes on the screen. What I saw that day would be easily recognizable to the billions of people trapped in Zoomland today. At the time, I was designing CD-ROMs with QuickTime VR tours of Learjets. Today we have cloud storage and augmented reality—all on our phones. So why hasn’t video chat evolved at all?


Hollywood Squares—CU-SeeMe (ca. 1992)

It is true that in the intervening 25 years we have seen massive improvements in the performance and convenience of videoconferencing, but little else. Video chat is increasingly embedded as a feature in almost every app we use, thanks to that ubiquitous video camera button (another symbol of outdated thinking, but that’s another story). This raises the question of whether video chat will disappear altogether as a distinct experience we should “care about” and become something we take for granted, like Siri or sharing—something embedded in the other apps we use every day. Or is it a modality that is still waiting to be fully realized?

With billions of people struggling to transpose their work and social lives into this virtual channel, I would argue that it is time to raise our expectations and ask different questions about what this modality can and should be. Given the state of the pandemic, there is every reason to believe that many of us will be working remotely for years to come. The powers that be at Zoom could give us reason to believe that, in the not-too-distant future, the hours we spend on video chat might become a much more satisfying, user-friendly, and delightful experience. More importantly, this is a moment when we desperately need online environments that encourage dialogue between people from different backgrounds and perspectives, and that are inclusive of more diverse forms of participation than “Hollywood Squares” allows. The current Zoom environment only sets us back from this goal by reinforcing an interaction model that has been rightly criticized for reflecting and perpetuating the worst instincts of professional meeting culture, such as power-hoarding and individualism.

CU-SeeMe [Image: Wikimedia Commons]

How did we get here?

For most people, it may seem that videoconferencing appeared in an instant, crashing like a wave across every aspect of their personal and professional lives. What was once on the periphery (a poor excuse for a meeting) is now the primary focus of our emotional lives. Yet where is the innovation? As the CU-SeeMe example illustrates, Zoom is riding the rails of a long history of user experience design. If we look back, we can also see the roots of a number of false starts and neglected offshoots of our online lives that might have set videoconferencing on a more interesting and satisfying trajectory. People have been tinkering with video chat, social workspaces, and virtual reality for quite some time. How might we look to these roots (not just obvious precedents like CU-SeeMe) for the seeds of a new and improved user experience?

Second Life [Image: courtesy Linden Lab]

Virtual reality—Second Life (ca. 2003)

First, let’s get this out of the way: In the future, we will not be embedding our virtual online meetings in a fully simulated 3D environment. I started studying 3D environments as a design fellow at Microsoft Research in the mid-’90s, as part of the graphical chat team led by bot guru Lili Cheng. At the time, many people were predicting that the web would go 3D thanks to VRML and other technologies. The promise of 3D meeting spaces has come in waves ever since, the most visible failure being Second Life, in which a number of companies (and organizations like TED) actually held virtual meetings and conferences with participants strangely floating in and out. Imagine trying to hold a meeting in an endless conference room with no walls or ceiling, and with no ability to make direct eye contact.

Dialogue—Comic Chat (ca. 1996)

While our team at Microsoft Research never launched a successful 3D chat environment, we did sneak out a clever product called Comic Chat that flew under the radar despite the fact that it was a fully functional IRC client—basically any standard online chat could be converted to Comic Chat. What Comic Chat sacrificed in visual fidelity it made up for in the creative use of dynamic framing mechanisms that blended time and space to represent the flow of a conversation as we might actually experience it (we had a full-time animator on staff). This effect was largely created through the smart use of camera controls (a concept pulled from the world of video) that would hold each frame of the comic strip for a short window of time to see how many participants would actively join that moment in the discussion.

Microsoft Comic Chat [Image: Flickr user Aaron Parecki]
Some frames would end up with a single figure speaking, but others would expand to include three, four, or five people if they all responded quickly enough. The rendering engine would capture each moment before moving on to the next frame in the discussion. Add in a pretty sophisticated facial-expression “wheel” (in the lower right above), and you had a surprisingly satisfying experience as well as a surprisingly engaging record of the conversation—something that Zoom has completely neglected.
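The framing rule described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Comic Chat’s actual code: the names `Message`, `Frame`, and `group_into_frames`, the three-second window, and the five-figure cap (borrowed from the “three, four, or five people” above) are all assumptions. Each panel stays open for a short window after its first line; anyone who replies inside that window joins the panel, and everything else starts a new one.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    time_s: float  # seconds since the chat started
    speaker: str
    text: str


@dataclass
class Frame:
    start_s: float
    messages: list = field(default_factory=list)

    @property
    def speakers(self):
        # Distinct speakers, in order of their first line in this panel
        return list(dict.fromkeys(m.speaker for m in self.messages))


def group_into_frames(messages, window_s=3.0, max_figures=5):
    """Group chat lines into comic panels (hypothetical rule, assumed defaults).

    A panel stays open for `window_s` seconds after its first line. Replies
    inside that window join the panel, unless a new speaker would push it
    past `max_figures` figures; anything else starts the next panel.
    """
    frames = []
    current = None
    for msg in sorted(messages, key=lambda m: m.time_s):
        start_new = (
            current is None
            or msg.time_s - current.start_s > window_s
            or (msg.speaker not in current.speakers
                and len(current.speakers) >= max_figures)
        )
        if start_new:
            current = Frame(start_s=msg.time_s)
            frames.append(current)
        current.messages.append(msg)
    return frames


if __name__ == "__main__":
    chat = [
        Message(0.0, "Ana", "Did everyone see the draft?"),
        Message(1.2, "Ben", "Yes!"),
        Message(2.5, "Cai", "Reading it now."),
        Message(9.0, "Ana", "Great, let's start."),
    ]
    for frame in group_into_frames(chat):
        print(frame.start_s, frame.speakers)
```

In this example, the quick replies from Ben and Cai land in Ana’s panel, while her later line, well outside the window, opens a new one.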


In Zoomland, we do not experience dialogue of this sort but rather the stumbling awkwardness of a daisy chain of monologues and soliloquies. Yet there are clearly moments in any meeting when two or three people are actively engaging with each other, or trying to, even though they can’t share the audio stream simultaneously. They are responding quickly enough (or attempting to, and stepping over each other) for the other participants, and a platform like Zoom, to understand that an active dialogue is happening on a given topic. It is not hard to imagine the Zoom window rearranging itself dynamically, in the manner of Comic Chat, so that this moment is appropriately framed, with the active participants combined against some sort of blurred background (no tropical beaches, please!) or simple, angled 3D effects of the sort that Argodesign explored with the virtual windows in its design concept Square.

[Image: Argodesign]
This sort of dynamic framing technique would provide more than just a cheap visual effect. One of the most monotonous things about Zoom meetings is that we have no spatial memory of how the meeting progressed, which likely hinders our ability to retain information as well as share it with others. Our sense of recall is impoverished by the fact that the meeting looked pretty much the same the entire time. Zoomland is static. Viewed through the lens of dynamic framing like that employed by Comic Chat, the flow of a meeting might be easier to remember, anchored by the moments when certain combinations of participants were actively engaged with each other and were framed as such by the Zoom window.

Conviviality—The Palace (ca. 1995)

But what about everyone else? Dialogue is great, but it can quickly become dominated by a select few: the loudest voices in the room. Like everyone, I hate it when presenters ignore or fail to read the room in a meeting IRL. This has only gotten worse in Zoomland, where the rest of us are invisible unless we are speaking or you open the chat sidebar. So how might we cultivate a greater sense of conviviality?


The Palace [Image: Fernanda B. Viégas and Judith S. Donath/MIT Media Lab]
If you ask Zoom, adding emoticons is a huge step forward in our ability to express ourselves without interrupting each other. Emoticons date back to the early ’70s (at least in ASCII). While they may be relatively new arrivals to our phone keyboards and, more recently, the world of videoconferencing, there was a moment, back in the 1990s, when they were all the rage in virtual chat, thanks to The Palace. The Palace was a graphical chat environment in which participants (represented by emoji-like avatars) could create and explore an unlimited set of themed, virtual rooms to meet and chat about . . . well, anything (loosely based on the concept of memory palaces). While The Palace was messy, it also felt freer and more open than the rigid boxes we inhabit in Zoomland today. Conversations would pop up organically and simultaneously with people drifting within and between different “rooms.” There was even a version of this sort of social chat that would follow you wherever you went on the web. You would land on a website and find other people “there” to chat with using an early web browser extension.

I know what you’re thinking: That is one of the ugliest screen designs you have ever seen. As a longtime UX designer, I can fully appreciate the desire to tidy up the messy world of social interactions with a cleaner, more orderly environment (like some sort of virtual Glass House). But conversations are dynamic, playful, and engaging (at least the memorable ones are). Why can’t we have the freedom to migrate around Zoomland and arrange ourselves as we please while listening to a discussion or viewing some shared slides? While this behavior would be extremely distracting in a physical meeting, it is just one of many ways that meetings in Zoomland could improve upon the in-person experience.

Philip Johnson’s Glass House. [Photo: Flickr user Tom Hart]
What about “waiting rooms”? Think about the millions and millions of people every day waiting for “the host to join.” What if we could create a custom waiting room, replete with virtual merchandise such as custom wallpapers, games, or special effects (think Photo Booth), for our friends and colleagues to hang out in before (and after) meetings? I suspect that these conversations would be more engaging and meaningful than the actual meeting. Isn’t it true that discussions in the hallway or over lunch are often much more interesting than the actual meeting session? Back in the day, the frustrating thing about The Palace was that you never knew if anyone would actually be in one of these rooms, at least anyone you cared about. But today, with all of us living online 24/7, this would be less of an issue. Just like Dopplr, this could open up space for meaningful and unexpected encounters in Zoomland.


Dopplr [Image: Flickr user Joi Ito]
Also, who decided that a vertically scrolling feed was the right way to represent comments and input from the broader group of participants in Zoomland? What if our questions hung in the air around our heads (like thought bubbles) until we felt that they had been sufficiently addressed? What if other participants could migrate closer to us to show that they, too, want to discuss that issue or topic? Perhaps the bubble could grow as others “join,” and begin to crowd into the video window of the person leading the discussion until they respond. Just like camera controls, persistence is a concept that digital designers have been exploring for decades, yet Zoom doesn’t seem to have a clue.

Total Recall—Google Wave (ca. 2009)

We tend to focus our frustrations with Zoom on the immediate, in-meeting experience. Maybe this is because it is such a relief to be finished with a full day of videoconferencing. As mentioned earlier, we forget how these environments impoverish our sense of recall and thus our ability to build and retain knowledge in Zoomland. The ideas described above are meant to point toward ways of capturing and representing the flow of a meeting as well as its key moments of interest (MOIs). You may be familiar with the term POI (point of interest). It is one of the anchoring design concepts behind the various mapping apps that we rely on every day to build our understanding of the physical world. MOI is its analog in Zoomland (I developed a graphical note-taking style based on this concept).

[Image: courtesy of the author]
Step back and consider the volume of video content being delivered, and recorded, in Zoomland every day, with Zoom reporting roughly 300 million daily meeting participants. Then ask yourself: Have you ever gone back to watch a Zoom recording? Let’s imagine you joined a Zoom meeting halfway through and want to quickly “catch up” on what you missed. Or you were a bit distracted by your 12-year-old’s nervous breakdown and want to go back after the meeting to review it. There is no efficient or meaningful way to “play back” a Zoom meeting.


I recognize that this issue of recall is not exclusive to Zoom. A huge portion of our knowledge base has migrated into digital video, one of the hardest formats to parse effectively for MOIs. This is also not a new design challenge. When I was at Frog, I led a team that worked for one of the cable companies in the early 2000s, and we created some IP around the concept of “clipmarking”: using your remote control to create social bookmarks in a media stream that you could share with friends, like a highlight reel on SportsCenter. The concept never went into production, but others have since launched platforms around it. When you raise your hand in a Zoom meeting today, or more importantly launch a thought bubble of the sort I described above, that “event” should be marked in the timeline as an MOI. The same timeline could easily represent who was speaking over the course of the meeting and when different media were shared. I would love to see Zoom implement a rich visualization along the bottom of the UI that employed a sophisticated array of sparkline techniques in a single “ribbon” to represent the meeting flow in real time. If you joined late, you could immediately see whether your boss had already presented, for example.
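To make the ribbon idea a little more concrete, here is a rough Python sketch of the data behind it, assuming speaking turns are available as (speaker, start, end) tuples in seconds and MOIs (raised hands, thought bubbles) as timestamps. The function name `meeting_ribbon`, the one-minute buckets, and the plain-text rendering are hypothetical stand-ins for a real graphical sparkline; nothing here reflects an actual Zoom API.

```python
import math


def meeting_ribbon(segments, mois, meeting_length_s, bucket_s=60):
    """Render a text 'ribbon' of who spoke when, plus MOI markers.

    segments: (speaker, start_s, end_s) tuples; mois: timestamps in seconds.
    Each character covers one `bucket_s`-second slice of the meeting.
    """
    n = math.ceil(meeting_length_s / bucket_s)
    speakers = list(dict.fromkeys(s for s, _, _ in segments))
    rows = {name: ["·"] * n for name in speakers}
    for speaker, start_s, end_s in segments:
        first = int(start_s // bucket_s)
        # A turn ending exactly on a bucket boundary does not spill into the next bucket
        last = min(n - 1, math.ceil(end_s / bucket_s) - 1)
        for b in range(first, last + 1):
            rows[speaker][b] = "█"
    marks = [" "] * n
    for t in mois:
        marks[min(n - 1, int(t // bucket_s))] = "^"
    lines = [f"{name:>6} {''.join(cells)}" for name, cells in rows.items()]
    lines.append(f"{'MOI':>6} {''.join(marks)}")
    return "\n".join(lines)


if __name__ == "__main__":
    turns = [("Ana", 0, 600), ("Ben", 600, 720), ("Ana", 720, 1500), ("Cai", 1500, 1800)]
    hands_raised = [650, 1490]
    print(meeting_ribbon(turns, hands_raised, meeting_length_s=1800))
```

Each row is one speaker and each character one minute; a caret marks a minute containing an MOI, so someone joining late could see at a glance who has already held the floor.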

Google Wave, ca. 2009. [Image: Flickr user Bruce Clay]
Back in 2009, Google launched Wave, a much-hyped product that blended a collaborative document space with social chat. It had a nifty playback feature that allowed any participant to quickly catch up on any changes that had been made to the document, and by whom. Each change could be accompanied by a message (not in the form of a thought bubble, but you get the idea). Anyway, Wave was chock-full of interesting UX paradigms—and a total bust, as most users found the interface much too complicated given the number of different panels and features (all built with early web application technology). It is really too bad that this sort of market failure often wipes out all the associated design concepts along with it, despite their potential value. Like food poisoning.

Could Zoom integrate this sort of playback concept?

Right now, adding a playback feature (or a 30-second rewind à la Netflix) would be pretty meaningless. Roll the camera, and all you would see are boxes shifting position and briefly taking center stage. But factor in some of the concepts I have described above, and you might have something interesting and meaningful when played back at 5x speed. You may not be aware, but Zoom also offers a “closed captioning” function. Try it sometime; it is quite good at transcribing the meeting in real time, like a stenographer. This transcript could be auto-generated and searchable, further helping us extract relevant information from past Zoom meetings and jump to the MOIs that matter most to us. Imagine scrolling through a series of screenshots, not much different from Apple’s Time Machine, each one corresponding to a key frame from the discussion with a short caption underneath. Or Zoom could offer a premium service for important meetings in which these MOIs are pre-compiled into a highlight reel with some canned, Ken Burns-style effects, of the sort that our digital video algorithms mastered a long time ago. Voilà, from blah blah boring three-hour meeting to . . . 100-minute documentary film!!!


I know that I am making this all sound way too easy, when none of it is. Most of these ideas are half-baked, having emerged from a COVID-19-inspired nature walk with Dan O’Sullivan, the department chair of New York University’s ITP. I know enough to know that many, if not all, of them would struggle to come to life under the disciplined hand of a strong UX design team. After all, the examples I have cited here have long since disappeared into the UX fossil graveyard, while Zoom is riding strong. But wouldn’t it be great to see something daring and bold come out of Zoomland at this moment to give us hope?

Right about now, the world could use some striking design ideas to inspire us during the long days of remote meetings ahead. While the ideas I have shared may seem a bit indulgent, we are living in an urgent time of change, when the need for dialogue could not be greater. Imagine if Zoom, like Dopplr, sent each of us a weekly or monthly personal report showing how much time we spent dominating the discussion versus listening, both overall and in proportion to our colleagues.
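The arithmetic behind such a report is simple once speaking turns are logged. A minimal Python sketch, assuming non-overlapping (speaker, start, end) segments in seconds, for example taken from a timestamped transcript; `talk_time_report` and its fields are hypothetical and not an existing Zoom feature. "Listening" here simply means meeting time during which that person was not speaking.

```python
from collections import defaultdict


def talk_time_report(segments, meeting_length_s, participants):
    """Speaking vs. listening time per participant (hypothetical report).

    segments: non-overlapping (speaker, start_s, end_s) tuples in seconds.
    """
    spoken = defaultdict(float)
    for speaker, start_s, end_s in segments:
        spoken[speaker] += end_s - start_s
    total_spoken = sum(spoken.values()) or 1.0  # guard against a silent meeting
    even_share = 1 / len(participants)
    report = {}
    for name in participants:
        talk = spoken.get(name, 0.0)
        share = talk / total_spoken
        report[name] = {
            "talk_s": talk,
            "listen_s": meeting_length_s - talk,
            "share_of_talk": share,
            # Above 1.0 means more airtime than an even split across participants
            "vs_even_split": share / even_share,
        }
    return report


if __name__ == "__main__":
    turns = [("Ana", 0, 600), ("Ben", 600, 720), ("Ana", 720, 1500), ("Cai", 1500, 1800)]
    report = talk_time_report(turns, 1800, ["Ana", "Ben", "Cai", "Dee"])
    for name, row in report.items():
        print(f"{name}: {row['share_of_talk']:.0%} of the talking, "
              f"{row['vs_even_split']:.1f}x an even split")
```

In this made-up 30-minute meeting, Ana holds roughly three times her even share of the floor while Dee never speaks, which is exactly the kind of imbalance such a report would surface.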

Others have written about how professional meeting culture (“own the agenda”) impedes our ability to embrace uncertainty and create space for diverse perspectives. The fundamental UX paradigm of Zoom, and of competitors like Skype and BlueJeans, is only holding us back from realizing these aspirations. In my line of work, we frequently host calls with a diverse set of participants, from large foundations to governments to community leaders. It’s about time we had a virtual environment that is conducive to these conversations.


Robert Fabricant has been working at the forefront of user-friendly design for more than 25 years for organizations such as Microsoft, UNICEF, and Frog. He is the cofounder of Dalberg Design, a unique practice focused on social impact with design teams in Dakar, London, Mumbai, Nairobi, and New York, and a finalist for Fast Company’s World-Changing Company of the Year in 2019. User Friendly: How the Hidden Rules of Design Shape the Way We Live, Work, and Play by Cliff Kuang with Robert Fabricant (FSG) was released in November. You can follow him on Twitter.

