Now that we spend so much of our days on Zoom, I think we can all be adult enough to admit: We’ve all side-chatted, saying one thing to the camera, and another on the side. Maybe it was a joke over Gchat at a coworker’s expense. Maybe it was just multitasking some emails. Maybe it was entering a password into another site.

It’s a relatively innocuous behavior, but it could come back to bite us. Researchers from the University of Texas at San Antonio and the University of Oklahoma have demonstrated something terrifying: They can read what people are typing during video calls on Zoom, Skype, and Google Hangouts with up to 93% accuracy. What are they analyzing to do so? Not your hands, but your shoulders.

“From a high-level perspective, this is a concern, which obviously has been overlooked for a while,” says Murtuza Jadliwala, an assistant professor of computer science at the University of Texas at San Antonio, who led the research, examining what could happen if your video meeting were hacked. “And actually, to be really frank, we didn’t start this work for COVID-19. This took a year. . . . But we started realizing in COVID-19, when everything [is in video chat], the importance of such an attack is amplified.”

As Jadliwala explains, the core problem is that our face-to-face video streams are presented in high fidelity, and their pixels convey more information than we realize. Without using any special machine learning or artificial intelligence techniques, Jadliwala’s team figured out how to read the subtle pixel shifts around someone’s shoulders to make out their basic cardinal movements: north, south, east, and west.

Applied to a keyboard, these four directions actually mean a lot. If you are typing “cat,” you start with the C, move west to the A, then back east to the T. Once researchers figured out how to read these directions through shoulder movements, they were able to create software that could cross-reference them with what they call “word profiles” built with an English dictionary, which turned the maze of directions into meaningful words.
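
To make the idea concrete, here is a minimal Python sketch of how a "word profile" lookup of this kind might work. It is an illustration only, not the researchers' software: the key coordinates, the row offsets, the rule for picking a direction, and the names `KEY_POS`, `direction`, `word_profile`, and `build_profiles` are all assumptions made for this example.

```python
# Hypothetical illustration of "word profiles"; not the researchers' code.
from collections import defaultdict

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
# Assumed horizontal stagger of each keyboard row, in key widths.
ROW_OFFSETS = [0.0, 0.25, 0.75]

# Map each letter to an (x, y) position; row 0 is the top row.
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], row)
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}


def direction(a, b):
    """Return the dominant cardinal direction of the move from key a to key b."""
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return "."  # same key pressed twice, no movement
    if abs(dx) >= abs(dy):
        return "E" if dx > 0 else "W"
    # Rows are numbered top to bottom, so a smaller row index means "north".
    return "N" if dy < 0 else "S"


def word_profile(word):
    """Turn a word into its sequence of directions, e.g. 'cat' -> 'WE'."""
    word = word.lower()
    return "".join(direction(a, b) for a, b in zip(word, word[1:]))


def build_profiles(words):
    """Group dictionary words by their direction profile."""
    profiles = defaultdict(list)
    for w in words:
        # Skip anything containing characters outside the letter keys.
        if all(ch in KEY_POS for ch in w.lower()):
            profiles[word_profile(w)].append(w)
    return profiles


if __name__ == "__main__":
    profiles = build_profiles(["cat", "dog", "sat", "the"])
    print(word_profile("cat"))  # WE
    print(profiles["WE"])       # ['cat', 'sat']
```

Even in this toy dictionary, "cat" and "sat" share the same profile, which hints at why a direction sequence alone is rarely enough to pin down a single word.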

The way a hack of this type would work is pretty simple. Anyone with access to your video feed could record it—whether that’s a nefarious stranger who broke into your feed, or someone you know who is part of your meeting. Then they would send that recorded video feed through software, which would analyze when you were typing, and what that typing contains.

In a lab setting, with a certain chair, keyboard, and webcam, and with a limited pool of test words, the software’s average accuracy was 75%. When the team tested subjects working from home in uncontrolled setups (they were asked to visit any websites, write emails, and enter their passwords), accuracy dropped significantly. The team was able to reverse-engineer 66% of the websites visited, but only 21% of random English words, and about 18% of the passwords typed. The reason for this diminished accuracy was that the model makes inferences based on the context of sentences, so it has a tougher time with random words. Passwords, meanwhile, often aren’t in the dictionary at all, so it’s harder for the software to figure them out simply by cross-referencing the English language. The drop in accuracy outside the lab was less about lighting or camera quality than about intricacies of the software itself.