There’s a lot going on in the background when you visit most websites. Scripts may be providing analytical data about what you click on. Trackers may link your activity back to your social media accounts. In fact, one type of script allows whoever owns the website you’re perusing to literally watch whatever you’re doing. Called “session replay” scripts, these tools record everything you type, where you move your mouse, and more. This isn’t anonymized data collection; it’s very personal. It’s “as if someone is looking over your shoulder,” write the Princeton computer science researchers Steven Englehardt, Gunes Acar, and Arvind Narayanan.
Englehardt, Acar, and Narayanan, who are part of Princeton’s Center for Information Technology Policy, are studying these session replay scripts. The tools are supposed to help web developers and companies understand how users interact with their sites, so they can boost engagement and fix “broken or confusing pages.” In short, they offer a little window into a user’s experience with a site, which one web design firm describes as creepy but useful. While the companies that provide this service claim to give website owners the option to hide their users’ personal information, the three researchers found that in most cases the scripts capture it anyway.
“Improving user experience is a critical task for publishers,” the trio writes. “However it shouldn’t come at the expense of user privacy.”
The researchers looked at seven popular session replay companies, including Yandex, FullStory, Hotjar, and UserReplay, and found scripts from at least one of them on 482 of the 50,000 largest websites. They found evidence of session replay on the websites of HP, Comcast, Intel, Lenovo, Gap, Costco, Autodesk, Microsoft Windows, T-Mobile, Adobe, Nintendo, Crunchbase, Nest, Walgreens, and more (the researchers have published the full list). Chances are you’ve visited one of these sites at some point, and maybe even entered your credit card information to buy something.
This isn’t the same thing as general analytics tracking, which is aggregated and anonymous. The research shows that highly personal data such as credit card numbers, health information, and addresses is likely sitting on third-party servers, where it could even be tied directly to your identity.
The researchers point out that these practices put much of the burden on website creators, who must painstakingly go through their sites by hand to make sure any identifying information is redacted from recordings. Because a site’s code changes over time, that redaction has to be constantly monitored and updated, which is expensive and error-prone. Even a slight change to the site’s design would require an audit of the entire redaction setup.
Meanwhile, highly sensitive information could be floating around out there in the cyber ether, leaving people vulnerable to identity theft and scams. Part of the problem is that users have no idea which sites are recording their sessions and which aren’t; there’s no visual signal to let them know. Still, there are ways to protect yourself. Your best bet? Browser extensions like Ghostery and NoScript, which can keep session replay scripts from running in your browser.