CSI: Email—Unmasking Anonymous Messengers

A new algorithm identifies the unique writing style hidden in digital communications.


The world of anonymous emails and comments could soon come to a crashing end. Researchers at Concordia University have discovered a way to mathematically uncover the unique (and often subconscious) writing style, or "write-print," of each individual. The most immediate application will help law enforcement identify the author of an anonymous email from a lineup of suspects. For now, the program is roughly 85% accurate and confined to email, but the technology could conceivably unearth the identities of spammers, trolls, or even terrorists.

"In the past few years, we've seen an alarming increase in the number of cybercrimes involving anonymous emails," Benjamin Fung, a professor of Information Systems Engineering, tells ScienceDaily. Through email, pedophiles prey on children, bullies harass classmates, and criminals relay information, all under the secretive cover of digital communication. Traditional detective work may uncover the general location of a group of suspects, either by tracking down a computer's IP address or by looking at contextual clues, but the actual author may still elude authorities.

For normal statistical modeling, which often involves hundreds of thousands of individuals, a handful of suspects is a paltry data set. So the researchers decided to dig deep into the human psyche and uncover the hidden processes involved in piecing together a sentence: vocabulary richness, punctuation patterns, use of spaces, even the frequency of certain letters. The thousands of micro-decisions we make in every sentence are a smorgasbord of data for scientists, who must find the unique set of overlapping elements that red-flags a particular personality.
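To make those categories concrete, here is a minimal Python sketch that turns one message into a handful of numbers. The function name `style_features` and the exact feature set are hypothetical choices for illustration; the article does not list the Concordia team's actual features or how they are weighted.

```python
import string
from collections import Counter


def style_features(text: str) -> dict[str, float]:
    """Measure a few illustrative style markers in one message.

    Hypothetical feature set chosen to mirror the categories named in the
    article; it is not the Concordia team's actual feature list.
    """
    n_chars = max(len(text), 1)

    # Tokens with surrounding punctuation stripped, so "Hello," and "hello" match.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    n_words = max(len(words), 1)

    features = {
        # Vocabulary richness: distinct words / total words (type-token ratio).
        "type_token_ratio": len(set(words)) / n_words,
        # Punctuation pattern: punctuation marks per character.
        "punct_rate": sum(ch in string.punctuation for ch in text) / n_chars,
        # Use of spaces: whitespace characters per character.
        "space_rate": sum(ch.isspace() for ch in text) / n_chars,
    }

    # Letter frequencies: share of each letter a-z among all letters.
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    n_letters = max(len(letters), 1)
    for letter in string.ascii_lowercase:
        features[f"freq_{letter}"] = counts[letter] / n_letters

    return features
```

Each message becomes a vector of 29 numbers. Any single number says little about an author; the harder part, which this sketch does not attempt, is finding the combination of markers that recurs in one person's emails and not in anyone else's.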

The proving ground for the team's sleuth algorithm was 200,000 real-life emails from 150 Enron employees. From a small sample of 10 subjects and 100 emails, the technique correctly identified between 80 and 90% of subjects. That's not accurate enough for a court of law (at the low end, one in five subjects could be falsely accused), but it is enormously beneficial to resource-strapped detectives.
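The article does not say how the Concordia algorithm scores each suspect, so the sketch below swaps in a much simpler technique, nearest-centroid classification on character frequencies, purely to illustrate the test setup: build a profile from each suspect's known emails, attribute an anonymous email to the closest profile, and measure the fraction attributed correctly. The function names and the feature set are hypothetical.

```python
import math
import string
from collections import Counter


def char_profile(text: str) -> list[float]:
    """Relative frequency of each letter a-z plus a few punctuation marks.

    Hypothetical, deliberately simple features; not the researchers' method.
    """
    alphabet = string.ascii_lowercase + ".,!?;:-"
    counts = Counter(ch for ch in text.lower() if ch in alphabet)
    total = max(sum(counts.values()), 1)
    return [counts[ch] / total for ch in alphabet]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attribute(anonymous: str, suspects: dict[str, list[str]]) -> str:
    """Return the suspect whose average profile is closest (Euclidean distance)."""
    target = char_profile(anonymous)
    profiles = {
        name: centroid([char_profile(email) for email in emails])
        for name, emails in suspects.items()
    }
    return min(profiles, key=lambda name: math.dist(target, profiles[name]))


def accuracy(test_set: list[tuple[str, str]], suspects: dict[str, list[str]]) -> float:
    """Fraction of (true_author, email) pairs attributed to the right suspect."""
    hits = sum(attribute(email, suspects) == author for author, email in test_set)
    return hits / len(test_set)
```

Nearest-centroid is used here only because it fits in a few lines. The accuracy arithmetic is the same whatever the classifier: with 100 held-out emails, 80% accuracy means 80 correct attributions and 20 that point at the wrong person, which is why the article rules it out for courtroom use.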

Even more significantly, email is not unlike comments or text messages: all have their own common grammar and letter patterns. Indeed, a more refined algorithm might be able to use a search engine to mine data from the nasty political commenters or forum trolls who spam dozens of websites a day.

We might even see an algorithm that could detect such comments as they are written and permanently ban the user, based on data collected from around the web. Intelligence services could likewise use this algorithm to comb through recorded digital communications or bomb threats and sniff out individual terrorists.

The full usefulness of "write-prints" is still left to our imaginations. But as with all aspects of the Internet, the zone of privacy is becoming smaller every day.

Follow Fast Company on Twitter. Also, follow Greg Ferenstein on Twitter or email him.

[Image: Flickr User Williac]
