Worldwide spending on cybersecurity likely topped $75 billion last year, researchers at Gartner estimated, with companies more wary than ever of the risks posed by data breaches and other digital attacks.
And along with rising costs, the sheer volume of digital security data has increased dramatically: IBM estimated in a recent study that the average organization sees more than 200,000 pieces of security event data per day and that more than 10,000 security-related research papers are published every year.
“Security researchers are getting hit with a firehose,” says Caleb Barlow, vice president of IBM Security. “Once they get done with today, they’ve got another deluge of data coming tomorrow.”
To help companies handle that flood of data, IBM says it’s training its Watson artificial intelligence platform—previously known for using its natural language processing power to beat humans on Jeopardy—to parse cybersecurity information, from automated network-level threat reports to blog posts from security professionals. According to Barlow, the company hopes to train the system to detect and understand threats to computer systems and to answer questions from human security professionals about incidents they detect on their networks.
“It’s just gonna think just like a forensics investigator,” says Barlow. “Before [security professionals] even dive into an incident, Watson’s done some of the initial [research] and can present them with a thesis on what’s going on and evidence to back that up.”
Watson, which, along with winning Jeopardy, has recently been used to infer people’s personality traits from their social media posts, will be able to help with tasks like detecting and identifying malware and figuring out how far it has penetrated into a network, Barlow says. It won’t replace humans—for one, flesh-and-blood experts will still be needed to decide strategically how to respond to breaches and threats—but the AI platform will ideally help companies faced with a growing cybersecurity skills shortage handle the enormous volume of information generated every day.
“It’s kind of like radar,” Barlow says. “It’s not going to steer the boat or the plane, but it’s going to certainly tell you what objects you want to avoid.”
First, though, Watson has to be trained to understand the language of cybersecurity. That means having human experts annotate blog posts, vulnerability reports, and scientific papers, essentially diagramming sentences to illustrate to the computer the relationships between security terminology.
“When you bring a new domain of knowledge to Watson, you have to start with, what are the words?” says IBM researcher Charles Palmer in a YouTube video explaining the concept. “What’s important? What’s a virus? A virus is bad. What’s malware? Well, it’s like a virus.”
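That annotation work can be pictured as humans marking which spans of a sentence name security entities and how those entities relate. Here is a minimal sketch in Python, with hypothetical labels, field names, and an invented example sentence—not IBM’s actual training format:

```python
# Hypothetical entity-and-relation annotation of one sentence, in the spirit
# of the human labeling described above. Labels and schema are illustrative.

sentence = "The Locky ransomware spreads through phishing emails."

# Character-offset entity annotations, similar to common NER formats.
annotations = [
    {"text": "Locky", "start": 4, "end": 9, "label": "MALWARE"},
    {"text": "ransomware", "start": 10, "end": 20, "label": "ATTACK_TYPE"},
    {"text": "phishing emails", "start": 37, "end": 52, "label": "DELIVERY_VECTOR"},
]

# Relations tie annotated entities together ("a virus is bad, malware is
# like a virus"): the system learns the links, not just the words.
relations = [
    ("Locky", "instance_of", "ransomware"),
    ("Locky", "delivered_via", "phishing emails"),
]

# Sanity check: every annotated span must match the sentence text exactly.
for a in annotations:
    assert sentence[a["start"]:a["end"]] == a["text"]
```

The point of the offsets-plus-labels shape is that a machine-learning system can train on thousands of such examples and start proposing annotations of its own, which humans then grade.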
Tasked with some of that document labeling will be students from eight universities, including the Massachusetts Institute of Technology, New York University, and the University of Maryland, Baltimore County. Once the program gears up, IBM aims to process 15,000 documents per month, with annotation done by students, IBM experts, and others in the field, with an eye toward starting customer trials by the end of the year.
As Watson learns, it has begun to annotate some documents on its own, which are then graded by its human teachers, Barlow says. They can point out mistakes in the AI’s understanding, like one incident when the system thought “ransomware” was the name of a place, not a type of digital attack.
“We had to reannotate a bunch of documents to get it to understand that it’s a form of attack—it’s not a city,” he says.
The collaborations will hopefully not only give students an excuse to closely read security papers but also grant them some experience with how Watson-style cognitive systems work, says Stuart Madnick, a professor of engineering systems at MIT. And, if the project works, it will offer those in the security field a leg up against online criminals, he says.
“There’s a shortage of anywhere from 100,000 to 1 million cybersecurity professionals, depending on which studies you look at,” he says. “Think of Watson as another member of the cybersecurity workforce to join the ranks, but a worker who hopefully can equal hundreds or thousands of other workers.”
The project isn’t the first attempt to aggregate data for better security: Facebook launched a platform last year called ThreatExchange, which more than 100 companies used to contribute structured data about security threats and the links between them in a format similar to the one Facebook uses to track links between users, groups, and events. IBM also launched its own shared threat data system last year called IBM X-Force Exchange, and various industry groups and government agencies—like the National Institute of Standards and Technology’s National Vulnerability Database—maintain their own information sharing and analysis tools to combine intelligence about digital threats.
But merging data from different systems and tools remains difficult when each has its own format and structure, says Anupam Joshi, director of UMBC’s Center for Cybersecurity, who is working with IBM both on the Watson security project and on setting up the university’s Accelerated Cognitive Cybersecurity Laboratory, slated to open in the fall.
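The merging problem Joshi describes can be made concrete with a small sketch: two feeds report the same threat indicator under different schemas, and each must be mapped into a shared record before they can be compared. The feed formats, field names, and type vocabulary below are all invented for illustration:

```python
# Hypothetical records for the same indicator from two different threat feeds.
feed_a = {"ioc": "198.51.100.7", "kind": "ip", "threat": "botnet C2"}
feed_b = {"indicator_value": "198.51.100.7",
          "indicator_type": "ipv4-addr",
          "description": "botnet C2"}

def normalize_a(rec):
    """Map feed A's fields onto an assumed common schema."""
    return {"value": rec["ioc"], "type": rec["kind"], "note": rec["threat"]}

def normalize_b(rec):
    """Map feed B's fields and type vocabulary onto the same schema."""
    type_map = {"ipv4-addr": "ip", "domain-name": "domain"}
    return {"value": rec["indicator_value"],
            "type": type_map.get(rec["indicator_type"], "unknown"),
            "note": rec["description"]}

# Once normalized, records from both feeds are directly comparable.
assert normalize_a(feed_a) == normalize_b(feed_b)
```

Hand-written mappings like these are exactly what shared semantic models of security data aim to make unnecessary.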
Researchers at the university have already been working on standardized semantic models of security data. Joshi says he hopes that Watson’s efforts to pull knowledge from existing datasets will make it possible for the security community to gain an increasingly intricate understanding of threats and to flag potential cyberattacks in real time.
“The next step that we want to push on is moving from there to more complex reasoning over this underlying data,” he says.
And although Watson may have gained fame as an adversary to humans on Jeopardy, we’re fortunate to have the AI on our side this time.