A cybersecurity startup is applying the same “deep learning” techniques that are used in modern image and voice recognition to detect malware.
Deep Instinct launched late last year with a system that it says can go beyond typical antivirus programs by not only detecting known malware but also flagging dangerous software it’s never encountered before.
The company doesn’t need to have security experts create digital rules specifying what kind of characteristics should trigger alerts, says Maya Schirmann, Deep Instinct’s chief marketing officer. Instead, the system essentially trains itself by studying enormous numbers of applications, documents, images, and other common types of files, labeled simply for whether they contain malware or not.
“The training phase happens on our own premises, at Deep Instinct,” Schirmann says. “We have an artificial brain, and we train it on hundreds of millions of files.”
The system develops a complex statistical model for what constitutes malware, which Deep Instinct claims has at least a 99.9% success rate at spotting infectious files. The company can continue to refine that model by feeding the program additional malicious code as security researchers discover it, then distributing the resulting updates to existing customers. Engineers at the company, which has offices in San Francisco and Tel Aviv, can even tweak samples of existing malware to create new attack code, which can serve as additional training data for the program, she says.
“We have a special team dedicated to creating these malware [programs and] mutating them,” she says.
The company’s software can then run on a networked server, scanning email attachments, file uploads, or other incoming data for potential attacks. Or it can run in a standalone mode on a desktop, laptop, or smartphone. Since the learned model is stored with each copy of the software, it can even detect attacks when the device is not connected to the internet, such as when an infected memory stick is plugged into a laptop during a flight, she says.
“It doesn’t need to be connected to the enterprise’s network to detect and protect,” she says. “It doesn’t need to be connected to the internet at all.”
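To make the offline idea concrete, here is a minimal sketch in Python of a scanner that reads a model from local disk and scores files without any network call. It is not Deep Instinct's software: the function names, the JSON model format, and the single-layer scoring are all assumptions made for illustration, and the weights are meaningless placeholders rather than a trained model.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, bias):
    """Write a toy model to local disk (hypothetical JSON format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def load_model(path):
    """Read the model back from local disk; no network access involved."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def score_bytes(data, model):
    """Return a value in (0, 1) from the first raw bytes of a file."""
    weights = model["weights"]
    # Zero-pad short files so every input has the same length.
    window = data[:len(weights)].ljust(len(weights), b"\x00")
    z = model["bias"] + sum(w * (b / 255.0) for w, b in zip(weights, window))
    return 1.0 / (1.0 + math.exp(-z))


def scan_directory(root, model, threshold=0.5):
    """Score every file under root and return the paths at or above threshold."""
    flagged = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if score_bytes(f.read(), model) >= threshold:
                    flagged.append(path)
    return sorted(flagged)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder weights: this toy model simply reacts to high byte values.
    model_path = os.path.join(tmp, "model.json")
    save_model(model_path, [4.0] * 8, -2.0)

    # A stand-in for a memory stick holding two files.
    stick = os.path.join(tmp, "stick")
    os.mkdir(stick)
    with open(os.path.join(stick, "a.bin"), "wb") as f:
        f.write(b"\xff" * 8)
    with open(os.path.join(stick, "b.bin"), "wb") as f:
        f.write(b"\x00" * 8)

    model = load_model(model_path)
    flagged = [os.path.basename(p) for p in scan_directory(stick, model)]
```

Everything the scan needs sits on the local disk, which is the property Schirmann describes. A real product would ship a far larger trained model in place of the eight placeholder weights used here.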
Deep Instinct isn’t the only company to apply artificial intelligence and machine learning techniques to security, but Schirmann says it’s the first to launch a security product based around deep learning—a technique that’s helped internet giants like Google and Facebook make significant strides in traditionally difficult areas of computing, such as facial recognition and language processing.
“Deep learning is a domain that today is applied by companies like Google, Facebook, Baidu, Microsoft, [and] IBM, with amazing success in domains that have nothing to do with cybersecurity,” Schirmann says.
Earlier forms of machine learning often required humans to specify which features of an image, audio sample, or other input they thought would be important for classifying the data. Deep learning, by contrast, generally starts with the actual byte-by-byte structure of the input file, and the system automatically learns to run that data through multiple layers of mathematical processing to tease out higher- and higher-level structures.
That method, it turns out, gives better results for many different recognition and data classification problems—and can save on human labor, since there’s no need for living, breathing experts to code up ways to extract salient features from the underlying information.
“In deep learning, that phase absolutely does not exist,” says Schirmann. “You feed in raw data.”
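As a rough illustration of what feeding in raw data can look like, the Python sketch below passes a file's bytes through three stacked layers and produces a single score. It is an assumption-laden toy, not Deep Instinct's architecture: the layer sizes, the 64-byte input window, and every function name are invented for the example, and the weights are random and untrained, so the score carries no meaning. Training, which the sketch omits, is the step that would adjust those weights using labeled files.

```python
import math
import random


def bytes_to_input(data, length=64):
    """Scale the first `length` raw bytes to [0, 1], zero-padding short inputs."""
    window = data[:length].ljust(length, b"\x00")
    return [b / 255.0 for b in window]


def make_layer(n_in, n_out, rng):
    """Create random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, layer, activation):
    """Apply one layer: weighted sum of the inputs plus bias, then activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def score(data, layers):
    """Run raw bytes through every layer; the last layer yields one value in (0, 1)."""
    x = bytes_to_input(data)
    for layer in layers[:-1]:
        x = dense(x, layer, relu)
    return dense(x, layers[-1], sigmoid)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layers = [make_layer(64, 16, rng), make_layer(16, 8, rng), make_layer(8, 1, rng)]
p = score(b"MZ\x90\x00 example bytes", layers)
```

Note that no step in the sketch extracts hand-picked features such as file size or embedded strings. The only preprocessing is scaling each byte to a number between 0 and 1, and each layer works on the output of the one before it.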
And while the current version of the company’s product only targets malicious files and not other types of attacks, Schirmann says future versions of the tool will apply the same learning techniques to other attack types as well.