Pete Warden’s startup, Jetpac, isn’t your typical company dealing in the hoary world of Instagram content. Jetpac builds travel guides based around visual analysis of Instagram pictures. The service parses Instagram with an image-recognition platform to tag and categorize pictures of hipsters, dog owners, surfers, local residents, and a million other subcategories.
Now Warden’s tackling a new challenge: teaching smartphones (and cameras in general) how to recognize objects.
Back in April, Warden released a software development kit called DeepBeliefSDK on GitHub. Designed to let developers integrate machine vision into smartphone apps, Deep Belief is currently available for Android, iOS, Linux, and Raspberry Pi. While Deep Belief is just one of a number of early entrants into the somewhat creepy world of deep learning for mobile devices, it has one advantage over competitors: speed. Warden says Deep Belief can identify objects in under 300 milliseconds on an iPhone 5S while using less than 20 megabytes of memory. Jetpac recently released an iPhone app, Spotter, which uses Deep Belief to instantly recognize objects you point an iPhone at.
And like any good techie, Warden tested Deep Belief on cute cats fighting evil raccoons (video below). The video starts slowly, but gets fun as you watch. We promise.
Jetpac’s work is also part of a movement called "deep learning," which offers Google, Facebook, and government intelligence agencies a holy grail of search: the ability to search the visual content of images just as we search text and metadata today.
Geoffrey Hinton, who pioneered a revolutionary convolutional neural network approach, now works for Google. Facebook, in turn, hired Yann LeCun, an NYU computer scientist considered to be one of the world’s top experts in deep learning, for a secretive artificial intelligence project. Warden and Jetpac offer a third approach for deep learning, one where the masses, not just huge search companies, get early cracks at cutting-edge image search techniques. It’s also a nifty advertisement for his organization’s capabilities.
"Normally, computer vision starts off by saying you need to find edges, corners, and parts of the image that have a particular texture or color to incorporate that as an algorithm," Warden told Co.Labs. "The deep learning approach instead focuses on giving the neural network, as it learns, millions of examples of the different things you want it to recognize. It figures out how to look for smaller properties in an image, to find things that look like fur and noses which you would see in pictures of cats. What you end up with is a neural network looking for patches of images that resemble eyes or ears or cat noses, to see what arrangement they're in. If they resemble what they saw in example images of cats, it’s likely a cat."
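To make the "looking for patches" idea concrete, here is a minimal Python sketch of the basic operation inside a convolutional network: sliding a small filter across an image and scoring how well each patch matches. It is an illustration only, not Jetpac's code. The function names, the tiny 5x5 image, and the hand-written plus-shaped filter are all assumptions; in a real network the filter values are learned from millions of examples rather than typed in by hand.

```python
def correlate(image, kernel):
    """Slide `kernel` over `image` and return a map of match scores.

    Both arguments are lists of lists of numbers. Only positions where the
    kernel fits entirely inside the image are scored.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    response = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        response.append(row)
    return response


def best_match(response):
    """Return (row, col, score) of the strongest response in the map."""
    score, row, col = max(
        (value, r, c)
        for r, values in enumerate(response)
        for c, value in enumerate(values)
    )
    return row, col, score


# Hypothetical 5x5 "image" containing a plus-shaped blob in the middle.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hand-written filter that rewards a plus shape and penalizes filled corners.
# A trained network would learn values like these from example images.
kernel = [
    [-1, 1, -1],
    [1, 1, 1],
    [-1, 1, -1],
]

response = correlate(image, kernel)
print(best_match(response))
```

The strongest score appears where the patch lines up with the plus shape. A deep network stacks many such learned filters, so that later layers respond to arrangements of the features found by earlier ones, such as the eyes, ears, and noses Warden describes.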
There are challenges, however. While Deep Belief is great at recognizing some objects, its image recognition component is not perfect. The SDK’s developers found that it mistook sidewalks for crossword puzzles, the binding of spiral notebooks for oboes, and large black trash bags for black swans.
For now, Deep Belief is an exciting novelty. However, Warden is fascinated by the SDK’s future uses. One example he gave was training wildlife cameras in the woods to automatically go off if a certain type of animal wanders by. But more importantly, he pointed to a project called the Catalyst Frame Microscope—one of a number of devices which turn the iPhone into a portable microscope. Using the SDK, Catalyst Frame’s software can be trained to automatically identify different kinds of cells.
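The wildlife camera idea comes down to a simple decision rule applied to a classifier's output. The Python sketch below assumes the classifier returns a list of (label, score) pairs for each frame. That format, the function name `should_trigger`, the example labels, and the 0.5 threshold are all hypothetical and are not taken from the Deep Belief SDK.

```python
def should_trigger(predictions, target_labels, threshold=0.5):
    """Return True if any target label is predicted with enough confidence.

    `predictions` is an iterable of (label, score) pairs, with scores
    between 0 and 1. The pair format is an assumption for illustration.
    """
    return any(
        label in target_labels and score >= threshold
        for label, score in predictions
    )


# Hypothetical classifier output for a single camera frame.
frame_predictions = [("red fox", 0.82), ("tabby cat", 0.10), ("trash bag", 0.03)]
targets = {"red fox", "raccoon"}

if should_trigger(frame_predictions, targets):
    print("Target animal detected: capture photo")
```

Given the misidentifications described earlier, the threshold matters: a higher value means fewer false triggers at the cost of missing some animals.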
While deep learning might be in its infancy today, the future ramifications for fields as diverse as health care, advertising, scientific research, and law enforcement could be huge.