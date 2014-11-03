Smart as they are, computers are still as blind as bats. It’s why search engines index the web using text and why you still have to fill out those annoying captchas. But with advances in machine learning and image recognition, computer vision is slowly getting to the point where it will be useful to us.

Flickr flexed its computer vision muscles recently with the launch of Park or Bird? The one-page web app was built in response to an XKCD comic poking fun at the limitations of computers when it comes to understanding the content of images. It lets people upload a photo and automatically determines whether the image was taken in a national park (using location metadata) or shows a bird (using Flickr’s computer vision).

The hack itself wasn’t anything more than a fun response to an Internet comic, but it offered a taste of some impressive technology that Flickr is working on internally. And it’s not just R&D: Computer vision has found its way into Flickr’s product roadmap, and all of us will soon be exposed to it more, whether we use Flickr or not.

Flickr’s image recognition technology uses a type of neural network called a deep convolutional neural network. Google is also investing in this deep learning technique, and has acquired at least two companies that specialize in it (Jetpac and DNNResearch) in order to improve the image recognition capabilities of its photo app.

“These methods have evolved rapidly over the past few years, thanks to some key algorithmic improvements and the availability of more powerful computing infrastructures,” says Simon Osindero, an AI architect in the Flickr Vision and Machine Learning group at the Yahoo-owned company. “They currently work well particularly for object, scene, and attribute recognition in photos.”

Having parsed millions of images, Flickr’s deep learning algorithm has learned to recognize 1,000 different objects in images. It does this by passing each image through a series of layers, each of which transforms the output of the layer before it and performs progressively more complex computations on it.
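
To make the idea of stacked layers concrete, here is a minimal sketch in plain Python. It is not Flickr’s network: the 10x10 toy image, the two hand-written kernels, and the function names (`conv2d`, `relu`, `max_pool2`, `forward`) are all assumptions chosen for illustration. A real deep convolutional network learns its kernels from millions of images and has many more layers, but the flow is the same: each layer works on the previous layer’s output, not on the original pixels.

```python
# Illustrative sketch only: a toy two-layer stack, not Flickr's actual model.
# The image, kernels, and layer count are made up for the example.

def conv2d(image, kernel):
    """Slide a small kernel over a 2D grid and sum the weighted pixels."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2(feature_map):
    """Shrink the map by keeping the strongest value in each 2x2 block."""
    h = len(feature_map) // 2
    w = len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
            for c in range(w)
        ]
        for r in range(h)
    ]

def forward(image, kernels):
    """Pass the image through one conv -> ReLU -> pool layer per kernel."""
    x = image
    for kernel in kernels:
        x = max_pool2(relu(conv2d(x, kernel)))
    return x

# Toy 10x10 image: dark (0.0) on the left half, bright (1.0) on the right.
image = [[0.0] * 5 + [1.0] * 5 for _ in range(10)]

# Layer 1 responds to dark-to-bright vertical edges.
vertical_edge = [[-1.0, 0.0, 1.0] for _ in range(3)]
# Layer 2 sums up the edge responses in its neighborhood.
ones = [[1.0] * 3 for _ in range(3)]

features = forward(image, [vertical_edge, ones])
print(features)  # [[18.0]]
```

The first layer turns raw pixels into a map of where edges are, and the second layer summarizes that map into a single number. Stacking many such layers is what lets later layers respond to shapes and, eventually, whole objects.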

As the team explained in a blog post: