In the opening plenary at the International Conference on Machine Learning, Fei-Fei Li, an associate professor of computer science at Stanford University, surveyed advances in image recognition technology since the 1960s. The key takeaway? It's still not as good as the human brain. Li herself has contributed to these improvements through her work on an image recognition model called dense captioning: software designed to assign descriptions to individual elements of photos. Take a street scene, for example. The dense captioning model can call out a parked bus, a blue sign, a green traffic light, or perhaps a pink jacket.
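Conceptually, a dense captioning model's output can be pictured as a list of image regions, each paired with a short description and a confidence score. The Python sketch below is a hypothetical illustration of that idea only; the data structure, names, boxes, and scores are invented for this example and are not taken from Li's actual model.

```python
from dataclasses import dataclass

@dataclass
class RegionCaption:
    """One captioned region of an image (all values illustrative)."""
    box: tuple      # (x, y, width, height) in pixels
    caption: str    # short natural-language description of the region
    score: float    # model confidence between 0.0 and 1.0

def describe_scene(regions, min_score=0.5):
    """Keep only confidently captioned regions, highest score first."""
    kept = [r for r in regions if r.score >= min_score]
    return sorted(kept, key=lambda r: r.score, reverse=True)

# A made-up street scene, echoing the examples in the text.
street_scene = [
    RegionCaption((40, 120, 300, 180), "a parked bus", 0.92),
    RegionCaption((360, 60, 50, 90), "a blue sign", 0.81),
    RegionCaption((420, 30, 20, 60), "a green traffic light", 0.77),
    RegionCaption((200, 150, 60, 110), "a pink jacket", 0.34),
]

for region in describe_scene(street_scene):
    print(region.caption)
```

Run as written, this keeps the three high-confidence regions and drops the low-confidence jacket, which hints at the gap the article describes: the model lists what it sees, but says nothing about what the scene means.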
The technology is a major step toward building software that can understand photos contextually, the way humans do. And yet, even this advanced model shows that machines still lag far behind human comprehension. To really think like humans, machines need to grasp implied meaning, such as humor, sadness, and joy: qualities that are hard to define in a programming language. The next big hurdle for machine learning, says Li, will not just be determining what is in images, but what they mean.