In one of Silicon Valley’s best jokes this season, Jin Yang promises to deliver an app that can identify any food a smartphone camera sees. Instead, his demo falls a bit short: it simply sorts foods into “hot dog” and “not hot dog.”
It’s a case where truth is as strange as fiction. Even companies like Jawbone have struggled to find easy ways to categorize and track the foods people eat. If a smartphone could perfectly identify any food on your plate, it could track calories, flag potential allergens, and even tell you which type of sashimi you’re eating.
Now, researchers at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) joke that they have taken a promising step toward building something better than Not Hotdog. They’ve created a neural net called Pic2Recipe that can look at a photo of your plate and identify what you’re eating, including the ingredients inside the dish, just over 20% of the time. About half the time, the correct dish appears somewhere in its top five guesses.
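Those two figures correspond to what machine learning researchers usually call top-1 and top-5 accuracy: how often the model’s single best guess is right, versus how often the right answer shows up anywhere in its first five guesses. The short Python sketch below illustrates the general metric. It is not CSAIL’s evaluation code, and the function name and the dishes in it are made-up examples.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of examples whose true label appears among the first k guesses."""
    if not true_labels:
        return 0.0  # avoid dividing by zero on an empty evaluation set
    hits = sum(
        1
        for guesses, truth in zip(ranked_guesses, true_labels)
        if truth in guesses[:k]  # only the k highest-ranked guesses count
    )
    return hits / len(true_labels)


# Hypothetical toy data, not from Pic2Recipe: each inner list is one photo's
# guesses, ordered from most to least confident.
ranked = [
    ["pad thai", "lo mein", "ramen", "pho", "udon"],
    ["tomato soup", "chili", "gazpacho", "borscht", "minestrone"],
    ["california roll", "spicy tuna roll", "onigiri", "poke bowl", "temaki"],
    ["cheeseburger", "hot dog", "club sandwich", "BLT", "sloppy joe"],
]
truth = ["pad thai", "gazpacho", "temaki", "tuna melt"]

print(top_k_accuracy(ranked, truth, k=1))  # 0.25
print(top_k_accuracy(ranked, truth, k=5))  # 0.75
```

In this toy example, only one of four first guesses is right, but three of the four correct dishes appear somewhere in the top five. That is the same kind of gap as the one between Pic2Recipe’s 20% and 50% figures.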
“We still have a long way to go before we can truly tout the accuracy of this system at a level with human performance, but to the best of my knowledge, this is the most accurate recipe-prediction system to date,” says MIT CSAIL graduate student Nick Hynes. “Even when it fails, the matches usually are somewhat reasonable.”
Perfect? Far from it, but CSAIL’s accomplishment is impressive all the same. The team built the system from what they believe is the largest food data set ever assembled. To train their model to identify dishes and their ingredients, they scraped recipes from popular websites like Food.com and AllRecipes.com. From that material they built a data set of more than a million photos and recipes, which they estimate is roughly 10 times larger than any previous recipe data set. When training a neural net, data set size matters. The data set gave MIT’s system a brute-force advantage, the equivalent of preparing for a test by reading more books than your competition.
The system still struggles with certain types of food. “[It] still has a lot of difficulty with blended and mixed foods like smoothies, soups, and sushi rolls,” says Hynes. It’s not hard to see why. There’s virtually no way to tell what’s inside a soup or smoothie without watching the ingredients go into the blender. And sushi rolls are mostly a wrapping of seaweed and rice, in which virtually anything could be hiding.
But Hynes is still bullish on the future of Pic2Recipe, which he says could be improved by adding more images to the data set, along with additional data on metrics like volume and healthiness, all of which could add up to more accurate results. “I think this has the most potential as a ‘calorie counter’ that people could use to analyze their meals and determine nutritional value…[and it] would be particularly useful in restaurants and cafes when you’re not sure exactly what ingredients are in what you’re eating,” says Hynes. “I could even picture being able to extend this beyond recipe recognition to recipe manipulation–imagine being able to change an existing recipe to be healthier or to conform to certain dietary restrictions (allergies, diabetes, etc.) based on other recipes in the database that you want to incorporate into a specific dish.”
For now, people still do a better job than computers at identifying the food on a plate. And when in doubt, never be too embarrassed to ask your waiter.