“I was up until 1:30 working on it!”
“That’s not that late,” someone quips, and the room chuckles.
I’m at a weekly meeting with Pinterest’s computer vision team in San Francisco, where a group of young researchers sits around an oversized boardroom table. It’s the spitting image of a college lab–most of the researchers are under the age of 30 and have a penchant for looking down at the table while speaking.
They’re reviewing the past week’s work. And yes, someone was up late. Because Pinterest is a startup. It’s out-staffed by larger tech companies. And occasionally, the boss wants things done. Now.
That boss is engineering lead Li Fan, one of the foremost experts on visual computation, who hails from Baidu and Google. And the thing she wants done now happens to be the computing paradigm of the future: In a world where everyone has a camera in her pocket, many experts believe that visual search–taking photos instead of typing text queries–will become the de facto way we look up information.
All the major tech companies want to dominate this paradigm. That’s why Microsoft has 8,000 generalized AI researchers, who were the first to make computers identify objects better than humans can. And Facebook has 300 of its own, who deploy 1.2 million visual AI experiments on Facebook at any given moment. Amazon won’t get into much detail, but it has 5,000 employees working on Alexa and its new seeing, outfit-judging Echo Look. Then there’s Google, which has hundreds of researchers working on visual machine perception for platforms like Google Lens, Google Photos, and of course, Google Search.
Up against the biggest companies in tech, Pinterest is bringing 12 visual AI researchers to the fight. Twelve. And in this arena, Pinterest doesn’t just think it can compete. It believes it can win.
“Five to 10 years from now, using the camera as an eye to search and discover things will probably be way more common than typing in a text search. I really believe that,” says Pinterest cofounder Evan Sharp. “We have an advantage right now, and we’re trying to race as fast as we can to build something useful on top of what we have.” Even so, Sharp admits that Pinterest’s own technology is still very much a work in progress.
What it lacks in scale, Pinterest makes up for with a war chest all its own. The company is sitting on what might be the cleanest, biggest data set in the world to train computers to see images–the equivalent of a small nation hiding a nuclear arsenal. That’s billions of photos of furniture, food, and clothing that have been hand-labeled by Pinterest’s own users for years.
Pinterest’s other advantage is that it’s not really a search service. It’s a startup that’s free from the burden carried by Google and Amazon–both of which must provide perfect results to search queries as a matter of business. At Pinterest, users come to casually window shop a better life, starting with remarkably unspecific queries like “dinner ideas” or “fashion” that they might search again and again, week after week. As a result of both this behavior and the site’s gridded layout of photo pins, Pinterest can build visual search into its platform to offer not one perfect answer, but an imperfect collection of inspiration.
No one understands that better than Li Fan, Pinterest’s head of engineering, whose previous job was leading Google Image Search.
When Li Fan was 6 or 7, she asked her parents if she could take an art class. So twice a week, for two hours after school, Fan received classical training in sketching, watercolors, acrylic, and oil painting. Even at such a young age, Fan appreciated the singular focus of her work, with the hours melting away as she constructed portraits and landscapes on canvas. It helped that she was talented. Several times, her pieces were accepted to the children’s art exhibition of Shanghai. Once, a Japanese collector purchased one of her paintings for what then, at least, felt like a large sum of money.
But by the time Fan was 12, her parents had forced her to quit painting to focus on academics. Computer science became her new form of self-expression, and a few decades later, after stints at Cisco and Google, she would lead the 1,000 engineers working just on search at Baidu. She couldn’t shake her love for the visual arts, however. “After [Baidu], I figured out what I wanted to do personally,” says Fan. “It’s one thing to advance your career. Your true passion is another thing. You have to find the balance.”
So Fan left Baidu for a smaller role back at Google that would please her 6-year-old self, leading Google Images. She took up painting again, too. Then Pinterest came knocking in 2016. The company’s mission was attractive. Pinterest, to the believers, is a place people come to quite literally picture a better life for themselves, be that in the form of a cozier living room, adventurous trip, healthy dinner, clever Halloween costume, or creative tattoo. An equally attractive offering was Pinterest’s data set–an endless palette of pigments for Fan to paint the future of visual search.
Back at the meeting, Fan’s researchers take turns around the table, showing off the latest tricks they’ve taught Pinterest’s ever-evolving AI.
Young and lean though the group may be, they know how to ship products, rather than merely research theories. Fan’s 12 visual engineers have so far trained Lens to spot 3,000 very specific categories of items like “waffle knit,” “acai bowl,” “Brooklyn Bridge,” “latex,” and “watercolor tattoo”–along with “kilim pillow” and “club chair.” That’s four times as many categories as when Lens launched earlier this year. Such precision starts with seven years of user-made tags: the images those tags categorize are scanned by AI, which learns to find their visual commonalities.
One team leader, Andrew Zhai, created Pinterest’s real-time visual search system, which now scans through billions of photos in milliseconds. The other leader, Dmitry Kislyuk, had an ingenious notion in 2014 to allow Pinterest users to search within an image by drawing a simple bounding box around something like a lamp in a living room. It didn’t work all that well at first. In fact, Kislyuk confesses to me, with a sly smile, “It was a pretty successful demo, but what we didn’t tell most of the employees was, we spent hours and hours the night before just trying to come up with good examples.” Crucially, however, these searches helped Pinterest build an even more granular understanding of its own data set–to learn not just that a bike was sitting somewhere in a photo, but exactly where it sat amid all the visual noise. Even Pinterest Lens–the app’s current pièce de résistance of visual search–had a front end that was built in just a few days by software engineer Kelei Xu.
Today, software engineer Eric Kim presents his latest updates on Lens Your Look. The chummy, chatty flow of Pinterest’s UX goes hand in hand with Pinterest’s visual search through this new feature, which lets people begin a text search in the app, but then quietly nudge it toward better results with photos from their own camera. It’s a way for people to say, “No, no, I just want something more like this thing.” Ironically, Pinterest can’t understand a word you speak the way Cortana or Alexa can, but through this conversational use of photos, it offers a way to articulate the inarticulable.
Lens Your Look, which launched on Pinterest’s iOS and Android apps just last week, allows you to search for fashion with normal text queries (like “black dresses”), but then use Lens to photograph something like your own set of heels. By combining the text and visual searches, Lens Your Look uses visual AI to find people wearing black dresses with heels of a similar style–maybe even the exact same brand and cut–to your own.
“It’s fascinating because the information there is so rich and subjective. I know . . . there’s the phrase, ‘a picture is worth a thousand words,’” Fan says. “It’s actually two-sided, that a picture covers so much information that you can’t describe it comprehensively in words. And, that picture can describe yourself so much more clearly than a few text words.”
Jean jackets are the topic for today, as Kim has just trained the system to understand floral, plaid, and denim fabrics with greater reliability. The results are extraordinary, as Kim runs through a PowerPoint full of models donning fall outfits with blue denim jackets. Anyone can see with their own eyes that the suggestions are good, but the quality is reinforced by Pinterest’s front end. Any time a user taps on a particular My Look pin, Pinterest learns more–that this one result was essentially right, or the most right, in the larger pile. Then, Pinterest will learn to rank that result higher in the feed next time.
Over a month after watching it in the research lab, I was able to take Lens Your Look for a spin myself. In this early, public iteration, the magic I’d witnessed behind closed doors wasn’t quite there. I tried searching “men’s fashion” while photographing a light blue graphic tee I can never seem to dress up beyond grown-third-grade-boy status. All I got were more graphic tees that I could browse or buy, though many of them were also blue! For another test, I tried “men’s fashion” again, but added a photo of an Italian bag I picked up in Florence’s leather district. I always feel like I’m trying too hard when I wear it, but I figured Pinterest’s options would offer some ideas. Again, it was a strikeout. Mostly, I saw Trunk Club-style photos with folded shirts, leather shoes, and belts–no bags at all! I attempted the same photo search in “women’s fashion,” thinking Pinterest might have more purse-style material to pull from. The results were a bit better at surfacing outfits on models carrying leather bags, but quite honestly, nothing felt all that close to my bag.
Finally, I tried to search “recipes,” adding a photo of an avocado, to see what might happen. But in fact, Pinterest doesn’t offer the option to add image searches to all categories. The little camera icon simply isn’t there beside the search bar. It’s a clever bit of UI, ensuring that Pinterest’s visual search AI is only deployed in the spots where the company has expertise. And while Pinterest wants to one day rule visual search, no one at the company claims that either the technology or the implementation is there yet.
“We’re super early. It’s like we’re at where text search was in the mid ’90s,” says Sharp. “There’s this technology, it’s interesting, but no one’s really quite dug down deep enough to know what the product is or what problems it’s going to solve.” Indeed, the history of technology is always about making naive bets on precise use cases. Who could have known that a touchscreen iPhone would change the world overnight, while Google’s Glass computer in your eye would become a punchline just as fast? Visual search is just the latest example of this trend. And Sharp is right. It will absolutely impact our lives. But as always, the key will be figuring out how first.