Standing with my eyes closed in the bathroom, I aim my phone in the air. It vibrates, more and more, until it’s buzzing with excitement. “Toilet,” it announces in a female robot voice. “Shower,” it adds a few moments later. My phone is seeing for me, and it’s scary-remarkable. Like the first time Dragon Dictate understood my speech, or Facebook picked my head out of a crowd.
I’m using a free Android app called BlindTool. Created by Joseph Paul Cohen, a Ph.D. candidate at the University of Massachusetts Boston, BlindTool lets you simply aim your phone at an object, and the app will try to identify it within a second. It’s like Shazam for the world around you.
“A few years ago I worked with a blind programmer, and it really drew my attention to the needs of visually impaired people,” Cohen says. “I had the idea then to have some sort of technology to see for them, but the technology didn’t exist at that point to be able to do that.”
Today, our computers have become absurdly good at identifying objects. Trained on more than a million images of mundane items like photocopiers and trash cans, our best neural nets, built by companies such as Google and Microsoft, can actually name things better than humans can. The catch is that, for the most part, these systems require powerful PCs or even servers in the cloud to process the information; they’re simply not practical for calling out the surroundings of someone’s day-to-day life.
BlindTool, on the other hand, fits on a smartphone and runs as a completely self-contained app. How is this possible? Therein lies the compromise. Whereas state-of-the-art neural nets are trained on images in as many as 37,000 categories, BlindTool’s logic is built from experience with a mere 1,000 categories of images (which, in fairness to the scale at play here, still represents 150GB of image files).
As Cohen explains, this compromise allows BlindTool to run fast. More complicated neural networks he attempted to port to mobile required as many as 15 seconds to analyze whatever was in your phone’s frame. “Even at five seconds, I was upset with the app,” Cohen says. BlindTool requires just a second, which feels more or less instantaneous in use, making the aim-and-listen UX practical.
In turn, however, BlindTool can be wrong. A lot. Walking around my apartment, it called my Christmas tree a feather boa, an ornament a bubble, a door an armoire. Sometimes the results were close; sometimes they were absurdly off. That’s because the neural net was trained on what Cohen calls an “almost randomly chosen” collection of images, a hodgepodge of open-source work that’s not necessarily tailored to the things you or anyone else would most commonly want identified at home, on a commute, or at work.
“There are a lot of specific things it’s trained on, like dog breeds, but Christmas tree is not on there. So what it has to do is make its best probable guess as to what it was,” Cohen says. “Picture frames might look like microwaves. Microwaves might look like dishwashers… it’s called a coffee cup a bowl of soup, and the same coffee cup makeup powder.”
Cohen built around these inevitable mistakes by designing the app to subtly indicate its own confidence. BlindTool vibrates as it spots objects it can identify. On the screen (which most users can’t see, obviously), the app constantly lists what it might be seeing in that moment, even things with just a .02% probability of being in the frame. Only when an object hits 30% probability does the app audibly announce what it might be. And only when the app is getting really confident, approaching a 90% probability of being correct, does the phone vibrate with full gusto. After just a few minutes of use, you can intuit BlindTool’s own BS, and that’s completely by design.
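The confidence-feedback design described above can be sketched in a few lines. This is a hypothetical illustration, not BlindTool’s actual code: the 30% announce threshold and ~90% full-vibration point come from the article, but the function name and the linear vibration scaling are assumptions.

```python
# Sketch of a BlindTool-style confidence feedback policy (illustrative only).
# Thresholds are from the article; names and scaling are assumptions.

ANNOUNCE_THRESHOLD = 0.30        # speak the label aloud at 30% probability
FULL_VIBRATE_THRESHOLD = 0.90    # vibrate "with full gusto" near 90%

def feedback(label: str, probability: float) -> dict:
    """Map a classifier's top prediction to user-facing feedback."""
    return {
        "display": label,  # on-screen list shows even low-probability guesses
        "announce": probability >= ANNOUNCE_THRESHOLD,
        # Vibration strength ramps up with confidence, capped at full strength.
        "vibration": min(probability / FULL_VIBRATE_THRESHOLD, 1.0),
    }
```

A high-confidence hit like `feedback("toilet", 0.95)` would both announce the label and vibrate at full strength, while a 10% guess would only appear in the on-screen list.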
Still, again and again, BlindTool was sure my tree was a feather boa. But when I really thought about it, the tree’s pine arms do sort of resemble the fluff of a boa. That round ornament sitting on its branch? It basically is a bubble. My front door? Pretty similar to an armoire. Even BlindTool’s mistakes can be illuminating in their own way.
“If someone just wants to look around, and get the gist of what something looks like so they can add a whole other dimension of sight, experience of the world, maybe [BlindTool is technically] wrong, but it’s right in intuition about something,” Cohen says. “I still think it gives a sense of independence, which is a big goal of doing this.”
Indeed. A similar smartphone app called Be My Eyes enlists sighted volunteers to identify objects for people who can’t see. It’s a great system, but it’s not scalable to the 24/7 needs of everyone with major vision impairment. Meanwhile, BlindTool is a peek at an inevitable future of accessibility, when those of us who need a helping hand only need to reach into our pockets to find one.
[via Prosthetic Knowledge]