From Google’s auto-sorting Inbox to Microsoft’s unparalleled image recognition, we owe machine learning our gratitude for many of the magical experiences lurking inside software. But how does it actually work? How can you train a computer to discern the nuances of data?
Now the two-person team at R2D3, a self-described “experiment in expressing statistical thinking with interactive design,” has offered an illuminating explanation. They’ve crafted a site that walks you through the creation of a machine learning system with a series of seamless, visual graphs.
They start with a simple enough question: can you determine whether a home is in New York or San Francisco without looking at its address? Through a series of analyses and good guesses, they eventually build a system that can distinguish a New York home from a San Francisco home with about 90% accuracy. And while it’s a bit of a spoiler, the final image you see really explains it all: A branching tree divides the data again and again, sorting the possibilities by one variable at a time. Each home is categorized by essentially falling through a giant game of Plinko. After passing through variables like its elevation, square footage, and year built, the machine’s guess becomes pretty good.
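For readers who think in code, here is a minimal Python sketch of that branching idea. It is not R2D3’s implementation: the eight homes, their numbers, the function names, and the choice of Gini impurity as the splitting score are all assumptions made up for illustration. It simply shows a tree that keeps dividing data by one variable at a time.

```python
from collections import Counter

# Made-up rows for illustration only (not R2D3's data):
# (elevation_m, sqft, year_built) -> city label.
HOMES = [
    ((3, 1200, 1920), "NY"),
    ((5, 1500, 1960), "NY"),
    ((8, 900, 1935), "NY"),
    ((10, 2000, 2005), "NY"),
    ((40, 1100, 1925), "SF"),
    ((75, 950, 1940), "SF"),
    ((6, 700, 1910), "SF"),
    ((60, 1300, 1990), "SF"),
]

def gini(rows):
    """Gini impurity: 0.0 when every row in the group has the same label."""
    counts = Counter(label for _, label in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows):
    """Try every (variable, threshold) pair; keep the one with the purest halves."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, depth=0, max_depth=3):
    """Divide the data again and again until a group is pure or the depth cap hits."""
    labels = Counter(label for _, label in rows)
    split = best_split(rows) if len(labels) > 1 and depth < max_depth else None
    if split is None:
        return labels.most_common(1)[0][0]  # leaf: the majority city
    _, f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t, build(left, depth + 1, max_depth), build(right, depth + 1, max_depth))

def predict(tree, x):
    """Drop one home through the tree, Plinko-style, until it lands on a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

tree = build(HOMES)
print(tree)
print(predict(tree, (50, 1000, 1950)))
```

On this toy data the tree first splits on elevation, then on square footage. Real data is messier, which is why the visualization’s tree needs many more branches and still tops out around 90%.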
But it’s possible I overexplained the effect. Try out the visualization for yourself to see what I mean.