There’s a joke about a quality control guy in a match factory. He picks up every match, strikes it, says, “This one works,” blows it out and puts it in a box. The joke is funny because instinctively we know that he doesn’t need to see every detail to understand the big picture. This instinct seems to fail people when it comes to Big Data. As a reader of Co.Design, you’re probably the kind of person who has to turn insights into things, whether it’s communications, products or entertainment. As that insight increasingly comes from data, it’s a failure that affects us all.
Take the Cambridge Analytica affair. It’s uncovered a lot of shortcomings in governments and businesses. But there’s a deeper scandal here: the innumeracy of everybody concerned. Why the heck would you need data on 85 million people to communicate to them?
The answer is that you don’t. Gordon Ramsay doesn’t need to eat the whole risotto you cooked before he knows whether to call you an incompetent arsemuppet. He just needs a forkful. In the same way, you can find out everything you need to know about your customers from looking at a small forkful of them. It’s just as valid, far cheaper, and easy to do in an ethical and legal way.
Even the companies built on Big Data know this. In Everybody Lies, ex-Googler Seth Stephens-Davidowitz admits that, “At Google, major decisions are based on only a tiny sampling of their data. You don’t always need a ton of data to find the right insights. You need the right data.”
Decades of research have shown that you can get a pretty accurate picture of the US population from a sample of under 3000 people, if you choose them carefully. If you really, really want to drop your margin of error as low as possible, you look at about 9000. After that, there just aren’t many gains to be had. U.S. pollsters looked at the intentions of hundreds of thousands of Americans and still called the 2016 election wrong. Politicians’ careers depend on accuracy. Creative people’s careers depend on insights, and that’s where small data rules.
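If you want to see those diminishing returns for yourself, here is a minimal sketch. It assumes a simple random sample, a 95% confidence level, and the worst-case 50/50 split in opinion; real pollsters who "choose carefully" use stratified designs and weighting, so treat the numbers as a ballpark rather than anyone's actual methodology. The function name is my own invention.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (an assumption, not from the article)

def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error for a simple random sample.

    Uses the standard formula z * sqrt(p * (1 - p) / n).
    proportion=0.5 is the worst case, giving the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

if __name__ == "__main__":
    for n in (300, 3_000, 9_000, 85_000_000):
        print(f"n = {n:>10,}: margin of error is about {margin_of_error(n) * 100:.3f} points")
```

Under those assumptions, 3,000 people gets you to within roughly 1.8 percentage points and 9,000 to about 1 point. Going all the way to 85 million only shaves off that last point, and it does nothing for the errors that come from asking the wrong people or the wrong question.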
After all, somebody has to turn that data into a thing, whether it’s a communication or a product. As I noted earlier, dear reader, that somebody might just be you. Sure, a Cambridge Analytica client could create 85 million customer profiles and hire copywriters to craft ads specifically for them. If each ad costs $10 a pop to create (hardly hiring top-flight freelancers here, but y’know, exposure) you’re not going to get much change from a creative budget of $1 billion, putting it beyond the reach of P&G or even The Illuminati. Instead, you’re going to have to create a few ads that are each seen by a bunch of people. The bigger those bunches, the more your creative budget stretches.
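The back-of-the-envelope sum above is easy to check. This sketch uses the article's own figures (85 million profiles, $10 per ad); the cluster counts of 1,000 and 3 are purely illustrative, and the function names are hypothetical.

```python
PROFILES = 85_000_000   # profile count quoted in the article
COST_PER_AD = 10        # dollars per ad, the article's deliberately low figure

def creative_cost(num_ads, cost_per_ad=COST_PER_AD):
    """Total cost in dollars of writing num_ads distinct ads."""
    return num_ads * cost_per_ad

def audience_per_ad(num_ads, profiles=PROFILES):
    """Average number of people who see each ad if profiles are split evenly."""
    return profiles / num_ads

if __name__ == "__main__":
    # One ad per person, then two illustrative cluster counts.
    for num_ads in (PROFILES, 1_000, 3):
        print(f"{num_ads:>10,} ads: ${creative_cost(num_ads):>11,} total, "
              f"about {audience_per_ad(num_ads):,.0f} people per ad")
```

One ad per person comes to $850 million, which is where the "not much change from $1 billion" comes from. Three ads cost $30 and each reaches tens of millions of people.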
To solve this problem, businesses create clusters of audiences and hit them with a few different messages and a range of products. Take Apple. It divides people into value-conscious, early adopters, and show-offs, then makes three phones. Don’t fit into one of those categories? Fine, there’s a Samsung store at the end of the street.
Big Data promised to change that. Each one of us would become a special little snowflake cluster, all on our own. It’d tailor messages and recommendations specifically to each one of us and enable companies to create products that fit us like a Savile Row suit.
I have sat in meetings in just about every company that’s supposedly built on vast troves of information and heard how little they actually understand their customers’ behavior, and how surveys or qualitative research are taboo because “we ought to be able to get the insight from our big data.” My advice to companies? Simplify things by using a sample of customers to understand the four kinds of content they respond best to.
Without some kind of simplifying framework, companies run into a load of problems. All the customer data in the world won’t tell you about the people who don’t use your product, for example. Also, the more data you have, the more false flags you find. The author Tyler Vigen is a genius at finding correlations between stupid things. The number of swimming pool deaths in the U.S. correlates neatly with the number of films Nicolas Cage appears in. Knowing this doesn’t make you smarter. As economics guru Nassim Taleb says, you’re just looking for the same needle in a bigger haystack.
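You can watch the false flags multiply with nothing but random numbers. The sketch below is my own toy simulation, not Vigen's method: it makes one random "target" series of ten points (think ten years of pool deaths), then compares it against a growing pile of equally random, unrelated series and reports the strongest correlation it stumbles on. The series length, the seed, and the function names are all arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def strongest_chance_correlation(num_series, num_points=10, seed=0):
    """Largest |correlation| between one random series and num_series others.

    Every series is independent noise, so any correlation found is spurious.
    With a fixed seed, a larger num_series extends the same pile of series.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_points)]
    best = 0.0
    for _ in range(num_series):
        candidate = [rng.gauss(0, 1) for _ in range(num_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best

if __name__ == "__main__":
    for k in (10, 100, 1_000, 10_000):
        print(f"{k:>6,} unrelated series: strongest |r| = "
              f"{strongest_chance_correlation(k):.2f}")
```

Because each larger run includes every series from the smaller one, the strongest match can only hold steady or climb as the haystack grows. None of it means anything.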
Ideally, big data would mean that things could be personalized without touching the sides of an actual human creative. In practice that means that the ad at the bottom of this article will say “Housewives in Flyover, Wyoming (or wherever you live) are astonished by this teeth-whitening trick,” and Amazon is currently telling me I’ll love Dan Brown’s latest, even though it has 20 years’ worth of data to tell it that I wouldn’t read it if it were the only thing on the shelf in my solitary confinement cell during a 20-year stretch.
Organizations are hoarding data in the hope that they’ll be able to point artificial intelligence agents at it and uncover hitherto undreamed-of insights about their customers. They don’t seem to understand that you can point the same clever algorithm at a sample of a few thousand people and get the same result. The only time it’s useful to have a vast trove of data is when you’re looking for hyper-niche behaviors, but in a winner-take-all world, who wants to spend time with the long tail? The only niche worth bothering with is the super-rich, and there are easier ways to find out what they want. (A friend of mine is one of the heaviest users of a major global airline’s first class cabin. Once a year, he “coincidentally” finds himself seated across the aisle from the CMO of that carrier, who always manages to strike up a conversation with him. Who needs artificial intelligence when you have low human cunning?)
If Big Data is the New Oil, as we keep hearing, then it doesn’t seem to be powering very impressive engines. Waitrose, my local UK supermarket, values my personal data so much that it bribes me with great free coffee for permission to mine my grocery lists. So far, I’ve received one voucher from them in four years, for something my wife is allergic to. Big companies are stockpiling big data and doing nothing with it; SAP estimates that up to 73% of all data gathered in organizations is not used. Like a pensioner who hoards newspapers for decades, this can be a harmless if expensive eccentricity until it collapses and crushes them.
The collapse has begun. Cambridge Analytica’s use of Facebook data has started a consumer backlash. The bureaucrats are actually way ahead of them; Europe is passing laws that will force companies to show exactly how they’ve gathered data and that they have permission to use it, or face heavy fines. There are calls for the U.S. to do something similar. Companies have spent a decade trying to convince themselves that big data is an asset; now it’s become a liability. Instead of mining the behaviors of millions, we can get better insights from asking a few hundred, or a couple of thousand, people about the things they’d like us to do. It makes better sense from a mathematical, an ethical, and a creative point of view.
Brian Millar is the co-founder of Paddle Consulting, a company that collects data ethically about the things that people love on the internet. @paddlepowered