Big Data gets bigger and bigger every day. But while a great deal more data is being generated and captured today than ever before, a lot of it is repetitive, erroneous, or just banal. Yes, it's data, but is it information? If I tweet that I'll be at the Starbucks on 10th Street in 30 minutes, and one of my friends re-tweets me, the re-tweet doubles the data, but does it really double the information? A high-definition TV signal transmits five times the data of a standard TV signal, but does it convey five times the information?
In information theory, "entropy" is the term used to describe how much actual information there is in any given set of data. It's a term borrowed from physics, where entropy is a quantifiable value analogous to the amount of randomness in a system. The Second Law of Thermodynamics famously states that in a closed system, entropy (that is, randomness) always increases. Mix a cup of hot water with a cup of cold water, and the trillions of individual hot and cold H2O molecules will jumble together more and more randomly until the result is lukewarm water. But mix two cups of lukewarm water together and these same trillions of molecules will never randomly jumble together in such a way as to make half the water hot and half of it cold. Entropy explains why there's no such thing as a perpetual-motion machine, and why time never runs backward, only forward.
Information entropy is mathematically similar, but rather than randomness it describes the unpredictability of information. Unpredictable information has high entropy. The outcome of a coin toss, for instance, has high entropy because it's impossible to predict in advance. On the other hand, when you see the letters F-a-s-t C-o-m-p-a-n-_, that last letter is not so hard to predict, right? This means the information conveyed by the last letter has low entropy. Similarly, the information in my original tweet about going to Starbucks is more entropic than the information in my friend's re-tweet, and a standard television broadcast is more entropic than a high-definition re-run of the same program.
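If you like, you can put numbers on this intuition. Claude Shannon's classic formula measures entropy in bits as H = -Σ p·log₂(p), summed over the possible outcomes. Here's a minimal sketch (my illustration, not from the article) comparing a fair coin toss with a nearly predictable outcome like that last letter:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin toss is maximally unpredictable: one full bit of information.
print(entropy([0.5, 0.5]))    # 1.0

# An almost-certain outcome (think the "y" in F-a-s-t C-o-m-p-a-n-_,
# here assumed to be 99% predictable) carries almost no new information.
print(entropy([0.99, 0.01]))  # ≈ 0.08
```

The exact probabilities are made up for illustration; the point is the gap between one bit of genuine surprise and a sliver of a bit for what you already saw coming.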
Whether you ever consciously thought about it or not, you yourself almost certainly find information entropy highly useful. Think about the last time you entered a term in a search engine to try to find something either online or somewhere on your hard drive, for example. In order to find just the right information or reference, you probably entered the most distinctive (i.e., least predictable) description you could think of. You may not have known what the concept was called, but you were using information entropy to make your search more efficient.
Information entropy is how we learn new things and gain new perspectives. In a genuinely fascinating book, The Most Human Human, Brian Christian suggests that "we gain the most insight on a question when we take it to the friend, colleague, or mentor of whose reaction and response we're least certain." Moreover, he says, "to gain the most insight into a person, we should ask the question of whose answer we're least certain." In each of these cases, we would be seeking out high-entropy information in order to learn new things at a faster, more efficient rate. Don't tell us something we already know. Tell us something we don't know.
Entropic information may be an important clue to what it means to be conscious in the first place. Our brains are always trying to predict events and anticipate what's about to happen. So consciousness itself may be the result of our brain's effort simply to reduce the information entropy that confronts us. To predict our environment from one moment to the next. Information entropy is one reason weak ties in a social network are more useful in many contexts than close ties are, and it's also the secret sauce in creativity itself. To generate more new ideas, novel concepts, or innovations, expose yourself to high-entropy information: unusual perspectives, unanticipated stories, out-of-context ideas.
In our book Extreme Trust, Martha Rogers and I argue that one important business strategy for dealing with the growing proliferation of data and information is evidence-based management, with managers using data to inform their judgments rather than using their judgments to select the right data. And a couple of months ago I suggested in this space that putting evidence-based management into practice, whether you are managing a baseball team or bidding for offshore oil leases, is a lot harder than it looks. But extracting the right insights and knowledge from data requires more than simply relying on numbers. To get the most valuable information you also need to seek out anomalies. To find unusual things. To dig carefully for entropy.
The problem is that while high-entropy information may generate the most new knowledge and insight, we all have a natural aversion to unpredictability. As human beings we are biased to prefer the routine and familiar over the new and different. We feel safer when we know what's going to happen next, and we are simply more comfortable talking to people whose views we share. Seeking out high-entropy information may be a mind-expanding activity, in other words, but it still feels unnatural at first, and like any skill, it improves with practice. To get more comfortable with it, try a few exercises like these:
- Play a game with someone else to find two properly spelled English words that, when searched together on Google or Bing, return the fewest results.
- Pick a point of view you vehemently disagree with, and argue in favor of it instead. Be convincing.
- Read a magazine you would never ordinarily have the least interest in.
- At a restaurant, order a food you normally can't stand, and eat it.
- Put your clothes on in a different order every day (i.e., shirt first one day, socks first the next, right then left instead of left then right, and so forth).
- At a party, find the person you have least in common with and spend at least an hour in conversation with them.
[Image: Flickr user Jane Rahman]