This blog entry is not boring! I feel the need to announce that right up front, because in my previous blog entry, I promised to write about statistical sampling, and that’s what I’m doing here. Despite what you might think about statistics, the two anecdotes I’m going to recount here are actually quite interesting, even to people who never took a course in statistics. Both illustrate the same principle: You can’t understand what a study tells you unless you understand the sample it’s based on.
The first anecdote is based on the research that social scientists did when they invented the science of jury selection. This happened in 1972, when seven radicals were about to go on trial in Harrisburg, Pennsylvania, for conspiracy to raid draft boards and destroy records, among other planned antiwar actions. This was a time of great political polarization and in a place that was characterized by political conservatism. The researchers, working on behalf of the antiwar activists’ lawyers, wanted to find a way to predict the political leanings of jurors so the lawyers could seat a jury that would be less conservative than one would choose at random from the Harrisburg population.
The social scientists surveyed citizens of that community to identify their political attitudes and then correlated these attitudes with other facts about the respondents. They discovered that the easiest way to predict a Harrisburger’s politics was to ask how much education the person had: The more educated the person was, the more conservative that person’s politics.
The researchers eventually realized why this was so: Young people in Harrisburg who became highly educated acquired the occupational mobility to leave the region if they were not conservative; therefore, the sample of highly educated people who remained had to be quite conservative. If the results of their survey surprised you, it’s because you didn’t stop to think about what the sample really was: not everyone who ever lived in Harrisburg, but rather those who remained—by choice or because they were less able to move out.
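The selection effect the researchers uncovered can be sketched with a small simulation. All the numbers here are hypothetical, chosen only to illustrate the mechanism: suppose education and conservatism are unrelated in the full population, but educated non-conservatives are the ones likely to move away. Among those who remain, a correlation appears out of nowhere.

```python
import random

random.seed(0)

# Hypothetical full population: education and conservatism are independent.
population = [
    {"educated": random.random() < 0.5,
     "conservative": random.random() < 0.5}
    for _ in range(100_000)
]

def stays(person):
    # Educated non-conservatives have the mobility (and motive) to leave;
    # the retention rates below are made up for illustration.
    if person["educated"] and not person["conservative"]:
        return random.random() < 0.2   # most of them move away
    return random.random() < 0.9       # almost everyone else stays

remaining = [p for p in population if stays(p)]

def share_conservative(people, educated):
    group = [p for p in people if p["educated"] == educated]
    return sum(p["conservative"] for p in group) / len(group)

print("full population, educated:", round(share_conservative(population, True), 2))
print("remaining, educated:      ", round(share_conservative(remaining, True), 2))
print("remaining, not educated:  ", round(share_conservative(remaining, False), 2))
```

In the full population, educated people are conservative about half the time; among those who remain, the educated group skews heavily conservative, even though nobody’s opinions changed. The sample did all the work.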
The second anecdote is from the Second World War. British bomber planes flying missions over Germany were often shot down by anti-aircraft fire. The Royal Air Force wanted to shield vulnerable parts of the aircraft with armor, but they wanted to use a minimal amount of armor to avoid weighing down (and slowing down) the planes. The RAF commissioned the statistician Abraham Wald to examine the planes after bombing missions to determine where on the planes’ undersides it was most critical to apply armor.
Wald counted bullet holes in the planes and recommended that armor be applied where there were the fewest bullet holes.
This may seem like a mistake to you. Maybe you’re thinking that armor is supposed to protect against anti-aircraft fire, so shouldn’t they have armored the places that got hit the most?
Again, consider the sample: Wald was not looking at every bomber that flew a mission, but rather those that returned from missions. Bombers that got shot down were removed from the sample. The bombers that returned and made up the sample were the ones that were hit only in places that were not critical for flying. The places where the surviving planes were not hit, therefore, were the most likely to be critical and in need of armor.
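Wald’s insight can likewise be sketched in a few lines of simulation, again with invented numbers: every plane takes hits spread evenly across two zones, but hits to the critical zone usually bring the plane down. Count holes only on the planes that return, and the critical zone looks like the safest place on the aircraft.

```python
import random

random.seed(1)

# Hypothetical model: hits land uniformly on two zones. A hit to the
# critical "engine" zone downs the plane 70% of the time; a hit to the
# non-critical "fuselage" zone never does. All rates are made up.
ZONES = ["engine", "fuselage"]

def fly_mission():
    hits = [random.choice(ZONES) for _ in range(random.randint(0, 6))]
    survives = all(not (z == "engine" and random.random() < 0.7) for z in hits)
    return hits, survives

all_hits = {"engine": 0, "fuselage": 0}
returned_hits = {"engine": 0, "fuselage": 0}

for _ in range(100_000):
    hits, survives = fly_mission()
    for z in hits:
        all_hits[z] += 1
        if survives:
            returned_hits[z] += 1

print("hits across all planes:   ", all_hits)       # roughly equal by zone
print("hits on returning planes: ", returned_hits)  # engine holes are scarce
```

The anti-aircraft fire distributes hits evenly, but the returning sample shows far fewer engine holes, because planes hit there rarely come back to be counted. Armor the zone with the fewest holes.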
If you’re wondering why I’m writing about this subject in a blog about careers, consider this blog entry a look at how complicated statisticians’ work can be, not so much in terms of the mathematics, but rather in terms of the concepts that must be understood.
The nonstatistical lesson to take away from these anecdotes is that you have to be careful when you make a generalization about a population—for example, that educated people are more liberal. Such generalizations may be true in some global sense, but the actual population you need to deal with may really be a subset of the global population, either self-selected or selected by some external factor you have not considered.