Why genuine human intelligence is key for the development of AI

Amid stunning developments in artificial intelligence, it’s important for data scientists to rely on their own critical thinking and not just turn everything over to the machines.

The developments have been fast and furious in recent months. Microsoft announced that it will invest $1 billion in a partnership with research lab OpenAI to create artificial general intelligence (AGI), the holy grail of artificial intelligence. OpenAI’s CEO Sam Altman has boasted that “the creation of AGI will be the most important technological development in human history.”

Computers can do many very specific tasks much better than humans, but they do not have anything remotely resembling the wisdom, common sense, and critical thinking that humans use to deal with ill-defined situations, vague rules, and ambiguous, even contradictory, goals. The development of computers that can do everything the human brain does would be astonishing, but Microsoft’s record is not encouraging.
In 2016, Microsoft released Tay (“Thinking About You”), a chatbot that Microsoft promoted as “designed to engage and entertain people where they connect with each other online through casual and playful conversation.” Tay was programmed to pose as a millennial female by learning to mimic the language used by millennials. Microsoft boasted that, “The more you chat with Tay, the smarter she gets.” In less than a day, Tay sent 96,000 tweets and had more than 50,000 followers. The problem was that Tay became a despicable chatbot, tweeting things like, “Hitler was right i hate the jews,” “9/11 was an inside job,” and “i fucking hate feminists.” Tay was adept at recycling the words and phrases it received, but it had no way of putting words in context or understanding the tweets it was sending. Microsoft took Tay offline after 16 hours but, a week later, Tay was back online and soon put itself in an endless loop, tweeting, “You are too fast, please take a rest,” over and over, incessantly disrupting the lives of more than 200,000 Twitter followers. Microsoft claimed that the rerelease was an accident and took Tay offline again.
AGI may be an elusive dream, but data science offers the realistic opportunity to use big data and powerful computers to make informed decisions based on facts rather than lazy thinking, whims, hunches, and prejudices. Unfortunately, the reality is that businesses and governments are still making many of the same mistakes that were made before the data deluge began, but now they are making them much faster. Turning important decisions over to machines just automates the mistakes.
Data science is more than mathematical proofs, statistical calculations, and computer programming. Genuine human intelligence is essential: experimental design, wisdom, common sense, skepticism, and critical thinking. Data scientists shouldn’t strive to be machines, in all their mindless pattern-seeking, curve-fitting glory; they should strive to be scientists.
There are nine common pitfalls that must be avoided if data science is to fulfill its enormous potential:
Using bad data. Charles Babbage, the inventor of the first mechanical computer, was twice asked by members of Parliament, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” Good data are required, not optional.
A study of patients with sepsis at a Chicago hospital concluded that patients with low pH levels in their blood were less likely to return to the hospital soon after being discharged. The correlation was a decisive 0.96. However, the data included patients who died during their hospital stay! The patients least likely to be readmitted were the ones who had been discharged to the mortuary. When the deceased were excluded, it turned out that patients with low pH values were, in fact, in serious danger. 
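The survivorship trap in the sepsis study is easy to reproduce. In this sketch (all probabilities are invented for illustration, not the hospital’s data), patients who die are recorded as “not readmitted,” which flips the sign of the pH–readmission correlation:

```python
import random

random.seed(0)

def pearson(xs, ys):
    # Plain Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

patients = []
for _ in range(5000):
    ph = random.uniform(7.0, 7.5)
    severity = 7.5 - ph                      # lower pH = more severe sepsis
    died = random.random() < 2.0 * severity  # sickest patients often die in hospital
    if died:
        readmitted = 0                       # the deceased are never "readmitted"
    else:
        readmitted = 1 if random.random() < 0.2 + severity else 0
    patients.append((ph, died, readmitted))

# Including the deceased: low pH looks "protective" against readmission
corr_all = pearson([p[0] for p in patients], [p[2] for p in patients])

# Survivors only: low pH patients are the ones in danger of returning
survivors = [p for p in patients if not p[1]]
corr_survivors = pearson([p[0] for p in survivors],
                         [p[2] for p in survivors])
```

With the deceased included, the correlation between pH and readmission is positive (low pH appears safe); among survivors, it is negative, as the hospital belatedly discovered.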

Putting data before theory. Some data scientists ransack data for patterns without being guided by theory or common sense. Indeed, they believe that thinking about a question limits the possibilities for knowledge discovery. Unfortunately, the data deluge has exploded the number of patterns that can be discovered, the vast majority of which are necessarily meaningless. The paradox of big data is that the more data we pillage for patterns, the more likely it is that what we find is worthless or worse.
An internet marketer tested three alternative colors for its landing page (yellow, red, and teal) against its traditional blue color in 100 or so countries, which virtually guaranteed that they would find a revenue increase for some color for some country. They concluded that England loves teal, except that it didn’t.
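The color test illustrates the multiple-comparisons problem: run enough tests on identical options and some will look like winners by luck alone. A hypothetical simulation, with every color converting at exactly the same true rate:

```python
import math
import random

random.seed(1)

def two_prop_z(x1, n1, x2, n2):
    # Standard two-proportion z-statistic
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

TRUE_RATE = 0.05   # every color converts identically; any "winner" is noise
N = 2000           # visitors per color per country (made-up scale)

tests = 0
false_positives = 0
for country in range(100):
    blue = sum(random.random() < TRUE_RATE for _ in range(N))
    for color in ("yellow", "red", "teal"):
        alt = sum(random.random() < TRUE_RATE for _ in range(N))
        tests += 1
        if abs(two_prop_z(alt, N, blue, N)) > 1.96:  # "significant" at 5%
            false_positives += 1
```

With 300 tests at a 5 percent significance threshold, roughly 15 country–color combinations come up “significant” even though nothing real is happening, which is how England came to “love” teal.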
Worshiping math. Mathematicians love math and many non-mathematicians are intimidated by math. This is a lethal combination that can lead to the creation of wildly unrealistic models.
Many mathematical models of mortgage defaults crashed during the Great Recession because they made the convenient assumptions that the chances of default were normally distributed and independent. They underestimated the chances of extreme events and did not consider the possibility that a macroeconomic event like an economic recession would cause an avalanche of mortgage defaults.
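The cost of the independence assumption can be seen in a toy portfolio (all probabilities invented for illustration). Letting a rare recession raise every loan’s default probability at once fattens the tail far beyond what the independent model predicts:

```python
import random

random.seed(2)

LOANS = 1000    # mortgages in the portfolio
TRIALS = 2000   # simulated years

def portfolio_defaults(correlated):
    # In the correlated world, a recession (10% chance) hits every loan at once
    recession = correlated and random.random() < 0.10
    p = 0.15 if recession else 0.02
    return sum(random.random() < p for _ in range(LOANS))

# Probability of a catastrophic year (more than 50 defaults) under each model
indep_tail = sum(portfolio_defaults(False) > 50 for _ in range(TRIALS)) / TRIALS
corr_tail = sum(portfolio_defaults(True) > 50 for _ in range(TRIALS)) / TRIALS
```

Under independence, more than 50 defaults out of 1,000 essentially never happens; with a shared macroeconomic shock, it happens about one year in ten. Models that assumed independence priced the first world and got the second.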

Worshiping computers. It is tempting to think that because computers can do some things extremely well, they must be highly intelligent, but being useful for specific tasks is very different from having a general intelligence that applies the lessons learned and the skills required for one task to more complex tasks, or completely different tasks. Our awe of computers is not a harmless obsession. If we think computers are smarter than us, we may be tempted to let them do our thinking for us—with potentially disastrous consequences.
Algorithmic criminology using black-box models is becoming commonplace in pretrial bail determination, posttrial sentencing, and post-conviction parole decisions. One developer wrote that, “The approach is ‘black box,’ for which no apologies are made.” He gives an alarming example: “If I could use sunspots or shoe size or the size of the wristband on their wrist, I would.” The black-box algorithms tend to be racially biased and do not outperform simple models that consider only age and prior convictions.

Torturing data. In a tireless search for statistically significant relationships, some are tempted to slice and dice the data in innumerable ways. In the immortal words of Ronald Coase, “If you torture data long enough, it will confess.” Big data and powerful computers facilitate the abuse.
A prominent researcher advised an assistant who was analyzing data that had been collected at an all-you-can-eat Italian buffet to separate the diners into “males, females, lunch goers, dinner goers, people sitting alone, people eating with groups of 2, people eating in groups of 2+, people who order alcohol, people who order soft drinks, people who sit close to buffet, people who sit far away, and so on,” and then look at different ways in which these subgroups might differ: “# pieces of pizza, # trips, fill level of plate, did they get dessert, did they order a drink, and so on.” He told the assistant to, “Work hard, squeeze some blood out of this rock.” This rock squeezing resulted in four published “pizza papers,” with the most famous one reporting that men eat 93 percent more pizza when they dine with women. More than a dozen of his published papers have now been retracted, and he has resigned from his university position.
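Rock squeezing is easy to demonstrate. In this sketch the “slices eaten” data are pure noise with no real group differences, yet testing enough arbitrary subgroup splits still yields nominally significant findings:

```python
import random

random.seed(3)

DINERS = 200
# Slices eaten: random numbers with no real subgroup effects whatsoever
slices = [random.gauss(3.0, 1.0) for _ in range(DINERS)]

def t_stat(a, b):
    # Welch-style t-statistic for a difference in means
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / (va / na + vb / nb) ** 0.5

hits = 0
COMPARISONS = 500   # lunch vs. dinner, solo vs. group, near vs. far, and so on
for _ in range(COMPARISONS):
    labels = [random.random() < 0.5 for _ in range(DINERS)]  # an arbitrary split
    a = [s for s, lab in zip(slices, labels) if lab]
    b = [s for s, lab in zip(slices, labels) if not lab]
    if abs(t_stat(a, b)) > 1.96:   # "significant" at the 5% level
        hits += 1
```

About 5 percent of these meaningless comparisons clear the significance bar; slice the buffet data enough ways and the rock will always bleed.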

Fooling yourself. Physicist Richard Feynman offered this timeless advice for scientists: “The first principle is that you must not fool yourself—and you are the easiest person to fool.” True scientists share their theories, question their assumptions, and seek opportunities to run experiments that will verify or contradict them. Data clowns only see what they want to see.
A study asked high school students to predict their scores on a math test. The average predicted score was higher than the average actual score, but there was a 0.70 correlation between the predicted and actual scores. The author drew two conclusions. The first was that students overestimate their ability. However, it may be that these students underestimated the difficulty of the test they would be given. The author’s second conclusion was that test scores can be increased by raising students’ self-esteem. However, the positive correlation between the predicted and actual scores may instead reflect the fact that most students who did well knew they were good at math and the students who failed knew that they did not know the material very well. They weren’t being unduly pessimistic; they were being realistic.

Confusing correlation with causation. No matter how many times we are told that correlation is not causation, it can be irresistibly tempting to ignore this essential advice.
In 2011, Google created an artificial intelligence program called Google Flu that used search queries to predict flu outbreaks. They boasted that, “We can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.” They said that their model was 97.5 percent accurate, in that the correlation between the model’s predictions and the actual number of flu cases was 0.975. How did Google do it? Google’s data mining program looked at 50 million search queries and identified the 45 queries that were the most closely correlated with the incidence of flu. Since flu outbreaks are highly seasonal, Google Flu may have been mostly a winter detector that chose seasonal search terms, like Christmas, winter vacation, and Valentine’s Day. When it went beyond fitting historical data and began making real predictions, Google Flu was far less accurate. After issuing its report, Google Flu overestimated the number of flu cases for 100 of the next 108 weeks, by an average of nearly 100 percent. Google Flu no longer makes flu predictions.
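Google Flu’s failure mode, screening an enormous pool of queries for the best historical correlation, can be mimicked with random numbers. In this deliberately simplified sketch (not Google’s actual method or data), the query chosen for its training-period fit does far worse on fresh data:

```python
import random

random.seed(4)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

WEEKS = 104
flu = [random.gauss(0, 1) for _ in range(WEEKS)]   # weekly flu activity (noise here)
# 5,000 candidate search-query series, none actually related to flu
queries = [[random.gauss(0, 1) for _ in range(WEEKS)] for _ in range(5000)]

train, test = slice(0, 52), slice(52, 104)

# Pick the query most correlated with flu over the first year...
best = max(queries, key=lambda q: abs(pearson(q[train], flu[train])))
train_corr = pearson(best[train], flu[train])

# ...then see how it does on the second year
test_corr = pearson(best[test], flu[test])
```

Screening thousands of unrelated series reliably turns up one with an impressive in-sample correlation, and that correlation evaporates out of sample, just as Google Flu’s 0.975 did.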
Being surprised by regression toward the mean. When data fluctuate, relatively large values overestimate the phenomenon being measured, so subsequent values tend to be closer to average. For example, a golfer who wins the Masters golf tournament probably will not do as well in the next tournament, not because he has been jinxed or his abilities have eroded, but because the win is an overestimate of his ability.
Data can regress upwards as well, kind of like an anti-jinx. For example, a data science company conducted experiments comparing the current layout of a client’s web page to as many as 20 alternative layouts across a million different domain names. Clients occasionally complained about “underperforming” domain names that they felt should be earning more ad revenue. A data analyst was given a list of domain names with revenue down over the past three months and asked to tinker with the layouts to see if he could boost revenue. He was successful, spectacularly so. After he made changes, revenue inevitably went up about 20 percent the next day. He gained a reputation as a rock star, but one day he was too busy to make any changes. Revenue jumped like it always did, and now the game was up. It wasn’t the analyst. It was the fact that these were truly underperforming sites and their revenue regressed upward toward the mean.
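The anti-jinx is straightforward to simulate (all revenue figures are hypothetical). Select the worst observed performers and, with no changes made at all, their next measurement rises toward the mean:

```python
import random

random.seed(5)

SITES = 1000
# Each site has a stable underlying revenue level...
true_mean = [random.gauss(100, 10) for _ in range(SITES)]

def observe(i):
    # ...but any one measurement is swamped by noise
    return true_mean[i] + random.gauss(0, 30)

month1 = [observe(i) for i in range(SITES)]

# Flag the bottom 10% as "underperforming," exactly as the clients did
worst = sorted(range(SITES), key=lambda i: month1[i])[:100]

before = sum(month1[i] for i in worst) / len(worst)
after = sum(observe(i) for i in worst) / len(worst)  # no layout changes made
```

The flagged sites’ revenue jumps sharply the next period with no intervention whatsoever, because a terrible observed month mostly reflects bad luck, not a terrible site. Any analyst who “fixed” them would look like a rock star.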

Doing harm. An unfortunate reality in the age of big data is that businesses and governments monitor us incessantly so that they can predict our actions and manipulate our behavior. Good data scientists proceed cautiously, respectful of our rights and our privacy. The Golden Rule applies to data science: treat others as you would like to be treated.
An internet dating site ran three experiments. In experiment 1, they temporarily removed all pictures from the site and found that there were far fewer initial messages, supporting the hypothesis that love is not blind. In experiment 2, they randomly hid people’s profile text and found that it had no effect on personality ratings, supporting the hypothesis that love cannot read. Experiment 3 reversed the compatibility ratings, so that randomly selected customers were informed that someone who was highly compatible with them was a bad match and vice versa. The first two experiments were relatively harmless; the third not so much. The company should have considered the fact that their customers surely did not want their lives disrupted by romantic mismatches. A date with an incompatible person could be excruciating; a missed date with a potential soulmate could be life-changing.
It takes critical thinking to avoid these pitfalls. To put the science in data science, we need to act less like machines and more like scientists.

Gary Smith is the Fletcher Jones Professor of Economics at Pomona College. His Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics (Overlook/Duckworth, 2015) was a London Times Book of the Week and debunks a variety of dubious and misleading statistical practices. His latest book is The 9 Pitfalls of Data Science (Oxford University Press, 2019).