During the Great Depression, the federal government created the Home Owners’ Loan Corporation, which made low-interest home loans, and the Federal Housing Administration, which guaranteed mortgages made by private banks. The people running HOLC didn’t know much, if anything, about local borrowers, so they constructed “residential safety maps” that graded neighborhoods on a scale of A to D, with D neighborhoods color-coded in red to denote undesirable, high-risk areas. These “redlined” maps were also used by FHA and private businesses, and spilled over into banking, insurance, and retail stores, creating a vicious cycle of restricted services and deteriorating neighborhoods.
Many private banks had their own redlined maps. In California, for example, Security First National Bank created a Los Angeles neighborhood rating system. Most neighborhoods in Central L.A. were redlined, often with explicit notations about “concentrations of Japanese and Negroes.” Boyle Heights was said to be “honeycombed with diverse and subversive elements.” Watts was redlined because it was a melting pot of not only Blacks and Japanese, but also Germans, Greeks, Italians, and Scots.
The 1968 Fair Housing Act outlawed redlining. However, in the age of Big Data, employment, insurance, and loan applications are increasingly being evaluated by data mining models that are not as overt but may be even more pernicious than color-coded maps, because they are not limited by geographic boundaries, and because their inner workings are often hidden.
No one, not even the programmers who write the code, knows exactly how black-box algorithms make their assessments, but it is almost certain that these algorithms directly or indirectly consider gender, race, ethnicity, sexual orientation, and the like: call it high-tech redlining. It is not moral or ethical to penalize individuals because they share group characteristics that a black-box algorithm has found to be correlated statistically with behavior.
Many algorithms for evaluating job candidates identify statistical patterns in the characteristics of current employees. The chief scientist for one company acknowledged that some of the factors chosen by its software do not make sense. For example, the software found that several good programmers in its database visited a particular Japanese manga site frequently, so it decided that people who visit this site are likely to be good programmers. The chief scientist said, “Obviously, it’s not a causal relationship,” but argued that it was still useful because there was a strong statistical correlation. This is an excruciating example of the ill-founded belief–even by people who should know better–that statistical patterns are more important than common sense.
The CEO also said that the company’s algorithm looks at dozens of factors, and constantly changes the variables considered important as correlations come and go. She believes that the ever-changing list of variables demonstrates the model’s power and flexibility. A more compelling interpretation is that the algorithm captures transitory coincidental correlations that are of little value. If these were causal relationships, they would not come and go. They would persist and be useful. An algorithm that uses coincidental correlations to evaluate job applicants is almost surely biased. How fair is it if a Mexican-American female does not spend time at a Japanese manga site that is popular with white male software engineers?
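The claim that data mining will turn up correlations that "come and go" can be checked with a small simulation. The sketch below is an illustration only, not the company's algorithm: every name and number in it (`chase_noise`, 30 employees, 500 factors, 2,000 new applicants) is a hypothetical choice made for the example. It generates performance ratings and candidate factors that are all pure random noise, keeps the factor with the strongest correlation, and then measures the correlation again on people who were not part of the search.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def chase_noise(n_employees=30, n_factors=500, n_applicants=2000, seed=0):
    """Pick the noise factor that best 'predicts' noise, then retest it."""
    rng = random.Random(seed)

    def noise(n):
        return [rng.gauss(0, 1) for _ in range(n)]

    # Hypothetical performance ratings for current employees: pure noise.
    performance = noise(n_employees)
    # Hypothetical candidate factors, generated independently of performance.
    factors = [noise(n_employees) for _ in range(n_factors)]

    # Data mining step: keep whichever correlation is largest in magnitude.
    correlations = [pearson(f, performance) for f in factors]
    best = max(correlations, key=abs)

    # Retest on people not used in the search. Because every factor is
    # independent noise, a fresh draw stands in for re-measuring the
    # winning factor on new applicants.
    fresh = pearson(noise(n_applicants), noise(n_applicants))
    return best, fresh


in_sample, out_of_sample = chase_noise()
print(f"best in-sample correlation: {in_sample:+.2f}")
print(f"same kind of factor, new people: {out_of_sample:+.2f}")
```

With hundreds of meaningless factors and only a few dozen employees, the search reliably finds something that looks like a strong pattern, and the pattern shrinks toward zero on new data. Nothing in the simulated data is related to anything else, so the "discovery" is entirely a product of looking at many variables.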
Similarly, Amazon recently abandoned an attempt to develop customized algorithms for evaluating the resumes of applicants. The algorithms trained on the resumes of job applicants over the previous ten years, and favored people who were like the (mostly male) people Amazon had hired in the past. Candidates who went to all-women’s colleges were downgraded because men who worked at Amazon hadn’t gone to those colleges. Ditto with candidates who played on female sports teams.
A Chinese algorithm for evaluating loan applications looks at cell-phone usage; for example, how frequently incoming and outgoing calls are answered, and whether users keep their phones fully charged. Which of these metrics are signs of a phone user being a good credit risk; which of a bad credit risk? Any uncertainty you feel demonstrates the arbitrary nature of these markers.
These are coincidental correlations that are temporary and meaningless, but may well discriminate. When this credit rating system was first disclosed in China, answering all incoming calls was considered to be a sign of being a good credit risk. Who knows how it is interpreted now? But it may well be biased: How fair is it if members of certain religions are not supposed to answer the phone on certain days or at certain times of the day?
Data gathered on social media platforms offers companies a new font of dubious qualitative insights. Admiral Insurance, Britain’s largest car insurance company, planned to launch firstcarquote, which would base its car insurance rates on a computer analysis of an applicant’s Facebook posts; for example, word choices and whether a person likes Michael Jordan or Leonard Cohen. From there, like other black-box algorithms, it drifted off into patterns hidden inside the black box. There are surely biases in Facebook posts. How fair is it if a black male likes Michael Jordan and a white female likes Leonard Cohen? How fair is it if Facebook word choices that are related to gender, race, ethnicity, or sexual orientation happen to be coincidentally correlated with car insurance claims?
Algorithmic criminology is increasingly common in pre-trial bail determination, post-trial sentencing, and post-conviction parole decisions. One developer wrote, “The approach is ‘black box,’ for which no apologies are made.” He gave an alarming example: “If I could use sun spots or shoe size or the size of the wristband on their wrist, I would. If I give the algorithm enough predictors to get it started, it finds things that you wouldn’t anticipate.” Things we don’t anticipate are things that don’t make sense, but happen to be coincidentally correlated.
Some predictors (wristband sizes?) may well be proxies for gender, race, sexual orientation, and other factors that should not be considered. People should not have onerous bail, be given unreasonable sentences, or be denied parole because of their gender, race, or sexual orientation–because they belong to certain groups.
The future of algorithmic suspicion can be seen in China, where the government is implementing a nationwide system of social credit scores intended to track what people buy, where they go, what they do, and anything else that might suggest that a person is untrustworthy–not just less likely to repay a loan, but also more likely to foment political unrest. The country’s security services are also investing heavily in face recognition technology, which could bring new data to credit-scoring type tools. Two Chinese researchers recently reported that they could predict with 89.5% accuracy whether a person is a criminal by applying their computer algorithm to scanned facial photos. Their program found “some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle.” As one blogger wrote,
“What if they just placed the people that look like criminals into an internment camp? What harm would that do? They would just have to stay there until they went through an extensive rehabilitation program. Even if some went that were innocent; how could this adversely affect them in the long run?”
What can we do to police these systems? Demand–by law, if necessary–more transparency. Citizens should be able to check the accuracy of the data used by algorithms and should have access to enough information to test whether an algorithm has an illegal disparate impact.
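One simple way to test for disparate impact, given access to an algorithm's decisions, is to compare selection rates across groups. The sketch below is a rough screening check, not a legal test, and it is not drawn from this article: it applies the "four-fifths" rule of thumb from U.S. employment-selection guidelines, under which a group whose selection rate is below 80% of the highest group's rate is flagged for closer review. The function names, the group labels, and the counts are hypothetical.

```python
def selection_rates(decisions):
    """Map each group to its share of favorable decisions.

    `decisions` is an iterable of (group, selected) pairs, where
    `selected` is True when the algorithm's decision was favorable.
    """
    totals, favorable = {}, {}
    for group, selected in decisions:
        totals[group] = totals.get(group, 0) + 1
        favorable[group] = favorable.get(group, 0) + (1 if selected else 0)
    return {group: favorable[group] / totals[group] for group in totals}


def impact_ratios(decisions):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(decisions)
    top = max(rates.values())
    if top == 0:
        raise ValueError("no favorable decisions, so ratios are undefined")
    return {group: rate / top for group, rate in rates.items()}


def flagged_groups(decisions, threshold=0.8):
    """Groups whose ratio falls below the four-fifths rule of thumb."""
    return sorted(g for g, r in impact_ratios(decisions).items() if r < threshold)


# Hypothetical audit data: 100 applicants per group.
audit = (
    [("group_a", True)] * 50 + [("group_a", False)] * 50
    + [("group_b", True)] * 30 + [("group_b", False)] * 70
)
ratios = impact_ratios(audit)
flagged = flagged_groups(audit)
print(ratios, flagged)
```

In this made-up audit, group_b is selected 30% of the time against 50% for group_a, a ratio of 0.6, so it is flagged. A check like this needs only the decisions and group membership, which is why access to that information matters even when the inner workings of the algorithm stay hidden.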
Fortunately, there’s a growing recognition of the influence algorithms have on our lives. A survey published last week by the Pew Research Center found that many Americans are concerned about bias and unfairness when computers use math to make decisions, like assigning personal finance scores, making criminal risk assessments, or screening job applicants’ resumes and interviews. Curiously, the Pew survey also found that the public’s concern about AI scoring depends heavily on context: About 30% of people think it’s acceptable for companies to offer deals and discounts based on customers’ personal and behavioral data. But about 50% believe it’s okay for criminal justice systems to use algorithms to predict whether a parolee will commit another crime.
Is our faith in computers so blind that we are willing to trust algorithms to reject job applications and loan applications, set insurance rates, determine the length of prison sentences, and put people in internment camps? Favoring some individuals and mistreating others because they happen to have irrelevant characteristics selected by a mindless computer program isn’t progress: it’s a high-tech return to a previous era of unconscionable discrimination.
Gary Smith is Fletcher Jones Professor of Economics at Pomona College and the author of The AI Delusion, published by Oxford University Press.