
The great misunderstanding at the core of facial recognition

Humans don’t fit neatly into boxes, period.

[Photo: Caique Silva/Unsplash; Timothy Barlin/Unsplash]

By Nina Dewi Toft Djanegara

In the last five years, facial recognition has become a battleground for the future of artificial intelligence (AI). This controversial technology encapsulates public fears about inescapable surveillance, algorithmic bias, and dystopian AI. Cities across the United States have banned the use of facial recognition by government agencies and prominent companies have announced moratoria on the technology’s development.

But what does it mean to be recognized? Numerous authors have sketched out the social, political, and ethical implications of facial recognition technology. These important critiques highlight the consequences of false positive identifications, which have already resulted in the wrongful arrests of Black men, as well as facial recognition’s effects on privacy, civil liberties, and freedom of assembly. In this essay, however, I examine how the technology of facial recognition is intertwined with other types of social and political recognition, and I highlight how technologists’ efforts to “diversify” and “de-bias” facial recognition may actually exacerbate the discriminatory effects they seek to resolve. Within the field of computer vision, the problem of biased facial recognition has been interpreted as a call to build more inclusive datasets and models. I argue that instead, researchers should critically interrogate what can’t or shouldn’t be recognized by computer vision.

Recognition is one of the oldest problems in computer vision. For researchers in this field, recognition is a matter of detection and classification. Or, as the textbook Machine Vision states, “The object recognition problem can be defined as a labeling problem based on models of known objects.”

When recognition is applied to people, it becomes a question of using visual attributes to determine what kind of person is depicted in an image. This is the basis for facial recognition (FR), which attempts to link a person to a previously captured image of their face, and facial analysis (FA), which claims to recognize attributes like race, gender, sexuality, or emotions based on an image of a face.

Recent advances in AI and machine learning (ML) research (e.g., convolutional neural networks and deep learning) have produced enormous gains in the technical performance of facial recognition and facial analysis models. These performance improvements have ushered in a new era of facial recognition and its widespread application in commercial and institutional domains. Nevertheless, algorithmic audits have revealed concerning performance disparities when facial recognition and analysis tasks are conducted on different demographic groups, with lower accuracy for darker-skinned women in particular.

In response to these audits, the Fairness, Accountability, and Transparency (FAT) in machine learning community has moved to build bigger and more diverse datasets for model training and evaluation, some of which include synthetic faces. These efforts include scraping images off the Internet without the knowledge of the people depicted in those photos, leading some to point out how these projects violate ethical norms about privacy and consent. Other attempts to create diverse datasets have been even more troubling, for instance, when Google contractors solicited facial scans from Black homeless people in Los Angeles and Atlanta who were compensated with $5 Starbucks gift cards. Such efforts remind us that inclusion does not always entail fairness. They also raise questions about whether researchers should even be collecting more data about people who are already heavily surveilled in order to build tools that can be used to further surveil them. This relates to what Keeanga-Yamahtta Taylor has termed predatory inclusion, which refers to when so-called inclusive programs create more harms than benefits for marginalized people, especially Black communities.

Other work in the Fairness, Accountability, and Transparency community has attempted to resolve the issue of biased facial recognition and unbalanced datasets by devising new data sampling strategies that either oversample minority demographics or undersample the majority. Yet another approach has been the creation of “bias-aware” systems that learn attributes like race and gender in order to improve model performance. These systems start by extracting demographic characteristics from an image, which are then used as explicit cues for the facial recognition task. Put simply: They first try to detect a person’s race and/or gender and then use that information to make facial recognition work better. However, none of these methods question the underlying premise that social categories like race, gender, and sexuality are fixed attributes that can be recognized based solely on visual cues, or ask why automated recognition of these attributes is necessary in our society.
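To make the mechanics concrete, here is a minimal, hypothetical sketch of the oversampling idea described above: each demographic group in a labeled face dataset is resampled, with replacement, until it matches the size of the largest group. The file name ("faces.csv") and the column names are assumptions for illustration only; note that the sketch simply takes the group labels as given, which is precisely the premise at issue.

# A minimal sketch of demographic oversampling, assuming pandas is available.
# "faces.csv" and its columns ("image_path", "group") are hypothetical.
import pandas as pd

def oversample_by_group(df: pd.DataFrame, group_col: str = "group") -> pd.DataFrame:
    """Resample each group, with replacement, up to the size of the largest group."""
    target = df[group_col].value_counts().max()
    balanced = [
        members.sample(n=target, replace=True, random_state=0)
        for _, members in df.groupby(group_col)
    ]
    # Concatenate and shuffle so training batches are not ordered by group.
    return pd.concat(balanced).sample(frac=1, random_state=0).reset_index(drop=True)

faces = pd.read_csv("faces.csv")                # columns: image_path, group
balanced_faces = oversample_by_group(faces)
print(balanced_faces["group"].value_counts())   # counts are now equal across groups

Undersampling the majority works the same way in reverse, shrinking every group to the size of the smallest one. Either way, the categories themselves go unquestioned.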

At the crux of this issue is the tenuous intersection between identity and appearance. For example, race is a social category that is linked, but not equivalent, to phenotype. Because race is not an objective or natural descriptor, it is impossible to definitively recognize someone’s race based on their image, and any attempts to do so can veer quickly into the realm of scientific racism. Similarly, while the performance of gender often includes some kind of deliberate aesthetic self-presentation, it cannot be discerned by appearance alone. Visual cues can suggest membership within a social group, but they do not define it.

In contrast, within the social sciences and in many activist spaces, recognition is understood as a social process born of shared histories and identities. As philosopher Georg Hegel describes it, recognition is mutual and intersubjective; we develop and affirm our sense of identity through being recognized by other people. Moreover, social recognition is ongoing because people are not fixed, nor are our relationships to each other.

Meanwhile, within the field of computer vision, recognition is always a one-sided visual assessment. Additionally, computer vision’s method of classification often imposes categories that are mutually exclusive—you can only belong to one—whereas from a social perspective, we regard identities as multiple and intersecting, with certain traits like gender or sexuality existing on some kind of spectrum. When facial analysis systems assign a label that contradicts a person’s self-identity—for instance, when classifying a person as the wrong gender—this can be an injurious form of misrecognition.


In comparison, social recognition is like a nod of assurance that says I see you as you see yourself. Or, as Stuart Hall puts it, shared identity is built off the “recognition of some common origin or shared characteristics with another person or group, or with an ideal, and with the natural closure of solidarity and allegiance established on this foundation.” Furthermore, shared identities are more than just descriptors of some preexisting condition; they can also be cultivated, mobilized, and leveraged as powerful tools for political organizing. When this happens, mutual recognition can form the basis for entire movements, where communities come together in solidarity to demand political recognition from the state and powerful institutions.

This kind of political solidarity was put into practice in the recent activist efforts to ban the use of facial recognition. In New Orleans, for example, the city’s facial recognition ban was achieved by a grassroots coalition of Black youth, sex workers, musicians, and Jewish Voices For Peace. Elsewhere, campaigns have featured diverse alliances of immigrant rights and Latinx advocacy organizations, Black and Muslim activists, as well as privacy and anti-surveillance groups. After a wave of successful bans at the municipal level, these community activists are now pushing for legislation at the state and national levels and fighting against the use of facial recognition by federal agencies and private companies. I myself was inspired to reflect on the different meanings of identity and recognition when Noor, an L.A.-based anti-surveillance activist, told me, “That’s how we defeat surveillance…instead of watching each other, seeing each other.” Noor’s words helped me to understand how seeing is about mutual understanding and validation, while watching is about objectification and alienation.

Ultimately, any computer-vision project is based on the premise that a person’s outsides can tell us something definitive about their insides. These are systems based solely on appearance, rather than identity, solidarity, or belonging. And while facial recognition may seem futuristic, the technology is fundamentally backward-looking, since its functioning depends on images of past selves and outmoded ways of classifying people. Looking forward, instead of asking how to make facial recognition better, perhaps the question should be: how do we want to be recognized?


Nina Dewi Toft Djanegara is a PhD candidate in the Department of Anthropology at Stanford University. Her research examines how technology—such as facial recognition, biometric scanners, satellites, and drones—is applied in border management and law enforcement. Twitter: @toftdjanegara

 This essay is part of AI Now Institute’s ongoing “AI Lexicon” project, a call for contributions to generate alternate narratives, positionalities, and understandings to the better known and widely circulated ways of talking about AI.
