2023-10-05

The United States accounts for less than 5% of the world’s population. English is spoken by just 17% of the globe. Yet type common words like “house” into an AI image generator like DALL-E or Stable Diffusion, and you’ll be presented with imagery of classic Americana.

That’s a problem, as a new academic paper, presented this week at the IEEE/CVF International Conference on Computer Vision in Paris, France, shows. Danish Pruthi and colleagues at the Indian Institute of Science in Bangalore, India, analyzed the output of two of the world’s most popular image generators, asking people from 27 different countries how representative of their environment they found the images. The participants were shown AI-generated images produced in response to queries asking the tools to depict houses, flags, weddings, and cities, among other things, and were then asked to rate them. Outside of the United States and India, most people felt that the AI tools produced imagery that didn’t match their lived experience. Participants’ lack of connection seems understandable: Ask DALL-E or Stable Diffusion to show a flag, and it’ll generate the Stars and Stripes, which is of little relevance to people in Slovenia or South Africa, two of the countries where people were surveyed.

The results matched what the authors of the research expected. Indeed, that expectation is why they did the research in the first place. “It was a pet peeve that a lot of these models assume a certain geographical context,” says Pruthi, one of the co-authors of the study. “We were interested in understanding what is the default demographic? Who is this technology for by default?”

Although the team chose deliberately universal items or concepts, like weddings and houses, Pruthi was surprised by the near-total lack of representation the models showed unless explicitly prompted to depict an item or concept from a specific country. When asked to produce an Indian house or a German house, the AI image generators did better: The accuracy ratings given by those viewing the images rose by an average of one point on a five-point scale. But that was an improvement rather than an accurate depiction: For many countries, the score was still only around 3.5 out of 5. “There is still a lot of room to improve and make these results a lot more personalized,” says Pruthi.

In part, the problem is one that blights all AI: The quality of a model’s outputs largely depends on the quality of its inputs. And the input data isn’t always very good. ImageNet, one of the main databases of source images used for AI image generators, has long been criticized for racist and sexist labels on images. If such databases don’t contain source imagery that depicts large parts of the world and how people there live, it has a huge knock-on effect on representation.

