Google Translate is the world’s most popular web translation platform, but one Stanford University researcher says it doesn’t really understand sex and gender. Londa Schiebinger, who runs Stanford’s Gendered Innovations project, says Google’s choice of source databases causes a statistical bias toward male nouns and verbs in translation. In a paper on gender and natural language processing, Schiebinger offers convincing evidence that the source texts used with Google’s translation algorithms lead to unintentional sexism.
In a peer-reviewed case study published in 2013, Schiebinger showed that Google Translate tends to turn gender-neutral English words (such as the, or occupational names such as professor and doctor) into the male form when they are translated into other languages. However, certain gender-neutral English words are translated into the female form . . . but only when they comply with certain gender stereotypes. For instance, the gender-neutral English terms a defendant and a nurse translate into German as ein Angeklagter and eine Krankenschwester. Defendant translates as male, but nurse auto-translates as female.
Where Google Translate really trips up, Schiebinger claims, is in the lack of context for gender-neutral words in other languages when translated into English. Schiebinger took an article about her work from the Spanish-language newspaper El Pais and ran it into English through Google Translate and rival platform Systran. Both Google Translate and Systran translated the gender-neutral Spanish words “suyo” and “dice” as “his” and “he said,” despite the fact that Schiebinger is female.
These sorts of words bring up specific issues in Bing Translate, Google Translate, Systran, and other popular machine translation platforms. Google engineers working on Translate told Co.Labs that translation of all words, including gendered ones, is primarily weighed by statistical patterns in translated document pairs found online. Because “dice” can translate as either “he said” or “she said,” Translate’s algorithms look at combinations of “dice” in conjunction with neighboring words to see what the most frequent translations of those combinations are. If “dice” renders more often in the translations Google obtains as “he says,” then Translate will usually render it male rather than female. In addition, Google Translate’s team added that their platform only uses individual sentences for context. Gendered nouns or verbs in neighboring sentences aren’t weighed in terms of establishing context.
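The frequency-based choice the engineers describe can be illustrated with a toy sketch. This is not Google's code: the tiny list of observed pairs, the function names, and the use of a single neighboring word as context are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical data: each entry is (word next to "dice" in the Spanish
# sentence, English rendering seen in an existing human translation).
observed_pairs = [
    ("profesor", "he says"),
    ("profesor", "he says"),
    ("profesor", "she says"),
    ("doctora", "she says"),
    ("ingeniero", "he says"),
]

def build_counts(pairs):
    """Count renderings of "dice" per neighboring word, plus an overall tally."""
    counts = defaultdict(Counter)
    for neighbor, rendering in pairs:
        counts[neighbor][rendering] += 1
        counts[None][rendering] += 1  # overall tally, used when context is unseen
    return counts

def translate_dice(neighbor, counts):
    """Return the most frequent rendering for this neighbor, else the overall one."""
    table = counts[neighbor] if neighbor in counts else counts[None]
    return table.most_common(1)[0][0]

counts = build_counts(observed_pairs)
print(translate_dice("profesor", counts))
print(translate_dice("doctora", counts))
print(translate_dice("periodista", counts))  # unseen neighbor: falls back to overall counts
```

In this sketch an unseen neighbor falls back to whichever rendering dominates the whole data set, so a skew in the training pairs carries straight through to the output. Nothing outside the sentence is consulted, which mirrors the single-sentence limit the Translate team described.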
Schiebinger told Co.Labs that the project evolved out of a paper written by a student who was working on natural language-processing issues. In July 2012, a workshop with outside researchers was held at Stanford University; its findings were turned, after peer review, into the machine translation paper.
Google Translate, which faces the near-impossible goal of accurately translating the world’s languages in real time, has faced gender issues for years. To Google’s credit, Mountain View regularly tweaks Google Translate’s algorithms to fix translation inaccuracies. Language translation algorithms are infamously tricky. Engineers at Google, Bing, Systran, and other firms don’t only have to take grammar into account–they have to take into account context, subtext, implied meanings, cultural quirks, and a million other subjective factors . . . and then turn them into code.
But, nonetheless, those inaccuracies exist–especially for gender. In one instance last year, users discovered that translating “Men are men, and men should clean the kitchen” into German became “Männer sind Männer, und Frauen sollten die Küche sauber”–which means “Men are men and women should clean the kitchen.” Another German-language Google Translate user found job bias in multiple languages–the gender-neutral English-language terms French teacher, nursery teacher, and cooking teacher all showed up in Google Translate’s French and German editions in the feminine form, while engineer, doctor, journalist, and president were translated into the male form.
Nataly Kelly, author of Found In Translation: How Languages Shape Our Lives And Transform The World, whose firm offers language-technology products, told Co.Labs that a male bias in machine translation is extremely common. “If you’re using a statistical approach to produce the translation, the system will mine all past translations and will serve up the most likely candidate for a ‘correct’ translation based on frequency. Given that male pronouns have been over-represented throughout history in most languages and cultures, machine translation tends to reflect this historical gender bias,” Kelly said.
“The results can be highly confusing, even inaccurate. For example, in Google Translate, if you translate engineer into Spanish, it comes out as the masculine ingeniero, but if you put in female engineer, you get ingeniero de sexo feminino, which means something like a male engineer of the feminine sex. This sounds pretty strange in Spanish, to say the least! If you type female engineer into Bing Translate, you get ingeniera, which is technically correct. But still, you have to specify female in order to produce a feminine result. You don’t have to specify male engineer to get ingeniero. You only need to type in engineer. [There is] an inherent gender bias in most machine translation systems.”
The reason why this happens is statistical. In every language that Google Translate operates in, algorithms process meaning, grammar, and context through a massive number of previously uploaded documents. These documents, which vary from language to language, determine how Google Translate actually works. If source material used for translations has an aggregated bias in terms of one gender being preferred over another, that will be reflected in translations received by users.
When a user on Google Groups questioned male gender bias in Hebrew translations in 2010, Google’s Xi Cheng noted that “Google Translate is fully automated by machine; no one is explicitly imposing any rules; the translation is generated according to the statistical nature of the corpus we have.”
According to Schiebinger, machine translation systems such as Google Translate use two separate kinds of corpora. A “parallel corpus” pairs text in one language with its translation in another language so the two can be compared, while a large monolingual corpus in the target language is used to determine grammar and word placement. If masculine or feminine forms of words are systematically favored in the corpora used, the algorithm is led to translate in favor of that particular gender.
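A rough way to picture how the two kinds of corpus interact is to score each candidate translation twice: once by how often it appears as a translation in the parallel data, and once by how often it appears in target-language text. The sketch below is a toy under stated assumptions, not a description of any real system: both miniature corpora, the function names, and the simple product of the two scores are hypothetical.

```python
# Hypothetical parallel corpus: (English phrase, German translation) pairs.
parallel_corpus = [
    ("a doctor", "ein Arzt"),
    ("a doctor", "ein Arzt"),
    ("a doctor", "eine Ärztin"),
]

# Hypothetical monolingual German corpus, used only to judge how common
# a phrase is in the target language.
monolingual_corpus = [
    "ein Arzt kommt",
    "ein Arzt sagt",
    "eine Ärztin sagt",
    "ein Arzt hilft",
]

def translation_score(source, candidate):
    """Relative frequency of candidate among the translations of source."""
    matches = [tgt for src, tgt in parallel_corpus if src == source]
    return matches.count(candidate) / len(matches) if matches else 0.0

def fluency_score(candidate):
    """Share of monolingual sentences containing the candidate as whole words."""
    hits = sum(1 for s in monolingual_corpus if f" {candidate} " in f" {s} ")
    return hits / len(monolingual_corpus)

def best_translation(source):
    """Pick the candidate with the highest product of the two scores."""
    candidates = {tgt for src, tgt in parallel_corpus if src == source}
    return max(candidates,
               key=lambda c: translation_score(source, c) * fluency_score(c))

print(best_translation("a doctor"))
```

Because the masculine form is more frequent in both toy corpora, the two scores reinforce each other, which is the compounding effect Schiebinger describes.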
Machine translation ultimately depends on translators and linguists giving context to both algorithms and the source material they use. Google Translate, Bing Translate, and Systran all do a stunning job of providing instant translations in a staggering array of languages. The challenge for translation platform developers is how to further refine their product and increase accuracy–something we’ll see more of in the future.