Google is working on language-to-text AR glasses. It's a complicated idea

Google appears to be working on AR glasses once again, but this time, it is showing a new feature that translates speech to readable text.

At last week’s Google I/O 2022, the company demonstrated an AR glasses prototype that can translate spoken language into readable display text. Google has not hinted at whether, or when, it might develop these as a product, but the fact that it showed them to developers indicates that it is thinking about how to extend the model of AR glasses to utilize its gigantic datasets and existing technologies.

If Google moves forward with the product, it will likely frame it as a device that attempts to break down language barriers. Sounds great, right? No more trying to find Google Translate on the web and pecking phrases into our mobile phones to translate things. When (or if) these hit the market, we’ll finally be able to read foreign signs, order correctly in restaurants, and even make new friends more easily when we travel. More significantly, there would be a way to quickly translate communication in the event of an emergency, when people may not all speak the same language. On another level, these “translation glasses” could also open up communication channels for the deaf and hard of hearing community, giving them a new way to communicate with those around them.

However, as with all new technology ideas, Google’s translation glasses could come with a huge social cost: to our privacy, our well-being, and our cooperation with each other in our communities. What does it mean when Google becomes the translator for our lives, and are we comfortable with that notion?

The problem with any type of translation technology is that it has to “listen” to those around it to procure the data to translate. And if the AR glasses are listening, we will need to know what, or whom, they are listening to, and when they are listening. At the moment, we don’t know if these glasses will be able to distinguish between more than one person at a time, either. Also, we will need to know if it is legal for these glasses to listen without consent, and if one needs someone’s consent to record them in order to translate them, will one need the glasses to translate the consent? We don’t know if, in the future, these glasses will have the capacity to record what they translate, nor do we know if they could identify whom they are recording at any given time, or within what range they are capable of listening. And if the glasses do record, or even if they only keep the transcribed text, we’ll need to know if that is stored somewhere that can be erased, and if people could opt out in a public space without being recorded while doing so.

Let’s assume for the moment that these Google glasses won’t record us, and that Google manages to figure out consent and permission. Even then, in our crowded, noisy world, the usual problems with speech-to-text could still abound in the form of misunderstandings and misrepresentations in what Google “hears” and what it writes as a result of that hearing. The tech might also produce many misspellings and become confused when languages are mixed. As The Verge pointed out, many of us “code switch,” using words from many different languages interspersed, with the added complexity that not all of them read from left to right, which will need to be accommodated, too.

Now add to that an aggregate population using these glasses while wandering around, which evokes much of what I wrote with Dr. Catherine Flick about Meta’s pre-Ray-Ban Stories Project Aria glasses. Many of the same issues persist, except that with these new Google glasses, people may be walking around reading transcripts, which, again, is more like what went on in the early days of cell phones and divided attention, creating potentially dangerous outcomes as distracted people walk into traffic or fall into fountains.

One of the main concerns with the glasses is Google’s seeming assumption that technology can solve cultural problems, and that if the technology isn’t working, the solution is to develop and apply more technology. In this case, cross-cultural communication problems cannot be fully solved with language translation. Tech can help, but these glasses won’t translate culture or cultural norms, such as whether someone is comfortable being direct or indirect, or any of the multitude of cultural nuances and cues found in the ways that different people in different groups communicate with each other. For that, we need other humans to guide us.

S.A. Applin, PhD, is an anthropologist whose research explores the domains of human agency, algorithms, AI, and automation in the context of social systems and sociability. You can find more at @anthropunk and PoSR.org.
