
Researchers were able to attack a common speech recognition system using voice commands hidden in other audio recordings.


By Jesus Diaz

Scientists at Ruhr-Universität Bochum in Germany have discovered a way to hide inaudible commands in audio files–commands that, while imperceptible to our ears, can take control of voice assistants. According to the researchers, the flaw lies in the very way the AI is designed.

It’s part of a growing area of research known as “adversarial attacks,” which are designed to confuse deep neural networks–usually visually, as Co.Design has covered in the past–and which leave the technology and infrastructure that depend on AI potentially vulnerable to bad-faith actors.

In this case, the systems being “attacked” by the researchers are personal assistants like Alexa, Siri, and Cortana. According to Professor Thorsten Holz from the Horst Görtz Institute for IT Security, their method, called “psychoacoustic hiding,” shows how hackers could manipulate any type of audio wave–from songs and speech to birdsong–to include words that only the machine can hear, allowing them to issue commands without nearby people noticing. To our ears, the attack sounds just like a bird’s call, but a voice assistant would “hear” something very different.

Attacks could be played over an app, for instance, or embedded in a TV commercial or radio program, to target thousands of people at once–and potentially make purchases on their accounts or steal their private information. “[In] a worst-case scenario, an attacker may be able to take over the entire smart home system, including security cameras or alarm systems,” they write. In one example, they show how our ears hear an innocuous piece of audio while the speech recognition system hears “deactivate security camera.”

The hack takes advantage of a trick called the “masking effect.” As researcher Dorothea Kolossa explains in a presentation of the work, it’s based on the psychoacoustic model of hearing: When your brain is busy processing a loud sound at a certain frequency, you’re “no longer able to perceive other, quieter sounds at this frequency for a few milliseconds.” That brief window is where the scientists found they could hide commands to hijack a system like the automatic speech recognition engine Kaldi, which they say is at the heart of Amazon’s assistant.
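To make the masking idea concrete, here is a minimal sketch in Python (assuming NumPy): a loud 1 kHz tone carries a much quieter tone tucked 20 dB below it at a nearby frequency. The specific frequencies and the 20 dB margin are illustrative assumptions, not values from the study–the point is only that the hidden tone barely changes the overall signal level a listener would notice.

```python
# A rough illustration of frequency masking: a loud tone at one frequency
# raises the level below which quieter, nearby sounds go unnoticed.
# The -20 dB margin and the 1 kHz / 1.1 kHz frequencies are assumptions
# for demonstration, not values from the researchers' paper.
import numpy as np

SAMPLE_RATE = 16_000          # Hz, typical for speech recognition systems
DURATION = 0.5                # seconds
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE

# Loud "masker" tone that the listener clearly hears.
masker = 1.0 * np.sin(2 * np.pi * 1000 * t)

# Much quieter tone at a nearby frequency, kept 20 dB below the masker,
# standing in for the hidden payload a listener would miss.
hidden_gain = 10 ** (-20 / 20)
hidden = hidden_gain * np.sin(2 * np.pi * 1100 * t)

mixture = masker + hidden

def level_db(x):
    """RMS level in dB relative to a full-scale sine."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) / np.sqrt(0.5))

print(f"masker level:  {level_db(masker):6.1f} dB")
print(f"hidden level:  {level_db(hidden):6.1f} dB")
print(f"mixture level: {level_db(mixture):6.1f} dB")  # barely changes
```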

It’s the same scientific principle that allows MP3s to be compressed: The algorithm judges which sounds you’re actually going to hear and eliminates the rest to make the file smaller. Here, however, instead of deleting sounds, attackers can add them. Unlike a human brain, an AI like Alexa’s can hear and process everything. The way it is trained leaves it wide open to these adversarial attacks, because it has been designed to understand any audio command and follow it, whether or not humans can hear it. You can hear other examples here.
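As a rough sketch of the “add sounds instead of deleting them” idea, the Python snippet below (again assuming NumPy) caps a stand-in payload signal so that, in every short-time frequency bin, it stays a fixed margin below the carrier audio’s own energy. The frame size, the 30 dB margin, and the random noise standing in for real audio are all illustrative assumptions–the researchers’ actual method optimizes the perturbation against the Kaldi recognizer itself.

```python
# A toy version of hiding a quiet payload under louder carrier audio:
# in each analysis frame, cap every payload frequency bin at a fixed
# margin below the carrier's magnitude in that bin, then add it back in.
# Frame size, margin, and random "audio" are illustrative assumptions.
import numpy as np

SAMPLE_RATE = 16_000
FRAME = 512                   # samples per analysis frame
MARGIN_DB = 30                # keep the payload this far below the carrier

rng = np.random.default_rng(0)
carrier = rng.standard_normal(SAMPLE_RATE)   # stand-in for a song or birdsong
payload = rng.standard_normal(SAMPLE_RATE)   # stand-in for the hidden audio

out = np.copy(carrier)
scale = 10 ** (-MARGIN_DB / 20)
for start in range(0, len(carrier) - FRAME + 1, FRAME):
    c = np.fft.rfft(carrier[start:start + FRAME])
    p = np.fft.rfft(payload[start:start + FRAME])
    # Cap each payload bin at MARGIN_DB below the carrier bin's magnitude.
    cap = scale * np.abs(c)
    mag = np.abs(p)
    p = np.where(mag > cap, p * cap / np.maximum(mag, 1e-12), p)
    out[start:start + FRAME] += np.fft.irfft(p, n=FRAME)

print("carrier RMS:", np.sqrt(np.mean(carrier ** 2)))
print("output  RMS:", np.sqrt(np.mean(out ** 2)))   # nearly unchanged
```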

The researchers’ only caveat is that they haven’t yet tried playing their doctored songs or bird calls out loud–so far they’ve only fed the manipulated audio files directly to the speech recognition system. However, they’re confident that playing the attacks over a speaker will have the same effect. “In general, it is possible to hide any transcription in any audio file with a success rate of nearly 100%,” the researchers conclude.

The results are worrying, even if no such attack by a malicious actor has happened yet. It’s not the first time the security of voice systems has been questioned, either. In June of last year, scientists found they could “whisper” commands to Alexa at frequencies outside the range of human hearing. According to the scientists, such attacks are possible because of the intrinsic way deep neural networks are trained: the trickery is designed around what the system “knows” as well as its blind spots. The same weakness can fool AI-powered computer vision systems into thinking that, for instance, a picture of a stop sign is actually a yield sign.

Amazon and other voice assistant platforms could argue that users can already protect themselves against this type of attack. You can secure critical Alexa skills–like voice-activated shopping, access to banks or financial institutions, and unlocking your front door–by requiring a PIN. However, this PIN setting is off by default. Likewise, Alexa’s blue ring could alert you that something’s up. But who’s looking at their Echo every second?

An Amazon spokesperson told Co.Design that the company takes security issues seriously and is “reviewing the findings by the researchers.” Another way to look at this problem? Whenever possible–and unfortunately, it’s not always possible–don’t use unsecured smart speakers for sensitive information until they deliver on the promise of a safe and secure user experience.



ABOUT THE AUTHOR

Jesus Diaz is a screenwriter and producer whose latest work includes the mini-documentary series Control Z: The Future to Undo, the futurist daily Novaceno, and the book The Secrets of Lego House.

