Google Street View has never felt so enchanting. In New York’s Botanical Garden, children shout and play in the distance. On a highway in Shibuya, Tokyo, cars whoosh by in such detail that you can hear the first drops of rain hitting the street. At Gaudí’s famous Barcelona basilica, the Sagrada Família, a priest chants in Latin as his deep voice echoes around the cavernous space.
Yet none of these sounds are real. Or, at least, they weren’t recorded where you see them. Instead, an AI has added a soundtrack to all of Street View, and the sounds are often so convincing that you would never know they were fake.
The project is Imaginary Soundscape, first spotted by Prosthetic Knowledge. Created by Nao Tokui, the system uses a pair of neural networks originally developed at MIT. One network learned to identify the contents of images and video frames. The other learned to recognize sounds, such as the ambient noise of a particular environment. By linking the two, the system can pick open-source audio tracks that plausibly fit any given visual scene.
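The article doesn’t spell out how the two networks are linked. One common way to build this kind of image-to-sound lookup, assumed here purely for illustration, is to have both networks turn their input into a short list of numbers (an “embedding”) in a shared space, then choose the library sound whose embedding points in the most similar direction to the image’s. The three-number embeddings and file names are made up; this is a toy sketch of the idea, not Tokui’s actual code.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors' directions: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def pick_soundtrack(image_embedding, sound_library):
    """Return the name of the sound whose embedding best matches the image."""
    best_name, best_score = None, -math.inf
    for name, sound_embedding in sound_library.items():
        score = cosine_similarity(image_embedding, sound_embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical embeddings; imagine the three numbers as "traffic", "crowd", "birds".
library = {
    "cars_passing.wav": [0.9, 0.1, 0.0],
    "stadium_crowd.wav": [0.1, 0.9, 0.0],
    "birdsong.wav": [0.0, 0.1, 0.9],
}
street_scene = [0.8, 0.2, 0.1]  # what an image network might output for a busy road
print(pick_soundtrack(street_scene, library))  # cars_passing.wav
```

A lookup like this always returns its best match, even when nothing in the library truly fits the scene.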
Tokui’s team built on MIT’s research and turned it into a website that anyone can try. The result is essentially Google Street View with a soundtrack. You can type in any address in the world and see it appear alongside one of 15,000 sounds pulled from an open-source library. Or you can just hit the “random” button and go.
Notably, the system can be hilariously wrong. Take the footage of an empty Tokyo Stadium, its field covered in thick green grass for what looks like a soccer match. While you’d expect to hear silence, or perhaps a crowd watching a typical sporting event, the soundtrack plays high-performance cars zooming by, as if you were sitting at an F1 race. The error breaks the immersion, but it’s also satisfying in a way, like discovering an Easter egg left by a silly computer.
Truth be told, Imaginary Soundscape isn’t doing anything that film editors haven’t done for almost a century. In almost any Hollywood film or TV show, most of the audio is faked, added in after shooting. Actors’ lines are often re-recorded in a quiet sound booth so they sound clear. Ambient sound might be recorded somewhere else entirely, or run through audio filters to strip out unwanted noise. Sound events the editor wants to highlight, like a car crash or a bird chirping, are simply pulled from a massive sound library. And of course, the musical score wasn’t performed by a symphony orchestra set up just off camera.
So Imaginary Soundscape is really just doing what people have done for years, but it does so automatically, without human labor, which lets it scale with ease to an entire globe of streets, boulevards, and highways. Indeed, it’s a good reminder of both the promise and the limitations of AI. While neural nets can absolutely detect and react to patterns that we would otherwise never notice, much of the time these advanced systems aren’t doing anything beyond the core capabilities of a human. They just think faster. And cheaper. And without complaining.