Whether it’s Microsoft Paint, Adobe Photoshop, Snapchat’s Bitmoji, or even the (imaginary) Star Trek Holodeck, designers and engineers have been working for decades to figure out how to turn our ethereal imaginations into tangible images—with as little technical expertise as possible.
Strolling Cities—a new video project out of the MIT-IBM Watson AI Lab—takes this tradition to new heights. It’s a video that feeds poetry into a machine, and allows artificial intelligence to dream what the poetry looks like.
And in an auteurial twist, this AI wasn’t trained on all the images of the world to be able to dream anything at all. Instead, think of the AI more like a child that was raised on the streets of Italy, so its only points of reference are canals and porticoes, cobblestones and sea. It’s an AI model that’s limited by design, built to have a specifically Italian point of view, to capture the nostalgic sensation of visiting a specific place.
The project was born from the COVID-19 lockdown, when Mauro Martino, the head of IBM’s Visual AI Lab, missed home. He was in Cambridge, Massachusetts, while the pandemic exploded in his home country of Italy.
“I decided that the beauty and sentiment, the social, historical, and psychological contents of my memories of Italy could become an artistic project, probably a form of emotional consolation,” says Martino. “Something beautiful always comes out of nostalgia.”
To build the video you see above, Martino’s team enlisted students from Politecnico di Milano. During lockdown, they walked the streets of nine different Italian cities, capturing 2 million photos of these cityscapes, not from above or by car, but intimately on foot. The images were then labeled (with words like “sky” or “window”) through automation, while an AI was trained to imagine cities from nothing but these images.
As Martino points out, we’d already seen all sorts of technically proficient image-generating AI systems, from imaginary Google Street View to tulips. But building these systems requires piles of source image data, so most AIs learn what things look like from publicly posted images on the internet. That means you get an AI that can generate something that appears realistic, but aesthetically, it’s not compelling. It’s a technically accurate, dull average.
“There is no awareness of the complexity of the cinematic language, there is no authorship in the composition,” says Martino.
Instead, Strolling Cities wears blinders. It develops unmistakable, but also sometimes unplaceable, Italian landscapes—a psychedelic fever-dream mix of Bologna, Venice, Rome, Como, and more—all captured through the same planned methodology and camera system. The source footage is curated, allowing the system to generate fully fake images that still feel like they have a point of view.
“There is authenticity, in Strolling Cities you can see Rome as a Roman lives it,” says Martino. “Something magical happens, the landmarks disappear, but the cities are still recognizable.”
It’s easy to follow along with the way the AI thinks. A mention of the ocean makes the ocean appear, and a mention of sidewalks makes walkways appear. The narrator saying “aerial verticality” makes the buildings stretch into the sky. And when there’s no clear Italian reference point, as with a mention of “rice fields,” the system seems to do its best, offering a field of something that looks not quite like grass, but not quite like rice or any other plant either.
As for the future of the project, Martino is planning to debut real-time installations that let you speak and have the AI imagine on the spot, while pushing the boundaries of the system’s imagination. “Now we can generate full red cities with blue streets, or be more abstract and generate a romantic location, or depressing place,” says Martino, teasing that soon, we will be able to speak our minds to computers, and allow them to dream anything we might imagine.
“It’s a beautiful time for ‘dreaming’ together!” he says.