Imagine a tourist and a guide exchanging texts as the visitor tries to find the Manhattan coffee shop where she’s meeting a friend. Text by text, she mentions businesses or landmarks she spots, and after each, the guide instructs her to turn right, left, or go straight. Finally, she arrives.

In the age of Google Maps on every smartphone, this exchange seems unnecessary. But if you're trying to build an artificial intelligence system that uses natural language to help people solve everyday problems, a task like this is exactly the kind of challenge that can help advance the state of the art.

That’s why Facebook’s artificial intelligence scientists built Talk the Walk, a research project that aims to teach AI systems to communicate using natural language in much the same way a baby does: by naming what it sees.

Talk the Walk tasks two AI agents, a tourist and a guide, with navigating to a location in Manhattan by conversing: the tourist bot explains what it “sees,” and the guide bot responds with navigational instructions. The system can also parse what the tourist is saying, even when it mixes in the colloquial language a human might use. That matters because, as Facebook writes in a blog post on the project, “A series of carefully scripted responses isn’t likely to capture the nuanced inaccuracies and muddled messaging inherent to genuine, person-to-person conversations.”

Facebook believes this could be a more efficient way to teach AI systems to communicate effectively than training them on pure-text data sets. And in its experiments, the company’s AI research team found that its guide bot was more accurate than humans performing the same navigation task.

Facebook, like Google, Microsoft, Apple, and other big tech companies, has a strong interest in developing AI systems that communicate well, so that users can speak to bots or type to them in casual language. But even as it works on systems that can understand natural language, Facebook has also built bots that created their own language.

On the town, with a “novel attention mechanism”

The Talk the Walk project was one of the first to work with 360-degree visual information, says Douwe Kiela, a scientist on the Facebook AI Research team in New York. The tourist bot uses 360-degree imagery the researchers captured in five Manhattan neighborhoods, showing the real street-level view an actual tourist would see, while the guide bot uses a standard 2D map with generic waypoints (“bank,” “coffee shop,” “deli”) to deliver navigation instructions.