This article is part of Fast Company’s editorial series The New Rules of AI. More than 60 years into the era of artificial intelligence, the world’s largest technology companies are just beginning to crack open what’s possible with AI and grapple with how it might change our future.
Back in 2015, chatbots were big. And one of the most hyped ones was Facebook’s M, which the company meant to be a flexible, general-purpose bot that could do lots of different things such as purchase items, arrange gift deliveries, reserve restaurant tables, and plan travel. But the buzz was far bigger than the bot. When Facebook tested M with a group of 2,500 people in the Bay Area, the software failed to carry out most of the tasks it was asked to do.
After the initial burst of enthusiasm for M and other chatbots (“bots are the new apps,” Microsoft CEO Satya Nadella proclaimed), a wave of disappointment followed. Chatbots weren’t chatty so much as robotic. That’s because they were trained to talk about only a narrow set of things, and to carry out specific, limited tasks. They weren’t able to have a natural conversation with people, generating their own responses based on a general understanding of words and their meanings. They could only deliver generic replies.
Before M ever left beta, Facebook curtailed its grand plan for the bot, though some of its natural-language technology found its way into far less daring Messenger chatbots that could do single, simple things like take food orders or give out Q&A information. Companies like American Express and 1-800-FLOWERS still use this sort of low-ambition chatbot to answer customer service questions, take basic orders, and provide account balances. Many will connect you to a human representative if you ask them anything outside of their limited understanding.
But Facebook’s AI research group has moved on from that sort of single-minded chatbot. “What we’ve been saying over the past three to four years is that research on goal-oriented dialogue is not the path we need to explore, because it’s too hard, the stakes are too high,” Facebook natural language researcher Antoine Bordes told me. If a travel chatbot books “the wrong plane, the wrong flight, that’s a very big mistake in terms of money, in terms of travel, etc.,” he says.
Instead of focusing on the mechanics of specific tasks, Bordes explains, Facebook is stepping back to tackle a deeper problem—teaching virtual agents to converse like people. If chatbots can understand and communicate with humans better, the thinking goes, they might eventually make better assistants that can help people accomplish practical tasks such as booking plane tickets.
Facebook has been investing seriously in this effort, hiring some of the best talent in natural language AI. The company likes to point out that unlike some big tech companies, it makes the results of its AI research available to the entire research community by posting it online, where it might help others who are building next-generation AI. But this research will surely end up in its own products, too.
Messaging apps are a natural home, including both Messenger and WhatsApp, the latter of which Facebook also owns and is still figuring out how to monetize. With CEO Mark Zuckerberg talking up a new vision for the company with a greater emphasis on private conversations, Messenger and WhatsApp will need to add features to maintain their lead over other messaging platforms such as WeChat, Telegram, and Apple’s iMessage.
Building an algorithm that can casually chat with a person has become a key goal for big tech companies, with Amazon, Google, and Microsoft all joining Facebook in betting on the power of human conversation—not just within text-based messaging apps, but also for voice assistants and other experiences. Thanks to recent research inroads, the path to the truly conversational computer has suddenly become clearer, but the prize of getting there first is still up for grabs.
In other words, Facebook’s natural-language research is about far more than just resurrecting M or improving Messenger-based chatbots. It’s about the future of the entire company.
Enter the neural network
Building a digital agent that can have a lifelike conversation with a person is arguably the hardest of all natural language problems. It requires a machine to learn a dictionary full of words, with all their usages and nuances, and then use them in live conversation with an unpredictable human.
Only in the last few years has the natural language AI community started making larger strides toward general-knowledge bots. That’s partly because of big advances in neural networks, which are machine learning algorithms that recognize patterns by analyzing huge amounts of data.
For most of AI’s history, human beings have watched over software as it goes through the machine-learning process. In a technique called supervised learning, the human teacher slowly trains the neural net over time by providing the correct answer to a problem, then adjusting the algorithm so it reaches the same solution.
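To make the idea concrete, here is a minimal sketch of supervised learning in Python. It is a toy, not anything Facebook uses: a single artificial neuron learns the logical AND function from four labeled examples. The function names, the learning rate, and the number of passes over the data are all assumptions chosen for illustration.

```python
# Toy supervised learning: one artificial neuron learns logical AND
# from labeled examples. Every name and setting here is illustrative.

# Each example pairs an input with the correct answer (the label).
LABELED_EXAMPLES = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]


def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=20, learning_rate=1.0):
    """Adjust the weights toward the correct answer whenever a guess is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            # error is 0 when the guess is right, +1 or -1 when it is wrong.
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train(LABELED_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in LABELED_EXAMPLES]
print(predictions)
```

The "teacher" here is the label attached to each example. The model only improves because someone supplied the right answer in advance, which is exactly what becomes expensive when the examples are hours of human conversation.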
Supervised learning can work well when there’s lots of data that’s all been painstakingly labeled—say, by identifying photographs that have cats, dogs, or other items in them. But this approach often doesn’t work in the chatbot world. Labeled transcripts from thousands of hours of actual human-to-human conversations are hard to find in large amounts, and are costly for a single company to create.
Because it is so difficult to teach chatbots how to have conversations using these older methods, researchers have been looking for alternatives to supervised learning that let neural networks learn from data on their own, without a human in the loop.
One way to cut down the need for training data is to teach the machine a base level of common sense. If a computer has some understanding of the world—like the relative size of objects, how people use them, and some knowledge of how the laws of physics impact them—it might be able to narrow its choices to only those within the realm of possibility.
People do this naturally. For instance, let’s say you are driving a car next to a steep cliff and suddenly see a large rock on the road ahead. You want to avoid hitting the rock. But when considering your options, you would never decide to make a sudden hard turn toward the cliff side. You know the car would fall violently to the rocks below, because of gravity.
“The majority of the learning that we do . . . is by this observation of the world,” says Facebook VP and chief AI scientist Yann LeCun, a legend in the AI field who’s been tackling its biggest challenges since the 1980s. “We learn a lot of things from our parents and others, but we’re also learning a lot just by interacting with the world, through trying and failing and correcting.”
AI that’s trained using this technique, called unsupervised learning, works the same way. An autonomous car, for instance, collects data about the world through its many sensors and cameras, like a child learns about the world through her five senses. With this approach, scientists provide the machine with large amounts of training data to chew over. They don’t ask it to generate a right answer or coax it toward a certain goal. Instead, they ask it only to process and learn from the data, to find patterns, and map relationships between diverse data points.
In many cases, this necessary data is hard to come by. But there is one area of AI where the neural network can learn a lot about the world without the need for sensors: natural-language processing. Researchers can use vast amounts of existing text to help algorithms understand the human world—which is a necessary part of understanding language.
Let’s say a neural network is given these two phrases to comprehend:
- “The trophy doesn’t fit in the suitcase because it’s too large.”
- “The trophy doesn’t fit in the suitcase because it’s too small.”
To know that the “it” refers to different things in each sentence, the model needs to know something about objects in the world and their relation to each other. “There’s enough structure in the text they’re being trained on to know that when you have an object that fits into another one, then one of them can’t fit if it’s too big,” says LeCun.
This technique may turn out to be the secret to a new generation of more conversational and useful Facebook chatbots.
Meet BERT and RoBERTa
The current advances in unsupervised training of natural-language systems started out at Google in 2018. Its researchers created a deep learning model, called BERT (Bidirectional Encoder Representations from Transformers), and gave it the unannotated text from 11,038 books along with 2.5 billion words from English-language Wikipedia entries. The researchers randomly masked certain words in the text, and challenged the model to work out how to fill them in.
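The masking step can be sketched in a few lines of Python. This is a simplified illustration, not Google's code: it splits on whitespace rather than using a real tokenizer, and the `mask_words` helper, the `[MASK]` placeholder string, and the 15% masking fraction are assumptions made for the example.

```python
import random

MASK = "[MASK]"  # placeholder string, chosen for illustration


def mask_words(sentence, mask_fraction=0.15, seed=0):
    """Hide a random subset of words and remember the hidden originals."""
    rng = random.Random(seed)
    words = sentence.split()
    # Always hide at least one word so there is something to predict.
    how_many = max(1, round(len(words) * mask_fraction))
    hidden_positions = sorted(rng.sample(range(len(words)), how_many))
    # The hidden words become the answers the model must recover.
    answers = {position: words[position] for position in hidden_positions}
    masked = [MASK if i in answers else word for i, word in enumerate(words)]
    return " ".join(masked), answers


masked_sentence, answers = mask_words(
    "The trophy doesn't fit in the suitcase because it's too large."
)
print(masked_sentence)
print(answers)
```

No human labeling is needed here. The text supplies its own answers, because the hidden words are simply the ones that were already there.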
After the neural network analyzed the entire training text, it found patterns of words and sentences that often appeared in the same context, helping it understand the basic relationships between words. And since words are representations of objects or concepts in the real world, the model learned more than just linguistic relationships between words: It began to comprehend how objects relate to each other.
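A much cruder stand-in can show why context carries so much information. The Python sketch below is not a neural network and is nothing like BERT internally: it only counts which word appeared between each pair of neighbors in a tiny invented corpus, then uses those counts to guess a hidden word. The corpus, the `<s>` and `</s>` boundary markers, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# A tiny invented corpus standing in for books and Wikipedia.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the child drinks milk",
]


def build_context_counts(sentences):
    """Count which word appears between each (left, right) pair of neighbors."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Boundary markers let the first and last words have two neighbors.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - 1):
            counts[(words[i - 1], words[i + 1])][words[i]] += 1
    return counts


def fill_blank(counts, left, right):
    """Guess the hidden word from the words on either side of it."""
    candidates = counts.get((left, right))
    if not candidates:
        return None  # this context never appeared in the corpus
    return candidates.most_common(1)[0][0]


counts = build_context_counts(CORPUS)
guess = fill_blank(counts, "cat", "milk")
print(guess)
```

Even this counting trick picks up that cats and milk are linked by drinking. A model like BERT goes much further, because it generalizes to contexts it has never seen, where this sketch simply returns nothing.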
BERT wasn’t the first model to use an unsupervised approach to train a computer to understand human language. But it was the first to learn the meaning of a word from its full context, reading the words on both sides of it at once.
“I would say it’s among the top two or three big breakthroughs in natural language processing,” says Jianfeng Gao, a partner research manager in Microsoft Research’s Deep Learning Group. “You see people using the model as a new baseline for building all other natural language processing models.” So far, the BERT research paper has more than 1,000 scholarly citations as other researchers have built on Google’s model.
LeCun and his team are among them. They built their own version of the model, then made some optimization tweaks, expanded the amount of training data considerably, and increased the training time allowed. After the neural net had run billions of computations, Facebook’s language model, called RoBERTa, performed considerably better than Google’s model. On the GLUE benchmark, a standard suite of language-understanding tests, it scored 88.5, compared with BERT’s 80.5.
BERT and RoBERTa represent a radically new approach to teaching computers how to converse. “In the process of doing this, the system has to represent the meaning of the words that it sees, the structure of the sentence, the context,” says LeCun. “As a result, it kind of learns what language is all about, which is weird because it doesn’t know anything about the physical reality of the world. It doesn’t have vision, it doesn’t have hearing, it doesn’t have anything.” All it knows is language—letters, words, and sentences.
Inching closer to a real conversation
LeCun says a natural language model trained using BERT or RoBERTa still doesn’t have a ton of common sense—just enough of it to begin generating chat responses that are based on a broad base of general knowledge. It’s really just the beginning of the process to train an algorithm to talk like a person.
Facebook’s natural language researchers are also trying to build more features of conversation on top of RoBERTa’s foundation. They started by studying actual human conversations with chatbots to understand how and when conversation can break down or get boring. Their findings have driven research that proposes ways of training a bot to avoid the most common kinds of conversation failures.
For instance, chatbots often contradict themselves because they don’t recall what they’ve already said in a conversation. A chatbot might proclaim its love for Knight Rider reruns one minute and say it dislikes TV shows the next. Chatbots that create their own original responses (instead of retrieving examples from training data) have a tendency to answer questions in vague ways to avoid making errors. They often seem emotionally tone-deaf, which makes them less engaging.
Chatbots also have to be able to call on knowledge to be interesting conversationalists. Ones that can draw on a diverse range of information are far more likely to sustain longer conversations with humans. But current chatbots are trained with knowledge from a single area that corresponds with the task the bot is designed to do—which becomes a problem when human beings start making comments about subjects that are outside the bot’s domain. Ask a pizza-delivery bot about any subject other than pizza, for instance, and the conversation will quickly devolve.
As an antidote, Facebook’s researchers have been working to train natural language models to draw data from many knowledge domains and work that information into the conversation flow in natural ways. Future research will focus on teaching bots when and how to steer the conversation from a general topic back toward a specific task.
One of the biggest challenges of developing chatbots is enabling them to keep learning after they’ve been deployed. The meanings of words can change over time, and new terms and slang become culturally important. At the same time, a chatbot can’t be too suggestible—Microsoft’s Tay chatbot learned too much too soon from its online conversations and turned into an insulting racist within 24 hours. Facebook is teaching its experimental chatbots to learn from the times when conversations go well, and to analyze the language of a human chat partner to discover if a bot has said something dumb or boring.
It would be dangerous to predict when the advances Facebook is making in the lab might result in messaging chatbots that can engage in a dialogue with even superficial human-like skill. But it might not be that long until we can judge the results for ourselves. “We believe that we are very close to having a bot where people can talk to the bot and see value in it,” Facebook researcher Jason Weston told me.