Facebook Is Teaching Machines To Make Small Talk

We’re all training machines all the time. Even when we’re shooting the shit.

One of these two speakers is a robot, and one is a human. Can you tell which is which?


[Dr. Jenson:] . . . I enjoy a nice steak. How about you?
[Miss Dawes:] Father always used to say that if we ever had the money you have . . . we’d have steak and ice cream three times every day!
[Dr. Jenson:] I eat all the steak and chicken too, even bacon.
[Miss Dawes:] Father always used to say that if we ever had the money you have . . . we’d have steak and ice cream three times every day!
[Dr. Jenson:] Yeah you said that already.
[Miss Dawes:] Dynamic structure can absorb shock.

Miss Dawes is a bot, and her response is typical of even the world’s best chatbots when they try to make chitchat. She gets stuck in a loop. She brings up seemingly random, unrelated topics. She doesn’t seem to have any anchoring values or interests, let alone a continuous voice.

So far, this hasn’t been a major problem. That’s because today, chatbots are used for functional tasks like booking restaurants or looking up the weather. But AI researchers at Facebook’s FAIR group want to develop intelligent chatbots that can interact with humans through truer, natural language. And to get there, they’re building a new, open-source database of chit-chatting people, assembled from thousands of conversations they paid people to have with one another, and with machines, through Amazon’s Mechanical Turk.

You see, Miss Dawes–and her weird anecdote about her father’s preferences for steak and ice cream–represents top-of-the-line machine speak right now. She was trained to make small talk with one of the most comprehensive existing data sets on human discussion. It’s called OpenSubtitles, and it’s really just a collection of phrases that have been pulled from movies. Yet, obviously, the logic of movie dialogue–oft dramatic, oft terse–leaves some holes when it comes to teaching a chatbot to speak with a real human.


Facebook is trying a different approach. Instead of training its bot to ask meaningless open-ended questions (like “tell me how you feel about that”) or turning to databases built from film characters to fill its own verbal gaps, its researchers looked to real people, studying how individuals interact when they’re trying to learn about one another in a casual conversation.

Their hope is that this nuanced, distinctly human behavior can be digitized. That’s no small feat: Small talk is surprisingly subtle and a bit aimless by design, requiring an improvised choreography of verbal cues, synonyms, and ever-shifting context. Could it ever be believably automated?

In a new paper, the researchers describe their strategy. First, they asked Amazon Turkers to create 1,155 imaginary personas, each with its own five-sentence description. They read along the lines of, “I am an artist. I have four children. I recently got a cat. I enjoy walking for exercise. I love watching Game of Thrones.” Then, they had the Turkers rewrite these personas, essentially saying the same things in different words, to add a bit more linguistic variety. “My father worked for Ford” might become “my dad worked for the car industry.”

“Sometimes people are vague, sometimes they’re specific,” says Jason Weston, research scientist at FAIR, when I ask why they seemed to be building out their data set with both more specific and more generalized personal factoids at the same time. “A strong language understanding system has to be good at both and understand how they intersect. This is a challenging research problem, and as a result we have a data set that mirrors both realities.” Finally, they assigned these personas back to Turkers, essentially telling them to role play as the person in a unique conversation with another Turker playing a different persona.
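For readers who think in code, the collection steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Facebook's actual pipeline or the Persona-Chat file format: the `Persona` class, the `pair_personas` helper, and the field names are hypothetical, and the second persona contains only the single sentence the article quotes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical record: one imaginary character described in short sentences."""
    original: tuple[str, ...]  # sentences as first written by a Turker
    revised: tuple[str, ...]   # the same facts, reworded by another Turker


def pair_personas(personas, rng):
    """Shuffle personas and pair them off so each chat uses two different ones."""
    shuffled = list(personas)
    rng.shuffle(shuffled)
    # Step through the shuffled list two at a time; an odd one out stays unpaired.
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]


# The persona quoted in the article; no rewrite of it is given, so `revised` is empty.
artist = Persona(
    original=(
        "I am an artist.",
        "I have four children.",
        "I recently got a cat.",
        "I enjoy walking for exercise.",
        "I love watching Game of Thrones.",
    ),
    revised=(),
)

# Only one sentence of this persona is quoted, along with its reworded form.
ford_child = Persona(
    original=("My father worked for Ford.",),
    revised=("My dad worked for the car industry.",),
)

# Each pair stands for one role-played conversation between two Turkers.
pairs = pair_personas([artist, ford_child], random.Random(0))
```

Storing the original and reworded sentences side by side reflects the point Weston makes: the same fact appears in both a specific and a more general form.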


What Facebook collected was almost 11,000 lines of true written conversation: people sharing a bit about themselves while asking a bit about each other. Coupled with the personal descriptions, their small talk forms the basis for the so-called Persona-Chat data set, which Facebook has made available to other researchers.

It works pretty well–though not quite perfectly. Take this example, shared by Facebook, of a Persona-Chat-trained bot talking to a human.

[Person 1:] I just got done playing soccer. What have you been up to?
[Person 2:] I do not have pets. I spend my free time writing. what about you?
[Person 1:] Just graduated from college. Looking for work.
[Person 2:] yes I graduated, in writing and poetry
[Person 1:] Have you had anything published?
[Person 2:] I mainly write fiction novels.
[Person 1:] Do you like to watch sports?
[Person 2:] do you like kings of leon my favorite by them is use somebody
[Person 1:] Are you married? I will be married soon.
[Person 2:] haha, no time. I have got a novel to finish.
[Person 1:] What is your novel about?
[Person 2:] it is a love story. I really enjoy writing. do you write or read?
[Person 1:] Like to read. Spend most time watching

Person 2 is still clearly not a person, but a few of their lines are absolute, colloquial perfection. In Facebook’s testing, a Persona-Chat-trained bot outperformed an OpenSubtitles-trained bot in every metric, including a human’s ability to understand it, engage with it, and be fooled into thinking it was a fellow human. And furthermore, Facebook researchers admitted to me that their AI itself wasn’t the real breakthrough here; it was the new data set behind it.


Indeed, if Facebook can best other data sets with its 11,000 lines of dialog, it’s enough to make you wonder just how much about language Facebook is learning from the conversations that 2 billion people have on its service every month–perhaps with help from the “personas” gleaned from our constant likes, shares, and joined groups–and how convincingly a chatbot imbued with that much data could make small talk as a result. Of course, Facebook maintains that it wants to connect people to one another, not to a swarm of shit-shooting chatbots.

Yet it’s a salient reminder that, whether it’s posting a status update on Facebook or just typing a search into Google, we’re all training machines all the time. And as Father always used to say, if we ever had the money you have . . . we’d have steak and ice cream three times every day!

About the author

Mark Wilson is a senior writer at Fast Company who has written about design, technology, and culture for almost 15 years. His work has appeared at Gizmodo, Kotaku, PopMech, PopSci, Esquire, American Photo and Lucky Peach.