Alexa and Siri are good for some things, but their help only goes so far. You can order a new shirt or arrange a cleaning service through them, but you can’t ask them to wash the top you wore yesterday or put your favorite sheets on the bed. That would require a robot–with a much, much smarter brain.
Now a group of researchers at MIT’s Computer Science and Artificial Intelligence Laboratory has created the equivalent of an Alexa for a physical robot. The program, with the decidedly less cute name of ComText, enables humans to give robots commands in natural language. The team tested ComText with an off-the-shelf robot called Baxter, which is mainly used in warehouses and factories, but the program could be applied to other kinds of robots as well.
Teaching robots natural language commands, especially commands that reference the physical world in any way, is much harder than it sounds. Robots don’t possess any sense of context. If you place a tool on the table and ask a robot to “pick it up,” the bot won’t know what “it” refers to. If you ask it to pick up the last tool you were using, it similarly lacks the capacity to connect a previous experience with the new command.
But ComText, short for Commands in Context, essentially gives the robot the equivalent of what psychologists, describing humans, call declarative or explicit memory: the ability to recall facts and events that have occurred. There are two types of declarative memory. One, called semantic memory, is fact-based: knowing that your birthday falls on a certain date, or that you work at a certain company.
Then there’s episodic memory, which is based on past experiences that inform decisions made in the future. In other words, it’s context–and that’s what ComText gives robots. With this system, if you ask the bot to pick up the last tool you were using, it can quite literally go back through its digital memory (in the form of a video feed) to find the previous instance of you using a tool, identify that tool again in the real world, and pick it up.
Humans use episodic memory in natural language processing all the time, even at a young age. If you tell a young child, “the cup is mine,” and then ask them, “pick up my cup,” they know to associate the cup on the table with you, and to pick it up when you reference it. But that requires a complex association that’s difficult for robots. “‘Mine’ is an abstract relationship,” says Rohan Paul, a postdoc at MIT who was a lead author on a paper about ComText. “You can’t build a detector of it. There’s not a physical manifestation.”
But ComText can also perform this type of action, where a “fact” like “the cup is mine” is stored in what Paul called its “knowledge drawer.” Then, when you later ask it to “pick up my cup,” the robot can reference that database to correctly identify which cup is yours.
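A minimal way to picture that “knowledge drawer” is a store of declared facts that later commands can query. The sketch below is an assumption-laden illustration, not ComText’s API: facts are kept as simple (object, relation, value) triples, and a command like “pick up my cup” is grounded by looking up which cup has the right owner.

```python
# Hypothetical sketch of a semantic fact store ("knowledge drawer").
# All names here are illustrative, not drawn from ComText itself.
facts: list[tuple[str, str, str]] = []

def tell(obj: str, relation: str, value: str) -> None:
    """Store a declared fact, e.g. tell('cup_2', 'owner', 'rohan')."""
    facts.append((obj, relation, value))

def resolve(relation: str, value: str) -> list[str]:
    """Find all objects matching a relation, e.g. every cup owned by 'rohan'."""
    return [obj for obj, rel, val in facts if rel == relation and val == value]

tell("cup_1", "owner", "guest")   # "that cup is the guest's"
tell("cup_2", "owner", "rohan")   # "the cup is mine"
print(resolve("owner", "rohan"))  # ['cup_2']
```

This is exactly the kind of abstract relationship Paul describes: ownership has no visual signature a detector could find, so it has to live in stored facts rather than in the camera feed.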
ComText is a step forward in human-robot interaction and could enable people to communicate naturally with robots using references to previous occurrences and abstract concepts like ownership. If there were a robot enabled with ComText in your grandmother’s home, for instance, it would be able to understand when she asks it to pick up the package she knocked to the floor or bring her favorite sweater.
“These two contributions come together to allow the robot to significantly expand the kinds of commands we can ask the robot to perform in the physical world,” Paul says. In tests with Baxter, the ComText-equipped robot executed commands accurately 90% of the time.
This is important because humans and robots are interacting more and more–in factories, in the home, and increasingly on the road. In fact, the research was partially funded by Toyota. Contextual memory would be essential for communication with an autonomous vehicle, enabling you to say things like, “pick me up from the same place you left me yesterday,” “pick up my wife at 5:00 p.m. from her office,” and “turn left where you saw that pedestrian walk.” All of these instances require reasoning about interactions in the physical world based on contextual information.
Next up for ComText are higher-level inferences–and more complex tasks than picking up and putting down objects, which is all Baxter is designed to do. Paul says he hopes to add a speech component to the program so that the robot and human could have a conversation, asking each other questions to collaborate better. He also wants to add more knowledge to the robot’s memory to enable more complex tasks and deduction. For instance, if you tell the robot that there’s an aluminum block on the table and that it’s a conductor, the goal would be for the robot to be able to bring it to you if you later say, “bring me a conductor.” Combining more knowledge with contextual memory would help the program make better inferences about what a command actually means.
This is all in the name of humans and robots communicating more effectively. Maybe they won’t take our jobs–we’ll just work together.