The next guest in our futurist series, Crystal Ballin’, is Seth Redmore, VP of Product Management at Lexalytics, a major player in the realm of text and sentiment analysis. Want to keep track of what people are saying about your brand on the Internet? Lexalytics’ engine, “Salience,” can help you do that. But despite initial successes, we’re only at the beginning of what text analysis might achieve in the future. As Redmore points out, a computer can easily confuse a harmless family anecdote with animal pornography. What follows is an edited transcript of our talk.
FAST COMPANY: Before we talk about the future of text analysis, let’s talk about its present. What is text analysis?
SETH REDMORE: It’s very simple: you give us a piece of text, and we tell you who’s being discussed, the context of the conversation, and whether it’s positive or negative.
“Sentiment analysis” seems complicated. Isn’t it easy for a computer to confuse what’s negative and positive?
One really good example: after the oil spill, BP was getting slammed, Louisiana was getting slammed, and so on. But I saw that the term “oil spill” had a bunch of green on it [an indicator of positive sentiment]. I was like, WTF? But when you looked at the content, the way it was being described was that it was the “biggest,” “largest” spill, not the “worst.” That’s a hard thing to wrap your head around. I as a human know an oil spill is bad, but when a machine is interpreting it, from the perspective of the oil spill, things are good.
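The polarity flip Redmore describes falls naturally out of lexicon-based scoring. Here is a minimal sketch of that failure mode with an invented toy lexicon; this is not Lexalytics’ Salience engine, and the weights are made up for illustration:

```python
# Toy lexicon-based sentiment scorer (illustrative only).
# A naive scorer has no world knowledge: "biggest" and "largest"
# carry mildly positive weight in many general-purpose lexicons,
# so "biggest oil spill" can come out green even though a human
# knows a spill is bad.

LEXICON = {  # hypothetical word weights, -1.0 (negative) to 1.0 (positive)
    "biggest": 0.3, "largest": 0.3, "worst": -0.8,
    "disaster": -0.9, "oil": 0.0, "spill": 0.0,
}

def score(text: str) -> float:
    """Sum the lexicon weights of each word; unknown words score 0."""
    return sum(LEXICON.get(word, 0.0) for word in text.lower().split())

print(score("biggest oil spill"))  # positive overall
print(score("worst oil spill"))    # negative overall
```

The machine isn’t wrong about the words; it’s missing the world knowledge that the entity being praised as “biggest” is itself a bad thing.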
If you had the oil spill itself as a client, it would have been thrilled by what people were saying about it.
Exactly. And it was one of those things where we asked, should we hard-code this as wrong? My attitude was no; that’s exactly the sort of thing where we should see how language is actually being used.
When do we finally get to the point where a machine is smart enough to stop making mistakes like that?
There are a couple of parts to that question. One has to do with different interpretations of things. We rely on capitalizations and pauses and commas to make interpretations. There’s a classic dirty example of this. Let’s say you have a disabled Uncle Jack, and you say, “I have to help my Uncle Jack off a horse.” It means something very different if you don’t have the capitals. Another is that humor and sarcasm are very difficult for machines. What if you just see the tweet, “I love the Apple Store”? Prima facie, the dude loves the Apple Store. But if the last three tweets were that he stood in line for four hours, that the “genius” was an idiot, and that he couldn’t check out, then that was sarcasm. But you don’t know that unless you saw the previous tweets.
I’m still trying to figure out what my disabled uncle would be doing on a horse.
Well, you’d help him on the horse as well.
He should probably just stay at home.
He’s just missing a leg. It’s okay for him to ride the horse.
So you were suggesting that people use language differently in the age of Twitter and Facebook.
I’m wondering if we’ll see language changing rapidly because of the influence of texting and Twitter. For the first time in history, the digit with the most dexterity is now the thumb, not the index finger. Not for my generation, since I’m 40, but for a generation or two back of me. This is affecting how language develops.
This is all because of texting?
Texting and video games, since video games so heavily use thumbs.
One thing that’s said of an HTML5-enabled Internet is that it’s going to be less dependent on text, more dependent on multimedia. How do you prepare for that eventuality?
There’s a certain amount of discussion about this right now. For the near term, most of us are focusing on how we get from the speech to the text, and whether there are emotional cues. It’s hard enough for me to deal with English, just plain written English as it is, let alone what a person’s body language means. I think what’s more likely is for us to start to improve the ability to work better between languages, and to teach computers about jokes, sarcasm, and sayings that don’t necessarily translate well.
How do you teach a computer humor, or sarcasm?
Sarcasm is a little easier than humor. Sarcasm, you look at the context. If you have a context that’s very negative, and then suddenly something’s really positive, there’s a good chance that’s sarcasm. With humor, there are many different kinds of humor. The kind most amenable to machine interpretation is humor that relies on ambiguity. So, a woman walks into a clothing shop and she says, “Hi, can I try on that dress in the window?” and the salesman says, “Why don’t you try it on in the dressing room?” A computer can see that there are multiple ways to interpret “in the window.” There’s research going on with getting computers to tell jokes, actually.
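The context-shift rule Redmore outlines can be sketched in a few lines. This is a toy heuristic, not a production detector; the function name, scores, and threshold are assumptions made up for illustration:

```python
# Context-shift sarcasm heuristic: if a strongly positive message
# follows a run of negative messages from the same author, flag it
# as possibly sarcastic. Sentiment scores run from -1.0 to 1.0.

def possibly_sarcastic(history, current, context_threshold=-0.3):
    """history: sentiment scores of the author's prior messages;
    current: score of the new message being judged."""
    if not history:
        # No context, so take the message at face value.
        return False
    avg_context = sum(history) / len(history)
    return avg_context <= context_threshold and current >= 0.5

# "I love the Apple Store" after three angry tweets (four-hour line,
# idiot "genius", couldn't check out) looks sarcastic in context:
prior = [-0.7, -0.8, -0.6]
print(possibly_sarcastic(prior, 0.9))  # flagged
print(possibly_sarcastic([], 0.9))     # no context, taken literally
```

The design choice mirrors the interview: the same words get opposite readings depending on the conversational history, so the detector has to consume the prior messages, not just the one in hand.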
It’s time to get crystal ballin’. Where do you see text analytics one year from now?
I think we’ll see more work like the work we’ve done on Wikipedia, to better determine conceptually what people are saying. What we did from a literal, technical perspective is we took Wikipedia and extracted the important single words, and two-word phrases, and then we used the links present inside of the content to relate them to each other. I think we’ll see others using Wikipedia as a lexical resource, because it’s a distillation of human knowledge, which is really important from an artificial intelligence standpoint.
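The Wikipedia approach Redmore describes — extract important terms, then use the links inside articles to relate them — can be illustrated on a tiny scale. This sketch is not the Lexalytics pipeline; the wikitext snippets and names are invented, and real systems process full dumps rather than two strings:

```python
# Sketch: treating Wikipedia wikitext as a lexical resource by
# extracting internal [[links]] and building a concept graph.

import re
from collections import defaultdict

# Hypothetical, abbreviated wikitext for two articles.
WIKITEXT = {
    "Oil spill": "An [[oil spill]] is a release of [[petroleum]] "
                 "into the environment, often causing [[pollution]].",
    "Pollution": "[[Pollution]] harms ecosystems; one major cause "
                 "is an [[oil spill]] at sea.",
}

# Capture the link target, stopping at "|" (display text) or "]".
LINK = re.compile(r"\[\[([^\]|]+)")

def related_concepts(articles):
    """Map each article title to the set of concepts it links to,
    skipping self-links."""
    graph = defaultdict(set)
    for title, text in articles.items():
        for target in LINK.findall(text):
            if target.lower() != title.lower():
                graph[title].add(target)
    return graph

graph = related_concepts(WIKITEXT)
print(sorted(graph["Oil spill"]))  # concepts linked from "Oil spill"
```

Because humans curated those links, the resulting graph encodes which concepts belong together — the “distillation of human knowledge” that makes Wikipedia attractive as a resource.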
Where do you see text analytics five years from now?
Five years from now, I think we’ll have some reliable indicators of sarcasm. One of the things that’s going to change in the next five years is how people are talking. Folks use Twitter and Facebook as a soapbox. Companies and governments are watching this. Does that then naturally start to change how people use it? Do they change the language that they’re using? Does it become a required class in college perhaps? “How Not to Be Stupid on Twitter.”
How about in 20 years?
In 20 years we’ll have enough processing horsepower, since Moore’s Law is not showing signs of slowing down, that you’ll be able to carry around a system with you. Smartphones are pretty damn smart now, but if you combine that with a system that’s monitoring you, your facial expressions, your body temperature, your heart rate, along with who you’re about to call, it could give you suggestions as to what you might say. Your biometric markers indicate the emotional state you’re in; they may indicate that you’re tipsy or pissed off or happy or in a good mood. “OK, you’re relaxed, now call your mother because you won’t get in a fight.” Or, “You’re drunk, don’t call your ex.” Or it could say, “This person sent an email,” and it could start to compose your response. We’re starting to see the real beginnings of intelligence in these devices that can help you do things, not just find things.
[Homepage image: Flickr user Michael Heilemann]