How I Beat IBM's Watson at Jeopardy (3 Times!)

Tonight is the opening match in what constitutes the Nerd Super Bowl--Ken Jennings (winner of 74 consecutive Jeopardy! episodes) versus Brad Rutter (winner of several tournaments of Jeopardy! champions) versus IBM’s natural-language processing prodigy, Watson. The last time an IBM supercomputer challenged a human opponent to a televised duel, chess champion Garry Kasparov resigned his final match in tears.

But Watson can be beaten--I know, because I’ve done it, thrashing him three times in top-secret sparring rounds against former Jeopardy! champions held a year ago, emerging with an unbeaten record. In doing so, I also created a blueprint for Jennings and Rutter to follow, one described by Stephen Baker in his new book Final Jeopardy: Man vs. Machine and the Quest to Know Everything, and followed by Baker himself when he finally battled Watson … and lost. My strategy was simply to take Watson’s strengths away from him. Having no idea what those strengths were, however, I had to make several assumptions.

First, I assumed he’d be impossible to beat on the buzzer, which had never been my strong suit, anyway. Instead, I took a page from The Princess Bride (the book, not the movie), specifically Inigo Montoya’s duel against the Man in Black. As long as Montoya was able to keep the fight on rocky terrain, his defensive prowess awarded him the advantage. Once the Man in Black maneuvered him onto open ground, however, Montoya was overwhelmed by his speed. So it would go with Watson, I figured. Binary relationships--countries and their capitals, for instance--would be easy for him to figure out, and he would beat me to the buzz every time. So I had to steer him into categories full of what I called "semantic difficulty"--where the clues’ wordplay would trip him up. I would have to outthink him.

Second, I would need to find and win the Daily Doubles to deny Watson a coup de grace and to keep pace in what I figured would be a losing war of attrition. (This was based on personal experience--I had rallied from last place to win my first Jeopardy! match only after a Daily Double on the very last clue.)

Finally, I had to be in the lead heading into Final Jeopardy. If Watson could confidently decide on an answer in only three seconds, I shuddered to think how infallible he would be given all of thirty.

It turned out Watson wasn’t unbeatable on the buzzer. Part of this had to do with programmed hesitation; if he wasn’t reasonably sure of the answer (i.e., it failed to reach a certain confidence threshold), he wouldn’t buzz. In a few cases, he knew the answer, but not in time; in others I managed to outrace him. But his real advantage on the buzzer was that he was consistent--unlike me, he never struggled to get the timing right. (Buzz too early, you see, and you’re locked out for a split second.)
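
For the programmers in the audience, that kind of hesitation rule can be sketched in a few lines. This is only an illustration, not IBM's code: the function name, the 0.5 threshold and the confidence scores are all assumptions I made up, since I never learned what Watson's actual numbers were.

```python
from typing import Dict, Optional

# Hypothetical cutoff; Watson's real threshold is not given in this article.
CONFIDENCE_THRESHOLD = 0.5


def choose_buzz(candidates: Dict[str, float],
                threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the highest-scoring answer if it clears the threshold.

    `candidates` maps each candidate answer to a confidence in [0, 1].
    Returns None when no answer is confident enough, meaning "don't buzz".
    """
    if not candidates:
        return None
    # Pick the candidate with the highest confidence score.
    best_answer, best_confidence = max(candidates.items(), key=lambda kv: kv[1])
    # Buzz only when the best guess is at or above the cutoff.
    return best_answer if best_confidence >= threshold else None
```

With made-up scores, a call like `choose_buzz({"Antarctica": 0.93, "South America": 0.04})` would buzz with "Antarctica," while a set of low-confidence guesses would return `None` and leave the buzzer to the humans.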

But my hunch was right about his semantic difficulties. As Baker described in Final Jeopardy:

He figured Watson would clean up on Name that Continent, picking out the right landmasses for Estado de Matto Grosso (“What is South America?”) and the Filchner Ice Shelf (“What is Antarctica?”). The category Superheroes Names through Pictures looked much more friendly to humans. Sure enough, Watson was bewildered by clues such as “X marks the spot, man, when this guy opens his peeper” (“What is cyclops?”). Band Names also posed problems for Watson because the clues, like this one, were so murky: “The soul of a deceased person, thankful to someone for arranging his burial” (“What is the Grateful Dead?”). If the clue had included the lead guitarist Jerry Garcia or a famous song by the band, Watson could have identified it in an instant. But clues based on allusions, not facts, left it vulnerable.

In our first match, Watson quickly dug himself a hole several thousand dollars into the red. After building a decent lead, I landed on my first Daily Double, wagering everything. “I have to kick him while he’s down,” I explained to our Alex Trebek fill-in. I answered correctly ("What are the gondoliers?") and was off from there. As Baker also describes in his book, this was clearly not one of Watson’s better days. Trailing me at one point $12,400 to $6,700, he landed on a Daily Double and wagered $5. Not $5,000. Five dollars.

Entering Final Jeopardy with the lead, I was convinced the match hinged on this one last question--Watson would leave me no margin for error. In the category “20th Century People,” the clue said: "The July 1, 1946, cover of Time magazine featured him with the caption, 'All matter is speed and flame.'" My (correct) answer: "Who is Albert Einstein?" Watson had none. His subconscious spit out some gibberish answers ("Time 100" and "David Koresh") but nothing he felt confident enough to write in. After the first match of the morning, it was Humans: 1, Watson: 0. The previous week, we were told, he’d gone undefeated.

By the time I faced Watson again that afternoon, he’d wised up to Final Jeopardy clues, but something more alarming started happening: I was playing his game. The most obvious example was how he chose clues. Most human players start with the easiest, lowest-value clues and work their way down the board, not only because the clues become progressively harder, but also because they teach you what to look for. Not realizing this, or caring, Watson would start a new category with the $1,000 or $2,000 clue first, which often left us struggling to grasp the subject. It also dealt a psychological blow when he got them right--his lead was either that much larger or he had gained significant ground on us, and we were running out of questions to make up the difference.

Trailing Watson significantly at the start of Double Jeopardy in my second match, I found myself doing the exact same thing. As I was in last place, I had my pick of the board, choosing the $2,000 clue under “Nonfiction.” (Naturally.) Daily Double! I wagered all I had, only $3,000. "A 2009 biography of this builder of Grand Central Terminal calls him 'the first tycoon.'"

"Who is Cornelius Vanderbilt?" Correct! I rallied to win the game.

But the most unnatural behavior of all came in my third and final match, when, trailing significantly in Double Jeopardy, I remembered a Daily Double was still left on the board. The first had appeared in a $2,000 clue, which meant that the other was unlikely to appear in that denomination as well. Increasingly desperate, I took to sweeping the board back and forth, hop-scotching between categories until I found it, once again wagering all (on a clue about the first woman to be promoted to the U.S. Army rank of four-star general), and again holding on in Final Jeopardy to win.

So that’s my strategy, and it worked for me. But Jennings and Rutter will be facing a much tougher opponent--you would think Moore’s Law alone would have made Watson that much tougher by now--and he, in turn, will be battling much better players than I. But I had one other advantage they won’t have--I felt no pressure to win, let alone on national television with a $1 million first prize on the line. The computer doesn’t feel it either, and this time around, that gives him the edge.

My money’s on Watson (although just how well he buzzes with his new electro-mechanical "hand" is anyone's guess). I just hope the humans don’t cry this time. That would be so like us.

4 Comments

  • Greg Lindsay

    Geoffrey, did you see the look on Jennings' face tonight when he realized he had lost? In that moment, I think "coup de grace" was appropriate. Watson put both players out of their misery tonight.

  • Jym Allyn

    "Coup deGrace" is appropriate. I think he was the manager of SkyNet working for Miles Dyson who turned control of the nuclear weapons over to the robots.

  • Geoffrey Beresford Hartwell

    May I ask whether the use of "coup de grace" is entirely appropriate here? One meaning used to be the shot fired from the officer's revolver in case a firing squad had not succeeded in killing the victim; another was the shot fired - or the sword stroke used - to kill a fatally or incurably wounded man found lying after a battle.