Facebook may be the subject of an Oscar-contending movie, but it has yet to become the subject of great literature. Nonetheless, though the social networking site is relatively new, social networks themselves are not. Indeed, they're as old as society itself, and as such, social networks are reflected in works of literature. The thought occurred a few years ago to a Columbia University PhD candidate in computational linguistics, David Elson. Also a film buff and lover of stories, Elson wondered if there might be a way to use software to map social networks within novels.
"I started thinking about how storytelling works as a language, with a syntax that we pick up as children and that differs from culture to culture," Elson recently told The Columbia Record. "We are starting to be able to write programs that can learn a language by reading a lot of it—why can’t a program learn the meanings of stories the same way?"
In 2009, Elson developed a program that scans digital copies of novels and maps the social networks between characters, focusing on dialogue. It works in three steps. First, it uses a "named entity tagger" that a Stanford group had developed to list all the names in the novel. Then it assigns a speaker to each instance of spoken dialogue, using contextual cues. Finally, it assembles a graph of networked bubbles, like the one shown here. The bigger the bubble, the more loquacious the character. The darker the connecting line, the firmer the relationship (i.e., the more the two characters talked to one another).
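For readers who want a feel for the last of those three steps, here is a minimal sketch in Python. It is not Elson's program. It assumes the first two steps (finding names and attributing each quote to a speaker) have already been done, and that their output is a list of (speaker, listener, text) tuples. That input format, the function name, and the sample lines are all invented for illustration.

```python
from collections import Counter


def build_conversation_network(utterances):
    """Turn attributed dialogue into bubble sizes and line weights.

    `utterances` is a hypothetical input format: an iterable of
    (speaker, listener, text) tuples produced by some earlier step.
    Returns (node_weights, edge_weights), both Counters.
    """
    node_weights = Counter()  # words spoken per character (bubble size)
    edge_weights = Counter()  # words exchanged per pair (line darkness)
    for speaker, listener, text in utterances:
        n_words = len(text.split())
        node_weights[speaker] += n_words
        # Adding zero makes sure a character who only listens still gets a bubble.
        node_weights[listener] += 0
        # Sort the pair so A->B and B->A count toward the same undirected line.
        pair = tuple(sorted((speaker, listener)))
        edge_weights[pair] += n_words
    return node_weights, edge_weights


# Invented sample dialogue, not taken from any novel.
sample = [
    ("Anne", "Ben", "Good morning to you"),
    ("Ben", "Anne", "And to you"),
    ("Anne", "Clara", "Have you seen Ben today"),
]
nodes, edges = build_conversation_network(sample)
print(nodes)
print(edges)
```

In this toy sample Clara never speaks, so her bubble has size zero even though Anne addresses her, which is the same blind spot a dialogue-only map has for quiet characters.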
Of course, if two mute characters meet, fall in love, marry, and have children, they won't be reflected on the map. Terse characters in general will be handicapped here. Two dueling divorce lawyers would appear to have a more significant relationship than the couple that's divorcing. But these are extreme examples. Assuming that dialogue is a good-enough proxy for actual social interaction in most novels, the method is intriguing.
And useful. Elson and his advisor, Kathleen McKeown, decided to team up with Nicholas Dames, a scholar of Victorian fiction at Columbia. They set out to test a hypothesis. The existing consensus about English novels had been that social networks in urban fiction differed from those in fiction with a rural setting. Dames told The Record about the theories of one Cambridge professor who had "developed a series of extremely persuasive arguments about how, in the 19th century, novelists began to imagine urban social interaction as fundamentally different—more dispersed, accidental and fleeting—than the kinds of social interactions found in village or rural settings....His arguments were based on very elegant readings of a select few authors—most notably Austen and Dickens—and they quickly became fairly standard, canonical theories."
But a few anecdotes don't make data. When the Columbia team tested this theory against 60 scanned 19th-century novels (10 million words written by Charles Dickens, Jane Austen, Charlotte and Emily Brontë, Arthur Conan Doyle, George Eliot, Anthony Trollope, and Thomas Hardy), the theory simply didn't hold up. The networks in urban novels were more cohesive and interconnected. In fact, the form of the novel (whether it was told in the first or third person) was a stronger predictor of a novel's social network than its setting.
Elson presented the paper at the annual conference of the Association for Computational Linguistics in Uppsala, Sweden. It won the award for best student paper. When he graduates in May, he's headed for a job at Google.
[Illustration: Andrew Hur]