It’s been over 200 years since the publication of Jane Austen’s first novel, Sense and Sensibility, in 1811, yet her work lives on in the popular imagination through film adaptations, fan fiction, and countless reissues. What’s behind the remarkable staying power of Austen’s novels, books that have not only earned their places in the literary canon but also remain fixtures of popular film and of everyday readers’ bookshelves? And is it a phenomenon that can be quantified and understood analytically?
In the age of big data, it’s at least worth a try. At the New York Times, graphics editor Josh Katz and Upshot editor Kathleen A. Flynn (Flynn is also the author of the novel The Jane Austen Project) set out to pinpoint the qualities of Austen’s books that have kept them so popular with readers over the decades. To do so, they compared the words Austen used most frequently with those in 126 other works of British narrative fiction published between 1710 and 1920, and plotted the results.
The vocabulary in Austen’s novels, as it turns out, was quite different from that of her British contemporaries. This won’t surprise Austen fans: set in landscapes Austen knew well and populated with ordinary characters, her books were works of social realism at a time when mysteries, Gothic fiction, and sensational romance were all the rage. Katz and Flynn’s graph reflects that. All of Austen’s novels cluster near the top of the vertical axis, which represents words about emotion and time (really, very, awkward, fortnight), and toward the left of the horizontal axis, which represents more abstract words (virtue, resolved, affection, gratitude). They sit far from novels like Rudyard Kipling’s The Jungle Book and H. G. Wells’ The War of the Worlds, which lie toward the right end of the horizontal axis, where the more physical words are (edge, fingers, slipped). They are equally distant from novels like Sir Walter Scott’s Ivanhoe, which gather near the bottom of the vertical axis, the region of medieval words (banquet, girdle, death).
The position of each book on the graph is determined by how often particular words appear in it. Based on its most characteristic words, each book was grouped into one of the four categories described above. Shaded areas show how far these categories extend across the graph and where they overlap. Wuthering Heights, Emily Brontë’s classic story of love and revenge, for instance, sits near the middle but skews slightly toward the top and right of the graph. That placement suggests its vocabulary mixes the physical and the emotional, while still reaching into the territory of medieval words, with their darker, more violent, and more dramatic associations.
Compared to Wuthering Heights, Austen’s books read as more modest dramas. But as Katz and Flynn’s written analysis of the data points out, the fact that Austen was subtler doesn’t mean she avoided subjects like love, passion, and complex, contradictory characters. One crucial trait of her writing was her use of intensifiers, words like very, much, and so, which she used to infuse her stories with irony. At other times, she used words like could and must to expose the gap between what her characters, most of them women, say and what they actually think and feel.
Austen was a master at revealing the complexities of her characters through the slightest choice of word, and at sowing drama into ordinary scenarios through her keen observation. Her realism has also helped her novels endure as literary trends come and go. One of the most striking things about the Times’ visualization is also the most obvious: all six of Jane Austen’s novels sit outside the Venn diagram of overlapping shaded areas where the other novels, spanning more than two centuries of British literature, converge. She truly was in a league of her own.