Data-mining didn’t exist in William Shakespeare’s era. Even today, academics and researchers seem uncomfortable using data-mining techniques to research the liberal arts. The same techniques used by businesses to analyze web content and by marketers to target audiences, however, have big ramifications for Shakespeare–and have helped settle long-standing academic arguments.
In a late October presentation at the Folger Shakespeare Library, library director Michael Witmore described his use of innovative data-mining methods to analyze Shakespeare’s First Folio. The event was subsequently repackaged as a free podcast, and the ramifications are fascinating. By processing excerpts from the First Folio through word-analysis software, Witmore found proof that Shakespeare’s vocabulary and syntax varied wildly between his comedies, historical plays, and tragedies. More importantly, software analysis seems to prove that Othello–despite being a tragedy–was intentionally written with comedic stylistic cues that served to intensify the play’s tragic aspects.
Witmore processed 767 different thousand-word excerpts of plays from the First Folio through a piece of software called DocuScope. DocuScope is a rhetorical analysis tool based on the Oxford English Dictionary that takes a database of 40 million English linguistic patterns and sorts them into more than 100 categories. These categories are rather dry–typical classifications include “Positive Emotion,” “Directives,” and “Narrative Verbs”–but they do the job. DocuScope is one of the few linguistic analysis tools geared toward literature that has been made available; however, you’ll have a hard time getting a copy. The product’s owner, Carnegie Mellon University, restricts public access.
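For readers curious about the mechanics, the general idea of cutting a text into thousand-word excerpts and tallying category matches can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three category names come from the description above, but the word lists, the function names, and the simple exact-match counting are all hypothetical stand-ins, not DocuScope’s actual dictionary or method.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only. These word lists are
# invented and are NOT DocuScope's patterns.
CATEGORY_PATTERNS = {
    "Positive Emotion": ["love", "joy", "sweet"],
    "Directives": ["come", "go", "let us"],
    "Narrative Verbs": ["said", "went", "came"],
}


def tokenize(text):
    """Lowercase the text and return its words as a list."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_excerpts(words, size=1000):
    """Cut a word list into consecutive full-size excerpts.

    A trailing chunk shorter than `size` is dropped.
    """
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def category_profile(words):
    """Return, per category, the number of pattern matches per word."""
    total = len(words)
    if total == 0:
        return {category: 0.0 for category in CATEGORY_PATTERNS}
    counts = Counter()
    for category, patterns in CATEGORY_PATTERNS.items():
        for pattern in patterns:
            target = pattern.split()
            width = len(target)
            # Slide a window over the words and count exact phrase matches.
            counts[category] += sum(
                1
                for i in range(total - width + 1)
                if words[i:i + width] == target
            )
    return {category: counts[category] / total for category in CATEGORY_PATTERNS}


if __name__ == "__main__":
    sample = tokenize("Come, let us go. She said she went for love.")
    print(category_profile(sample))
```

Running each excerpt through a function like `category_profile` would yield one row of numbers per excerpt, which is the kind of table that can then be compared across comedies, histories, and tragedies.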
Filtering Shakespeare’s classics through DocuScope led to some unusual discoveries. As Witmore put it, the process was like “taking 36 decks of cards filled with random content… and then asking why there were no sevens in the decks that contained red cards.” Data-mining and computer-led textual analysis uncovered patterns in Shakespeare’s work that a human observer, trained in traditional academic reading methods, would never see, such as the fact that–in purely linguistic terms–Othello is a comedy.
“Comedy” in Shakespearean terms is quite different from our own conception of the genre. For Shakespeare scholars (and English audiences of the Elizabethan era), comedies were plays that ended in weddings or that contained characters from multiple social strata. According to Witmore’s analysis, the “worm’s-eye view” provided by data-mining revealed that Othello was unusually rich in vocabulary usually found only in Shakespeare’s comedic plays. In addition, data-mining analysis discovered previously unknown recyclings of aspects of Twelfth Night in Othello.
While computer analysis of Shakespeare’s work is a fascinating hobby, Witmore has also published academic literature on the subject. Witmore wrote extensively on his first efforts in data-mining Shakespeare for the academic journal Early Modern Literary Studies in 2004. Other academic work conducted by Witmore has been republished on his personal blog. Before joining the Folger Shakespeare Library, Witmore was based at the University of Wisconsin–one of the world’s hot spots for the “digital humanities” movement, which applies web data analysis and research methods to traditional liberal arts research.
As for Shakespeare’s words and writing styles being distinct depending on whether he was writing comedies, tragedies, or historical epics? DocuScope analysis indicates the funniest thing Shakespeare ever wrote was a portion of The Merry Wives of Windsor, while a passage from Richard II was the most serious.