Big Data On Campus Is Like A Keg Stand For Your Brain

Take a crash course in digital humanities with McGill professor Stefan Sinclair.



In the burgeoning academic discipline of digital humanities, creating software tools is as important as getting published in a journal. To better understand what this means, take a peek at the pedagogical playbook of Stefan Sinclair, associate professor of digital humanities in McGill University’s Department of Languages, Literatures and Cultures. Sinclair is a new kind of literature professor who brings a humanist’s sensibility to computing–and leverages Big Data methods to ask new kinds of questions about literature. At the same time, he’s equipping a new generation of humanities students with the eclectic skill set and entrepreneurial spirit to take on a 21st-century job market. They’re going to need it.

FAST COMPANY: What is “digital humanities,” exactly?

STEFAN SINCLAIR: There’s a natural tendency to assume it’s a new field, but it’s actually been around for quite a long time. The first research combining computers and the humanities was in the 1940s, and a journal called Computers and Humanities started publishing in the 1960s. But there has been a lot of attention and momentum in the past three or four years that wasn’t there before. The core of digital humanities is the critical exploration of how computers and technology can enhance, but also influence, our modes of research in the traditional humanities. My own work has focused on facilitating the exploration and analysis of digital texts. Part of that is providing tools that let people easily ask questions about things like the frequency of terms or clusters of terms in a document or body of work, how those terms are distributed, and which terms and themes are most distinctive to individual documents within a larger body of work.
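The questions Sinclair lists (how often terms occur, and which terms are most distinctive to each document in a collection) can be sketched in a few lines of Python. This is a minimal illustration, not how Voyant or Sinclair's other tools actually work. It assumes TF-IDF as the measure of "distinctiveness" (one common choice among several), a deliberately crude tokenizer, and a tiny hypothetical corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(corpus):
    # Map each document name to a Counter of raw term counts.
    return {name: Counter(tokenize(text)) for name, text in corpus.items()}


def distinctive_terms(corpus, top_n=3):
    """Rank each document's terms by TF-IDF and return the top_n per document."""
    tf = term_frequencies(corpus)
    n_docs = len(tf)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    result = {}
    for name, counts in tf.items():
        total = sum(counts.values())
        # Relative frequency in this document times log inverse document
        # frequency; a term found in every document scores zero.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = [term for term, _ in ranked[:top_n]]
    return result


# Hypothetical three-document corpus, for illustration only.
corpus = {
    "a": "the whale the sea the whale",
    "b": "the sea the storm",
    "c": "the whale the ship",
}

if __name__ == "__main__":
    print(term_frequencies(corpus)["a"].most_common())
    print(distinctive_terms(corpus, top_n=2))
```

A word like "the" is frequent everywhere, so it never counts as distinctive. "Storm" and "ship" each appear in only one document, so they float to the top for those texts.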

I’m also interested in how to design and implement tools so that the computer can suggest some possibly interesting trajectories to follow. In other words, the user or reader doesn’t necessarily need to know what questions to ask in advance–the computer proposes some. That could be the sweet spot of text analysis in the digital humanities.

How does this kind of approach help us see things that we couldn’t before?

One thing that’s compelling about digital humanities is being able to ask questions at a scale you can’t reach without computers. Really, most humanities research is very exclusionary–as humans, we don’t have time to read a lot of text. So all English studies are a matter of excluding: choosing the texts we’re interested in and leaving others aside. With computers, we can now ask questions of, say, all novels of the 19th century. That’s sometimes called “distant reading,” as opposed to the more traditional literary practice of close reading. You can also combine close and distant reading, for instance when you want to look at a few novels but set them against a larger context of novels.


Digital humanities also encompasses a lot more than text. There is a lot of interest in game studies, for instance, and in geospatial analysis that differs from what geographers would do. One example is a project on the Republic of Letters–a long-distance intellectual community in Europe and America in the late 17th and 18th centuries–that maps the transfer of ideas across geographical space, allowing you to visualize it and see things in generative ways.

What are some other good examples of digital humanities research?

Personally, I’m fond of digital humanities questions that allow us as humanists to leverage digital technologies to engage with issues of broader public relevance. A good example might be research that “forecast” the Arab Spring revolutions using newspaper analysis, though I’d also have to acknowledge that there the humanistic perspectives are somewhat overwhelmed by the scientific methodologies. Other examples that navigate between humanistic and scientific perspectives and methodologies include using text analysis to study the onset of dementia in the works of Agatha Christie; using authorship attribution techniques to determine who wrote Primary Colors, or how many authors may have contributed to certain books of the Bible; using n-grams, or sequences of consecutive terms, to measure cultural phenomena expressed through print, such as the “half-life” of famous people; and comparing gendered vocabulary in toy ads.
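The n-gram idea mentioned above can be illustrated with a minimal sketch: break texts into runs of n consecutive words, then track how often a given phrase appears relative to all phrases of the same length, year by year. The corpus grouped by publication year, the helper names, and the crude tokenizer are all hypothetical; studies of fame's "half-life" work over vastly larger collections, and this is not their actual pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Every run of n consecutive tokens, as a tuple.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequency_by_year(texts_by_year, phrase):
    """Share of all n-grams (n = phrase length) in each year that match the phrase."""
    target = tuple(tokenize(phrase))
    n = len(target)
    freqs = {}
    for year, texts in sorted(texts_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(tokenize(text), n))
        total = sum(counts.values())
        # Normalize by the year's total so larger years don't dominate.
        freqs[year] = counts[target] / total if total else 0.0
    return freqs


# Hypothetical miniature corpus keyed by publication year.
texts_by_year = {
    1900: ["Mark Twain wrote", "Mark Twain spoke"],
    1950: ["few read Mark Twain"],
}

if __name__ == "__main__":
    print(phrase_frequency_by_year(texts_by_year, "Mark Twain"))
```

Plotting these normalized frequencies over many decades is what lets a researcher see a name rise and then fade in print.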


Is there an ideal balance between “digital” and “humanities”?

I like the axiom that if you’re holding a hammer, everything starts to look like a nail. One of the core tensions is that, because we’re using computers, there’s a tendency to adopt the scientific perspective of trying to prove something. That runs counter to the humanities approach, which is not to converge on singular answers but to proliferate the number of interpretations we have. I am particularly passionate about tools and methodologies that allow for the proliferation of perspectives–not to prove a hypothesis I have, but to see a text differently and ask different questions.

Is digital humanities also moving toward studying texts that are purely digital–say, what people are writing on social media platforms?

It’s interesting to think about how someone with literary training can approach a large Twitter feed–what they could say about that as opposed to a social scientist. That’s the very thing I’m more and more drawn to, using tools we’ve developed and a literary sensibility to read deep cultural phenomena that go beyond the bound book to born-digital texts. I don’t yet have very good answers to what that looks like–it requires more attention and investment.


You mentioned that your focus has been on building tools–tell me about some of those projects.

Voyant is a tool for online text analysis. In developing it, we erred toward simplicity over power and flexibility. You just paste in the URLs of any texts you want and can immediately start analyzing things such as word frequency. Another project is the Simulated Environment for Theatre, a game engine that allows the simulation of plays. A director could use it for blocking, to see how characters move, or a historian could simulate a play that wouldn’t be feasible to recreate live. It becomes a supple way of producing an experience and scholarship together. Bon Patron is a French grammar checker. [SpellCheckPlus is a version for English.] It’s very rewarding because it gets used a lot, with an average of 40,000 visitors a day. What has surprised us is how widely it’s used by native speakers–we built it with second-language learners in mind, but most users are actually francophones in France.

Most of the software I’m involved in developing is open source, but with universities wanting to find opportunities for commercialization and spin-offs, I’m conscious of those possibilities–such as offering a subscription version of Bon Patron for businesses. I think that leads us to value what we’re creating more and encourages a more entrepreneurial spirit in students as they see that knowledge in humanities can be worth something commercially.

The traditional output for an academic working in humanities is books and papers–has it been difficult to get professional recognition for the kind of work you’re doing?



I recently wrote a blog post about my proposal to add digital scholarship to the criteria for tenure and promotion, which was approved by my department. I know this is scholarship because it requires a particular blend of experience and expertise to create it. We can’t just tell a programmer to build it–part of understanding what it is comes from actually building it. But the onus is on us as creators to show why the software we’ve developed has value. It’s more work, but as digital humanists it’s crucial. The type of software we’re developing is not going to be developed by other disciplines. We have to build the tools ourselves–and we have to create a mechanism for that work to be rewarded and recognized.

Do you think that integrating technology into the traditional humanities makes a degree in literature more practically useful?

I think so, but I hasten to add that there is inherent value in a more conventional-looking humanities degree. Digital humanities is only good when the humanities part is very strong, and I wouldn’t want to see programs diluted to the point where the humanities questions aren’t there anymore for the sake of technical skills. But I’ve been in the unusual situation of teaching digital humanities since my first academic appointment in 2001, and when I’m talking to prospective students and parents, or to students who are graduating, it feels good to be able to say that they’re advantageously positioned because of that blend of technical skills and the soft skills that are important in the humanities.
