03.25.13

Sequencing The Genome Of Legal Documents To Make Them Readable

Unless you’re a lawyer, you probably don’t understand a lot of the agreements and contracts you sign. But Docracy hopes to level the playing field by letting you easily see which parts of a document are standard and where you might be getting screwed.

A few years ago, researchers at Cornell worked out that it would take 76 working days (about 25 full days around the clock) to read all the privacy policies we agree to every year. And, of course, nobody reads them, not even one. But in the future there may be easier ways of making sense of dense legalese, so we can work out what’s routine language and what might land us in trouble one day.


Docracy, a New York startup, has developed what it calls a “document genome” that it hopes will help ordinary folk, as well as businesses and lawyers, understand agreements more easily. Essentially, it is a collection of prior legal texts that you can search in powerful ways. Say you want to find out if a lease you’re about to sign is standard, or full of unusual clauses. You can upload the text to Docracy, and the site will compare it with many others, highlighting what’s novel.
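To make the idea concrete, here is a minimal sketch of how an uploaded document might be compared against a library of prior texts. It is not Docracy's method, which the company does not detail here. The function name `novelty_report`, the 0.8 threshold, and the sample clauses are all assumptions made for illustration, and the similarity measure is simply the one in Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

def novelty_report(clauses, corpus, threshold=0.8):
    """Label each clause 'standard' or 'novel' by its best match in the corpus.

    The threshold is an arbitrary, illustrative cutoff.
    """
    report = []
    for clause in clauses:
        # Best character-level similarity (0.0 to 1.0) against any known clause.
        best = max(
            (SequenceMatcher(None, clause.lower(), known.lower()).ratio()
             for known in corpus),
            default=0.0,
        )
        label = "standard" if best >= threshold else "novel"
        report.append((clause, round(best, 2), label))
    return report

# Hypothetical library of previously seen lease clauses.
corpus = [
    "Tenant shall pay rent on the first day of each month.",
    "Landlord shall return the security deposit within 30 days.",
]

# Hypothetical lease being reviewed.
lease = [
    "Tenant shall pay rent on the first day of every month.",
    "Tenant waives all rights to a jury trial.",
]

for clause, score, label in novelty_report(lease, corpus):
    print(f"{label:8} {score:.2f}  {clause}")
```

In this toy run the rent clause comes back as standard, because it differs from a known clause by one word, while the jury-trial waiver has no close match and is flagged as novel.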

“You can see how boilerplate the text is, and also the subtle differences,” explains founder Matt Hall. For example, if you’re reviewing an S-1 SEC filing for a publicly listed business, you can see which parts are commonplace, and which are company-specific. “That might be the place where you would spend most of your time,” Hall says.

Docracy already hosts a document library aimed at smaller businesses. Hall says the “genome” idea came from his partner John Watkinson, who has a Ph.D. in bioinformatics, the study of biological data. They started to think of legal clauses as genes, and wondered if they could apply gene sequence algorithms to work out patterns. “We started to find all sorts of common phrases and clauses. They were modified for specific situations, but they were still largely the same.”
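The gene analogy can be sketched with a classic technique from bioinformatics, Needleman-Wunsch global alignment, applied to words instead of DNA bases. The article does not say which algorithms Docracy uses, so this is only an illustration. The function name `align_words`, the scoring values, and the two sample clauses are assumptions.

```python
def align_words(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two word lists (Needleman-Wunsch).

    Returns a list of (word_a, word_b) pairs; None marks a gap.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two words
                score[i - 1][j] + gap,            # word only in a
                score[i][j - 1] + gap,            # word only in b
            )

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]

# Two hypothetical variants of the same confidentiality clause.
clause_a = "the employee shall not disclose confidential information".split()
clause_b = "the contractor shall not disclose any confidential information".split()

pairs = align_words(clause_a, clause_b)
for left, right in pairs:
    marker = "=" if left == right else "*"
    print(f"{marker} {left or '-':14} {right or '-'}")
```

Lines marked `=` are words the two clauses share; lines marked `*` show where one variant was modified for its situation, here a swapped party name and one inserted word.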

When Hall and Watkinson started Docracy, they hadn’t realized “how much of an open source thing was already happening among lawyers”–that attorneys frequently borrow from each other when they like certain wording (clearly, the idea of plagiarism doesn’t apply).

The genome’s first application is a tool that lets inventors search patent applications. They submit an invention description. The site breaks the text into chunks, then finds similar sections in existing filings and color-codes them by how closely they match.
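The chunk, match, and rank steps described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the site's actual pipeline: the sentence-based chunking, the word-pair (shingle) Jaccard similarity, the color thresholds, and every function name here are invented for the example.

```python
import re

def chunks(text):
    """Split a description into sentence-like chunks."""
    return [c.strip() for c in re.split(r"(?<=[.;])\s+", text) if c.strip()]

def shingles(chunk, k=2):
    """Return the set of k-word sequences in a chunk."""
    words = re.findall(r"[a-z0-9]+", chunk.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def color_for(score):
    """Map a similarity score to an illustrative color band."""
    if score >= 0.6:
        return "green"   # close match
    if score >= 0.3:
        return "yellow"  # partial match
    return "red"         # little or no match

def rank_matches(description, prior_texts):
    """For each chunk, find the most similar prior text and color-code it."""
    results = []
    for chunk in chunks(description):
        s = shingles(chunk)
        score, best = max(
            ((jaccard(s, shingles(p)), p) for p in prior_texts),
            default=(0.0, None),
        )
        results.append((chunk, best, round(score, 2), color_for(score)))
    return results

# Hypothetical prior filings and a hypothetical invention description.
prior_texts = [
    "A bicycle lock comprising a steel cable and a combination dial.",
    "A water bottle with a removable filter cartridge.",
]
description = (
    "A bicycle lock comprising a steel cable and a fingerprint sensor. "
    "The sensor is powered by a solar cell."
)

for chunk, best, score, color in rank_matches(description, prior_texts):
    print(f"[{color:6}] {score:.2f}  {chunk}")
```

In this example the first sentence lands in the closest band because it overlaps heavily with an earlier lock filing, while the solar-cell sentence matches nothing and falls in the lowest band.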

Hall says it’s much easier than searching the official record at the patent office, or using keywords on Google. “It’s a new way of searching. But it’s also a new user experience.” After patents, Docracy will probably concentrate on mergers and acquisitions, and then–well, who knows.

The question is what impact the technology might have on the legal industry. If it takes away some of the mystery from document preparation, and allows people to avoid hiring lawyers, it may be significant. For now, Docracy is making a pitch to lawyers to share their contracts, arguing that sharing is the best form of advertising, and that smaller firms can level the playing field with big fish. It’ll be interesting to see if law firms continue to upload their documents.

