How AI And Crowdsourcing Are Remaking The Legal Profession

Tech advances allow new companies to harness networks of computers and people to sift through legal information and help lawyers prep cases.

“The legal industry is ripe for innovation,” says attorney and journalist Robert Ambrogi, who covers the role of technology in law. In an influential April 13 blog post, Ambrogi proclaimed a boom in legal tech startups, citing listings on startup directory AngelList that had more than doubled. Ambrogi has since produced his own streamlined listing that currently has nearly 500 companies offering technologies to the legal industry. Several are courting attorneys who need better, cheaper ways to sort through the avalanche of legal filings, rulings, and spiderwebs of citations between cases, from the local to federal level.


The innovation upsurge may in part be generational. “If you make partner today in a law firm, and if you were in college with Google, you have different expectations of technology,” says Josh Becker, CEO of Lex Machina. The company tracks the activities of lawyers and judges using an artificial intelligence technique called natural language processing (NLP) to analyze court documents and figure out things like how a particular judge tends to rule on particular types of cases. It can also ferret out types of cases, such as patent or trademark, the specific IP a claim asserts, and all the attorneys involved.

The startup boom also comes from a new generation of technology. “We use Amazon Web Services for our computing power,” says Daniel Lewis, CEO and cofounder of Ravel Law. Started in 2012, Ravel ingests the corpus of current and historical U.S. legal data, using NLP and machine learning to map how cases interrelate, and how judges tend to rule. “We can spin up hundreds of computers in complex data-science work at relatively low cost,” he says. “Just 10 years ago, [that process] would have been dramatically more expensive and slower.”

A Nascent Market

Lex Machina is not a typical scrappy startup. It grew out of a project at Stanford University Law School, funded by major tech companies, to address the explosion of intellectual property lawsuits in recent years (what some call patent trolling). The company spun off in 2010, and in November last year it was purchased by legal analytics behemoth LexisNexis for an undisclosed sum. In June, Lex Machina will expand beyond intellectual property to include securities and antitrust law.

Big money is not a requirement. The legal profession is joining other industries, says Ambrogi, where entrepreneurs with a good idea can take on giant players, such as research and publishing firms Bloomberg Law, LexisNexis, and Thomson Reuters Westlaw. “Now it’s law students, or younger lawyers who just aren’t happy with the way things are getting done and will get an idea and try to develop it,” says Ambrogi. PacerPro, for instance, is bootstrapped mainly by the family of its founder and CEO, Gavin McGrane. The company puts a user-friendly interface on the government’s clunky PACER database of federal court cases.

Ravel grew out of Stanford Law School. Nik Reed, Ravel’s other cofounder, credits the school for encouraging students to take classes in other divisions of the university, such as business and computer science. “It certainly helps to be at a school like Stanford if you’re trying to work on multidisciplinary themes and connect engineers, designers, and lawyers,” he says.


In 2013, Stanford alum Jake Heller founded Casetext—a repository of state and federal laws and legal cases, including crowdsourced analyses of them. Casetext challenges expensive services like Thomson Reuters Westlaw, which employ armies of writers and editors. “I immediately thought to myself … we’ve all seen this story,” says Heller. “Whether it’s restaurants or encyclopedias, this is going to be replaced by an open knowledge solution.”

Casetext crowdsources commentaries to annotate important laws and court decisions.

Heller recruited people like law students and lawyers eager to contribute. “The vast majority of major law firms publish articles, and many publish thousands (and some even tens of thousands),” he says. Heller offers writers the same proposition that self-publishing platform Medium does: a venue to promote ideas and drum up business. That’s attracted about 660,000 contributions, and almost 500,000 monthly users. “Much of this is incredibly valuable content that attorneys publish to demonstrate expertise and mastery of legal developments to peers and clients,” says Heller. Casetext readers use a Reddit-like upvoting mechanism to promote the best articles.

Big Legal Data

Many new companies couldn’t have competed against (or gotten acquired by) firms like LexisNexis and Thomson Reuters without accessible legal data. In 1999, the federal courts created PACER (Public Access to Court Electronic Records), a web portal that lets anyone search and retrieve all documents related to cases in federal bankruptcy, district, and appellate courts. The system also allows attorneys to file all their paperwork online. “The federal courts tend to have what’s described sometimes as the Cadillac of law,” says McGrane. “All the cases you tend to hear about, like Apple vs. Samsung, Bernie Madoff—they’re all in federal court.”

PacerPro provides a more user-friendly and powerful way to access PACER. (McGrane says that PacerPro has 14,000 paying customers, including at a quarter of the firms on American Lawyer magazine’s Am Law 100 listing of biggest law firms.) PacerPro is not only prettier; it provides better tools, such as more granular search capabilities and automation of the regular PACER service’s process of manually downloading and forwarding documents.

PACER is funded by user fees on both searches and downloads by anyone who isn’t involved in a case but wants to research it. Fees start at 10¢ per page of search results or document download. Some fees are capped at $3 per document; some have no caps. Anything that gets attorneys to information quicker could save money.
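To make the arithmetic concrete, here is a minimal sketch of how such a fee might be estimated. It uses only the two figures quoted above (10¢ per page and a $3 per-document cap that applies to some fees); the function name and its parameters are hypothetical, and PACER's real fee schedule has rules this sketch does not model.

```python
from typing import Optional

PER_PAGE_CENTS = 10        # from the article: fees start at 10 cents per page
DOCUMENT_CAP_CENTS = 300   # from the article: some fees are capped at $3 per document


def estimate_fee_cents(pages: int, cap_cents: Optional[int] = DOCUMENT_CAP_CENTS) -> int:
    """Estimate the fee for one document or one set of search results, in cents.

    Pass cap_cents=None for fee types that have no cap.
    This is an illustration, not PACER's actual billing logic.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    fee = pages * PER_PAGE_CENTS
    if cap_cents is not None:
        fee = min(fee, cap_cents)
    return fee
```

Under these assumptions, a 12-page filing would cost $1.20, a 50-page filing would hit the $3 cap, and 50 pages of uncapped results would cost $5.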

Ravel’s visualization tool shows how dozens of cases are connected through citations.

“If you were to download everything in the PACER system, it would cost you hundreds of millions, if not more,” says Lewis of Ravel Law, which has found an alternative by partnering with Harvard Law School to digitize its archive. “They’ve made an effort to collect every … court decision from every state and federal court over the last 200 years,” Lewis says. Ravel collects new information in real time. “The courts themselves are doing a much better job of pushing out today’s law,” he adds. Ravel has published the complete case law for California and New York. It aims to offer all U.S. federal and state law online by mid-2017, for free.

Collecting mounds of data is only a first step. “You can work with data science and different kinds of data markup and create algorithms that can extract really interesting information from large amounts of documents,” says Reed. This is what Ravel’s paid users get.

Its Judge Analytics tool, for instance, displays the cases a judge has authored and cited. Results can be filtered to specific sentences in rulings that have been most influential on the judge. A visualization tool shows Supreme Court, federal circuit, federal district, and state cases. Each appears as a bubble, larger or smaller depending on its significance. Clicking on one case bubble displays arrows that show all the other cases it cites or that cite it.
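The data structure behind a view like this can be sketched as a directed citation graph. The example below is an illustration only, not Ravel's implementation: the function names are hypothetical, and it assumes that "significance" is simply the number of times a case is cited, which the article does not specify.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build two lookups from (citing_case, cited_case) pairs.

    Returns (cites, cited_by): the outgoing and incoming arrows per case.
    """
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def bubble_size(case, cited_by):
    """Assumed significance measure: 1 plus the number of inbound citations."""
    return 1 + len(cited_by.get(case, ()))


def neighbors(case, cites, cited_by):
    """The arrows shown when one bubble is clicked: cases it cites, and cases citing it."""
    return {
        "cites": sorted(cites.get(case, ())),
        "cited_by": sorted(cited_by.get(case, ())),
    }


# Hypothetical cases: B and C cite A, and C also cites B.
cites, cited_by = build_citation_graph([("B", "A"), ("C", "A"), ("C", "B")])
```

With this made-up data, case A gets the largest bubble because two other cases cite it, and clicking B would show one arrow out to A and one arrow in from C.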

Lex Machina also provides judge analytics: for example, how often they have ruled for claimants or defendants, on what grounds. “You can … basically get a sense of, has this judge had many patents or trademark cases before,” says Howard, providing an example. “Am I going to have to do a lot of legal explaining along the way, or is this an experienced judge who knows what’s up and knows all the dirty tricks?”

Knowing how long a case is likely to take helps Lex Machina users set a budget.

Lex Machina even shows how long cases take, per judge, which helps attorneys estimate a budget for clients. For instance, patent cases take between 332 and 525 days to get through the court of Phyllis J. Hamilton, chief district judge in the Northern District of California. Gavin McGrane says that PacerPro will also harness insights from court cases, including on judges, but isn’t ready to provide details.


Though companies like Ravel and Lex Machina employ sophisticated AI, they don’t claim to provide a robolawyer. “What we’re hopefully doing is finding cases that you need to understand,” says Nik Reed of Ravel. “Professional lawyers have to use their intuition and their best judgment to understand the law.”

About the author

Sean Captain is a Bay Area technology, science, and policy journalist. Follow him on Twitter @seancaptain.