When two then-unknown graduate students, Larry Page and Sergey Brin, first described Google in an academic paper published in 1998, the key innovation behind their improved search engine was an algorithm they called PageRank (named after Larry Page). They used it to rank search results more effectively, or, as they put it, to “bring order to the web.”
Though people outside the company don’t really know how heavily Google relies on PageRank to deliver search results today, the algorithm is famous for its role in building what has become a $55 billion business.
Much more quietly, PageRank has also proved useful in a surprisingly wide range of fields that have nothing to do with the web, from the analysis of sports teams to the interconnections inside the brain.
“It’s sort of like Google invented a lens,” says Purdue University computer scientist David Gleich. “If you have different combinations of lenses, you can look at all kinds of different systems–you can get microscopes, telescopes, or digital cameras. But you needed that unique insight of the lens.”
PageRank analyzes the connections, or hyperlinks, among web pages (together known as the web graph) to estimate each page’s importance, or authority, and uses that estimate to order search results. When Page and Brin published their analysis, it was more than a novel application of existing mathematics: academics today consider it a novel method in its own right, even though it borrowed pieces from earlier work.
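To make the idea concrete, here is a minimal Python sketch of the textbook “random surfer” version of PageRank, computed by power iteration. It assumes a damping factor of 0.85, the value Page and Brin reported using: at each step the surfer follows a random outgoing link with probability 0.85 and otherwise jumps to a random page. The function name `pagerank`, the four-page toy web, and the choice to spread the rank of pages with no outgoing links evenly across all pages are illustrative assumptions, not Google’s implementation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Return a PageRank score for every page in `links`.

    links: dict mapping each page to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly (one common convention).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleport share: with probability 1 - damping the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping + damping * dangling) / n for p in pages}
        # Link share: each page splits its damped rank equally among its outgoing links.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the scores have settled
            break
    return rank


# A toy four-page web (hypothetical).
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

Page C, which the three other pages all link to, comes out on top, while D, which nothing links to, keeps only its teleport share and comes last.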
In his research on methods in data science, Gleich has been documenting how the PageRank algorithm has been used in other fields in the 16 years since the original paper. He has found dozens of cases that use the PageRank math. He isn’t trying to track down every instance, but rather to show how widely PageRank applies. Gleich finds his examples by reading and searching tech news sites, then digs into the underlying math.
PageRank, he says, could be used wherever there is a graph, a mathematical structure that represents relationships or flows between a set of objects, whether those objects are web pages, office workers, Facebook users, genes, proteins, or roads. “If you start looking for graphs, you find them just about everywhere.”
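Because the method only needs a graph, the same calculation carries over to other kinds of relationships, including weighted ones. The sketch below is a hypothetical example, not one drawn from Gleich’s survey: it takes a list of (source, target, weight) triples, such as how many emails one office worker sent another, and splits each node’s rank among its neighbors in proportion to those weights. The names, counts, and the function `weighted_pagerank` are made up for illustration, and the 0.85 damping factor is an assumption.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, iterations=200):
    """PageRank on a weighted directed graph.

    edges: iterable of (source, target, weight) triples with weight > 0.
    """
    out = defaultdict(dict)  # source -> {target: total weight}
    nodes = set()
    for src, dst, weight in edges:
        out[src][dst] = out[src].get(dst, 0.0) + weight
        nodes.update((src, dst))
    n = len(nodes)

    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Nodes with no outgoing edges hand their rank to everyone equally.
        dangling = sum(rank[v] for v in nodes if v not in out)
        new_rank = dict.fromkeys(nodes, (1.0 - damping + damping * dangling) / n)
        for src, targets in out.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                # Heavier edges (e.g. more emails) carry a larger share of rank.
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical email counts between four office workers: (sender, recipient, emails).
emails = [
    ("ana", "ben", 12),
    ("ben", "ana", 3),
    ("ben", "cy", 9),
    ("cy", "ana", 5),
    ("dee", "cy", 1),
]
scores = weighted_pagerank(emails)
print(sorted(scores, key=scores.get, reverse=True))
```

Swapping web pages for people changes nothing in the math; only the meaning of a “link” changes.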
Here are some of the more interesting use cases. None has had as much commercial or societal impact as Google, but Gleich doesn’t rule out that possibility someday.
PageRank methods can help answer a seemingly subjective question: What are the most important books? A University of Nebraska literature professor developed software that uses PageRank and other methods to answer it for 19th-century authors. After analyzing nearly 3,600 novels, the software concluded that Jane Austen and Walter Scott were, in fact, the most influential and original authors of the period.
Who’s the best team or player in a given sport’s history? Fans will debate this endlessly, and the criteria are subjective. One paper applies PageRank to every professional tennis match since 1968: it aggregates the matches between each pair of opponents into a network and derives a “prestige score” from it. Its answer is that Jimmy Connors is the best player. As Gleich explains, the idea underlying these rankings is that a random fan follows a team or player until another beats them, then switches allegiance to the winner, much as a web surfer follows links around the web.
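The random-fan idea can be simulated directly. This sketch is a Monte Carlo illustration, not the method from the tennis paper: it treats each match as a link from the loser to the winner, and at each step a simulated fan, with probability 0.85 (an assumed damping value), switches to someone who beat their current favorite, weighted by number of wins; otherwise the fan picks a new favorite at random, the fan’s version of teleporting. The share of steps spent following each player approximates that player’s PageRank. The function `fan_walk_prestige`, the player names, and the results are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fan_walk_prestige(matches, damping=0.85, steps=200_000, seed=0):
    """Estimate prestige by simulating a fan who follows players around.

    matches: iterable of (winner, loser) pairs.
    """
    beaten_by = defaultdict(Counter)  # loser -> Counter of players who beat them
    players = set()
    for winner, loser in matches:
        beaten_by[loser][winner] += 1
        players.update((winner, loser))
    players = sorted(players)

    rng = random.Random(seed)
    visits = Counter()
    current = rng.choice(players)
    for _ in range(steps):
        visits[current] += 1
        conquerors = beaten_by.get(current)
        if conquerors and rng.random() < damping:
            # Switch to someone who beat the current favorite, weighted by number of wins.
            names = list(conquerors)
            current = rng.choices(names, weights=[conquerors[name] for name in names])[0]
        else:
            # Teleport: pick a new favorite uniformly at random
            # (always happens for a player who has never lost).
            current = rng.choice(players)
    return {player: visits[player] / steps for player in players}


# Hypothetical results: (winner, loser) for each match.
matches = (
    [("Alice", "Bea")] * 3
    + [("Alice", "Cara")] * 2
    + [("Bea", "Cara")] * 2
    + [("Cara", "Bea")]
)
prestige = fan_walk_prestige(matches)
print(sorted(prestige, key=prestige.get, reverse=True))
```

A power-iteration solver would give the same ranking exactly; the simulation is shown only because it mirrors the fan story step by step.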
“The human brain connectome [the neural network that links parts of the brain] is one of the most important networks, about which we understand surprisingly little,” writes Gleich in his paper. So it’s no surprise that PageRank has found a use here. Most recently, it was used to evaluate the importance of different brain regions and the connections among them, and how both change with age, based on MRI scans of 1,000 people.
In a paper titled “Google Goes Cancer,” researchers developed a “novel computational approach” based on Google’s algorithm. It helped them identify seven tumor genes that are linked to the prognosis of patients with pancreatic cancer and would help doctors better direct the course of treatment.
In another surprising application, PageRank is used to predict both traffic flow and human movement within urban spaces. One study relies on a key ingredient of the algorithm, a concept called teleporting, which models a person starting or ending a journey, or parking, on a given street. This lets transportation researchers build models that better gauge the inflow and outflow of cars on a road network, given that some cars enter the network and park rather than leave it.
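A rough sketch of how teleporting enters the math, assuming a simplified model that is not the one from the study: each street is a node, a car moves onto one of the connected streets with probability 0.85, and otherwise its trip ends and a new trip begins on a street chosen from a teleport distribution. Standard PageRank teleports uniformly; here the distribution is weighted, a variant often called personalized PageRank. The street names, parking weights, and the function `pagerank_with_teleport` are hypothetical.

```python
def pagerank_with_teleport(links, teleport, damping=0.85, iterations=300):
    """PageRank with a custom teleport distribution.

    links: street -> list of streets a car can move onto next.
    teleport: street -> non-negative weight for starting a new trip there
              (weights are normalized; at least one must be positive).
    """
    streets = set(links) | set(teleport)
    for nxt in links.values():
        streets.update(nxt)
    total = sum(teleport.get(s, 0.0) for s in streets)
    v = {s: teleport.get(s, 0.0) / total for s in streets}  # teleport probabilities

    rank = dict(v)
    for _ in range(iterations):
        # Cars on dead-end streets end their trip; those trips restart according to v.
        stuck = sum(rank[s] for s in streets if not links.get(s))
        # With probability 1 - damping any trip ends and a new one begins according to v.
        new_rank = {s: (1.0 - damping + damping * stuck) * v[s] for s in streets}
        for s, nxt in links.items():
            for t in nxt:
                new_rank[t] += damping * rank[s] / len(nxt)
        rank = new_rank
    return rank


# Hypothetical street network; Pine is a dead end.
roads = {"Main": ["Oak", "Elm"], "Oak": ["Main"], "Elm": ["Main", "Pine"], "Pine": []}
uniform = pagerank_with_teleport(roads, {s: 1 for s in roads})
parking = {"Main": 1, "Oak": 1, "Elm": 1, "Pine": 10}  # Pine has a large car park (hypothetical)
biased = pagerank_with_teleport(roads, parking)
print(round(uniform["Pine"], 3), round(biased["Pine"], 3))
```

Shifting where trips begin toward Pine raises its score even though the street layout is unchanged, which is the kind of lever teleporting gives a traffic model.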