Google's Books project, a six-year-old task to digitize as many of the world's books as possible, is an incredibly difficult and ambitious undertaking. Working alongside disparate library systems, governments, private collections, universities, and more has proven a challenge, and the project has met all kinds of resistance. That's not even taking into consideration some of the most basic obstacles Google's had to deal with.
One of those basic obstacles is at the core of the entire project. If you're going to digitize the world's books, it makes sense to figure out how many books there actually are, but as it turns out, that's a whole can of worms in itself. Google put up a blog post today that illustrates just how tricky it has been to answer such a simple question, and it's oddly fascinating.
First of all, what defines a "book"? Separate editions count as separate books, right? How different do two editions have to be to count as two books? What about compilations of existing works? You can't even rely on the established systems, like ISBN: ISBN is a fairly recent system, only adopted widely in the 1970s, and it's distinctly more accurate when documenting books from the Western world. Each individual standard classification has its own troubles, and seemingly absolute data like author and title are actually very nebulous and thus not particularly useful.
Google has chosen to deal with that problem by collecting metadata and weeding out as many duplicates as possible. Each resulting group is called a "cluster," incorporating all the different editions and forms of a single "work" or "tome." That leads to a count of 210 million clusters, but those clusters in turn have to be purged of non-book materials cataloged by, among others, the Library of Congress. Those non-books include microform, film, maps, and, weirdly enough, about a thousand T-shirts. Then Google excludes serials, since many issues of a serial are often cataloged as one work, muddling the count.
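To make those three steps concrete, here is a minimal Python sketch of the same shape of pipeline: group records into clusters, purge non-book clusters, then exclude serials. It is an illustration only, not Google's method. The `Record` fields, the `kind` labels, and the `cluster_key` and `count_books` helpers are all hypothetical, and matching on a normalized title and author is a deliberately crude stand-in, since (as noted above) those fields are too nebulous to rely on in practice.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for catalog entries that are not books.
NON_BOOK_KINDS = {"microform", "film", "map", "t-shirt"}


@dataclass(frozen=True)
class Record:
    """One catalog entry (hypothetical shape, for illustration only)."""
    title: str
    author: str
    kind: str = "book"


def _normalize(text):
    # Lowercase, turn punctuation into spaces, collapse runs of whitespace.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def cluster_key(record):
    # Crude assumption: records with the same normalized title and author
    # describe the same work.
    return (_normalize(record.title), _normalize(record.author))


def count_books(records):
    """Return the cluster count after each stage of the pipeline."""
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_key(record)].append(record)

    # Stage 2: drop clusters in which every record is a non-book item.
    without_non_books = [
        members for members in clusters.values()
        if any(r.kind not in NON_BOOK_KINDS for r in members)
    ]

    # Stage 3: drop clusters that contain a serial.
    books_only = [
        members for members in without_non_books
        if all(r.kind != "serial" for r in members)
    ]

    return {
        "clusters": len(clusters),
        "after_non_book_purge": len(without_non_books),
        "after_serial_exclusion": len(books_only),
    }


sample = [
    Record("Moby-Dick", "Herman Melville"),
    Record("Moby Dick", "herman melville"),   # same work, different spelling
    Record("World Atlas", "Anon", "map"),
    Record("Nature", "Various", "serial"),
    Record("Emma", "Jane Austen"),
]
stage_counts = count_books(sample)
print(stage_counts)
```

On the five sample records, the two spellings of Moby-Dick collapse into one cluster, the map is purged in the second stage, and the serial is excluded in the third, leaving two books.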
So how many books are there? Says Google:
"After we exclude serials, we can finally count all the books in the world. There are 129,864,880 of them. At least until Sunday."
Now, just sit back and wait for that question to be asked on Jeopardy!