Google’s New Caffeine Search Indexing Tech: What It Is, and Why You Should Care

Google’s Caffeine isn’t a new mobile app or fancy bit of desktop-to-cloud software, like we’re used to from Google. Instead it’s a massive change to the underlying tech which powers Google’s search. Caffeine will remain unseen, but you’ll use it all the time.

Currently, Google indexes the Web in layers. Some are scanned at shorter intervals than others, and the main layer is scanned every few weeks. That means there can be a delay between when a page is published to the Web and when it’s available to you through Google search, as Google has to scan a massive amount of information before you can find any of it.

That’s becoming more of a problem, as to-the-second live updates are becoming more prevalent. Says Google:

“So why did we build a new search indexing system? Content on the web is blossoming. It’s growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people’s expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.”

So instead of big layers, Caffeine analyzes the Web in small chunks and updates the index constantly. Google estimates the new system will result in “50% fresher results” than before. It takes up a ridiculous amount of storage (Google estimates 100 million gigabytes right now, with hundreds of thousands of gigabytes added per day), but it should make Google better equipped to deal with the future, as well as pesky upstarts like Microsoft’s Bing.
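
To make the "little chunks" idea concrete, here is a toy sketch in Python of an index that is updated one page at a time rather than rebuilt a whole layer at once. It is only an illustration of incremental indexing in general, not Google's actual design: the `IncrementalIndex` class, its method names, and the example.com URLs are all made up for this example.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one page at a time (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.page_words = {}              # URL -> words currently indexed for it

    def update(self, url, text):
        """Index (or re-index) a single page without touching any other page."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(url, set())
        # Drop words the page no longer contains, add the ones it gained.
        for word in old_words - new_words:
            self.postings[word].discard(url)
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.page_words[url] = new_words

    def search(self, word):
        """Return the URLs currently indexed for a word, in sorted order."""
        return sorted(self.postings.get(word.lower(), set()))


index = IncrementalIndex()
index.update("example.com/a", "caffeine search index")
index.update("example.com/b", "live news updates")

# One page changes: only that page is reprocessed, and the change
# is searchable immediately instead of after the next full scan.
index.update("example.com/a", "caffeine live index")

print(index.search("live"))
print(index.search("search"))
```

In a layered setup, the edit to the first page would only show up after its whole layer had been rescanned. Here the cost of an update depends on the size of the one page that changed, which is the property that makes fresher results possible.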

Dan Nosowitz, the author of this post, can be followed on Twitter, corresponded with via email, and stalked in San Francisco (no link for that one–you’ll have to do the legwork yourself).