Twitter looks like a simple thing, doesn’t it? A simple network of people who follow each other, short snippet messages of 140 characters or less, no tricky privacy controls, VIP friends, or any of the shenanigans of Facebook. When it started, it absolutely was this simple from an infrastructure point of view. But Twitter’s Raffi Krikorian, VP of Engineering, just gave a presentation that points out that keeping Twitter’s fail whale from appearing for hundreds of millions of users around the world was far from the simple task of scaling up the early system. The mammoth job of simply making sure a tweet from a popular user navigates the infrastructure and gets out to the community on time may make you think twice about complaining about your own database problems.
Krikorian revealed some figures that show the scale of Twitter’s problems: It has 150 million active users around the world, and the data going to and from these folk (that’s 400 million tweets a day) squeezes through a 22 MB/second firehose. If Lady Gaga, with 31 million followers, sends a tweet, it can take up to about 5 minutes for those short 140 characters to reach all her fans. Because Twitter is much more of a consumption platform than an input platform, the company has configured its entire infrastructure to support that bias: The code does a lot of processing the moment tweets arrive to figure out where they need to go, which means that when tweets are “read” through an API call, the lookup is much quicker than it would be if that processing happened at read time.
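To make the write-heavy design concrete, here is a minimal Python sketch of the general idea, often called fan-out on write. It is an illustration under stated assumptions, not Twitter’s actual code: the names `follow`, `post_tweet`, `read_timeline` and the `TIMELINE_LIMIT` value are all hypothetical, and plain in-process dictionaries stand in for whatever storage Twitter really uses.

```python
from collections import defaultdict, deque

# All names and values here are hypothetical, chosen for illustration only.
TIMELINE_LIMIT = 100  # assumed cap on entries kept per timeline

followers = defaultdict(set)  # author -> set of follower ids
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))  # user -> newest-first tweets


def follow(follower, author):
    """Record that `follower` wants to see tweets from `author`."""
    followers[author].add(follower)


def post_tweet(author, text):
    """Write path: do the routing work now, once per follower."""
    if len(text) > 140:
        raise ValueError("tweet exceeds 140 characters")
    for follower in followers[author]:
        # Newest tweet goes to the front; the deque drops the oldest when full.
        timelines[follower].appendleft((author, text))
    return len(followers[author])  # number of timelines written


def read_timeline(user, count=20):
    """Read path: a plain lookup, with no routing or sorting left to do."""
    return list(timelines[user])[:count]
```

In this sketch the cost of a tweet grows with the author’s follower count, while a read stays a cheap lookup. That trade-off would be consistent with the figures above: one tweet to 31 million followers means 31 million timeline writes, which is plausibly why delivery can take minutes.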
There are several other tricks Twitter uses, such as keeping track of active users and storing their data more accessibly than occasional users’ info, and storing a bunch of data in RAM for speedy lookups. By a bunch, I mean a lot: Every active user’s data is kept in RAM to lower latencies.
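One way to picture the split between active and occasional users is a small in-memory cache sitting in front of a slower store. The sketch below is an assumption-laden illustration: the `ActiveUserCache` class is hypothetical, the least-recently-used eviction rule is a stand-in (the presentation, as reported here, does not say how Twitter decides who counts as active), and an ordinary dictionary plays the role of the slow store.

```python
from collections import OrderedDict


class ActiveUserCache:
    """Keeps recently active users' data in memory; others stay in a slow store.

    Hypothetical sketch: LRU eviction is an assumption, not Twitter's stated policy.
    """

    def __init__(self, capacity, slow_store):
        self.capacity = capacity
        self.slow_store = slow_store  # stand-in for disk or a database
        self.hot = OrderedDict()      # user_id -> data, most recently active last

    def get(self, user_id):
        if user_id in self.hot:
            self.hot.move_to_end(user_id)    # mark as recently active
            return self.hot[user_id]
        data = self.slow_store.get(user_id)  # slow path for occasional users
        if data is not None:
            self._admit(user_id, data)
        return data

    def _admit(self, user_id, data):
        self.hot[user_id] = data
        self.hot.move_to_end(user_id)
        while len(self.hot) > self.capacity:
            self.hot.popitem(last=False)     # evict the least recently active user
```

A user who keeps coming back stays in the fast tier, while one who goes quiet is eventually pushed out and served from the slower store on the next visit.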
Propelling a lot of Twitter’s thinking about its infrastructure is that it’s no longer a simple web app, or even a smartphone app: It’s a coherent set of APIs for delivering messages accurately on a vast scale and in near-real-time to a diverse user base. This API set is effectively Twitter’s core asset and, tied to advertising, the key to more revenue in the future.
[Image: By Flickr user Les Chatfield]