When the team at Digg learned about the impending demise of Google Reader, they knew they had to act fast and build something killer. The resulting void would leave millions of potential users on the hunt for something new, and the recently Betaworks-acquired Digg was unusually well-positioned to build a product dedicated to reading things. But with a limited time frame in which to create Digg Reader, the team had to limit the product’s scope and choose its tech tools wisely.
Leading the charge on the technical side at Digg is CTO Mike Young, who gave us a glimpse behind the scenes at the tech stack and development frameworks used to replace a beloved, eight-year-old service in a matter of months.
In 2013, rolling one’s own infrastructure might be a laudable technical feat, but for most teams it isn’t worth the effort when super-reliable cloud hosting is so readily available. Not surprisingly, Digg is leaning on Amazon Web Services (AWS) for its hosting and a few additional infrastructural needs.
“We currently have a mix of infrastructure/services that we built ourselves over the last year for Digg.com as well as some of the AWS-hosted services for storing data (DynamoDB) and queueing (SQS),” says Young. “Since we had such a short time frame to build Digg Reader we had to lean heavily on some of the hosted AWS services, like DynamoDB, versus rolling our own.”
- Amazon Web Services for hosting and content delivery. Young says they’re fully hosted on AWS, running most of Digg Reader off of instances of Amazon’s Elastic Compute Cloud (EC2).
- DynamoDB. Young’s team uses Amazon’s NoSQL database solution rather than building their own “since we had such a short time frame to build Digg Reader,” he says.
- Amazon’s Simple Queue Service (SQS) is used for queuing messages between components of Digg’s Amazon-powered backend. For those unfamiliar with message queues, Wikipedia offers a pretty thorough primer.
- Amazon Route 53 is used for DNS management. The service doesn’t offer domain names; rather, it lets users manage DNS records, mapping domains and subdomains to specific IP addresses and configuring other standard DNS settings.
- The AWS Elastic Load Balancer is necessary for sites expecting as many sudden visitors as Digg Reader was. It smartly distributes all that inbound traffic among Digg’s EC2 instances for maximum fault tolerance and stability.
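For readers who want to see the message-queue idea in miniature, here is a rough sketch. It is not Digg's code: Python's in-process `queue.Queue` stands in for a hosted service like SQS, and the `crawler_worker` and `enqueue_feeds` names and the example URLs are made up for illustration. The point is only that the producer hands off work and moves on, while a separate worker drains the queue at its own pace.

```python
import queue
import threading

# Stand-in for a hosted queue such as SQS. A real deployment would talk to
# the queue service over the network rather than use an in-process queue.
feed_queue = queue.Queue()
results = []


def crawler_worker():
    """Consume feed URLs until a None sentinel arrives."""
    while True:
        url = feed_queue.get()
        if url is None:
            feed_queue.task_done()
            break
        # Hypothetical work: a real worker would fetch and parse the feed.
        results.append("crawled " + url)
        feed_queue.task_done()


def enqueue_feeds(urls):
    """Producer side: hand URLs to the queue and return immediately."""
    for url in urls:
        feed_queue.put(url)


worker = threading.Thread(target=crawler_worker)
worker.start()
enqueue_feeds(["http://example.com/a.xml", "http://example.com/b.xml"])
feed_queue.put(None)  # sentinel: tell the worker to stop
worker.join()
```

Because the two sides only share the queue, either one can be scaled or restarted without the other needing to know.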
“I don’t think we would have been able to pull this off in the time frame we had without AWS and the ability to scale up a large number of machines in [a] short period of time,” says Young.
“One of the things that was so amazing about Google Reader was how fast it was, both in serving up content when you loaded the page or paginated through feeds, but also on the feed aggregation/crawling side,” Young says. “We spent some time talking with the original Google Reader team when we first started the project, and they were kind enough to give us some great insight into the product and infrastructure.”
In addition to Amazon’s hosting and services, Young’s team cobbled together a number of other backend engineering tools and techniques to get Digg Reader to run as quickly and responsively as possible.
“The backend is written in Python and is pretty standard in terms of the stack that we use,” he says.
- Python is the backend programming language of choice at Digg. The general purpose scripting language is used by everyone from Dropbox to NASA and is generally quite popular, so this is no surprise.
- Memcached, Redis, and twemproxy are used for caching. Redis and Memcached are popular open source key-value stores designed to speed up web apps by lightening the load on databases. Twemproxy is a lightweight proxy for Memcached and Redis created by developers at Twitter.
- MongoDB. In addition to Amazon’s DynamoDB, Digg uses the uber-popular NoSQL database system MongoDB.
- Tornado is the Python Web framework and networking library of choice at Digg Reader. Originally developed by FriendFeed, Tornado “can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.”
- Beanstalkd. To keep processes moving in the background, Digg uses the work queue Beanstalkd to run tasks asynchronously and keep the user experience as speedy and interruption-free as possible.
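To make the caching idea above concrete, here is a minimal sketch of the common "cache-aside" read pattern. It is an illustration under stated assumptions, not a description of how Digg wires things up: the `TTLCache` class is a tiny in-process stand-in for a key-value store like Memcached or Redis, and `load_feed_from_db`, `get_feed`, and the `feed:<id>` key format are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a key-value cache like Memcached or Redis."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)


db_reads = 0


def load_feed_from_db(feed_id):
    """Hypothetical slow database lookup; counts calls so the effect is visible."""
    global db_reads
    db_reads += 1
    return {"id": feed_id, "title": "Feed %d" % feed_id}


cache = TTLCache(ttl_seconds=60)


def get_feed(feed_id):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = "feed:%d" % feed_id
    feed = cache.get(key)
    if feed is None:
        feed = load_feed_from_db(feed_id)
        cache.set(key, feed)
    return feed


first = get_feed(42)   # miss: goes to the "database"
second = get_feed(42)  # hit: served from the cache
```

The second call never touches the database, which is the whole appeal: repeat reads get cheap, at the cost of data that may be up to one TTL out of date.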
“For code deployment and server monitoring, we are using a mix of tools right now: fabric, statsd + graphite, Amazon’s Cloudwatch, sentry, munin, nagios, and pager duty,” says Young. “And Chartbeat, of course.”
“Gilad Lotan, the Betaworks Data Scientist, has built a system that allows us to score any URL based on a number of signals like tweets, Facebook shares, Diggs, etc.,” Young explains. “This is currently shown in the list of ‘Popular’ stories in Digg Reader. He uses a mix of redis, memcache, zeromq, and hypertable.”
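Young doesn't say how those signals are combined, so the following is only a guess at the general shape of such a scorer, not Lotan's system. The weights, the log dampening, and the `score_url` and `rank_popular` names are all assumptions made for the sake of a small runnable example.

```python
import math

# Hypothetical weights; the article does not say how signals are combined.
SIGNAL_WEIGHTS = {"tweets": 1.0, "facebook_shares": 0.8, "diggs": 1.5}


def score_url(signals):
    """Combine per-URL signal counts into one number.

    log1p dampens large counts so a single huge signal cannot dominate.
    Signal names without a configured weight are ignored.
    """
    return sum(
        weight * math.log1p(signals.get(name, 0))
        for name, weight in SIGNAL_WEIGHTS.items()
    )


def rank_popular(stories, limit=10):
    """Return up to `limit` URLs, highest score first."""
    ranked = sorted(stories, key=lambda url: score_url(stories[url]), reverse=True)
    return ranked[:limit]


# Made-up counts for three made-up URLs.
stories = {
    "http://example.com/quiet": {"tweets": 3, "diggs": 1},
    "http://example.com/viral": {"tweets": 900, "facebook_shares": 400, "diggs": 50},
    "http://example.com/steady": {"tweets": 40, "facebook_shares": 10, "diggs": 5},
}
popular = rank_popular(stories, limit=2)
```

A production scorer would likely also account for how recent each signal is, which this sketch leaves out.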
“I’m really proud of what [design director] Justin Van Slembrouck and the dev team pulled off in such a short amount of time,” says Young. “There is a lot left to do, but it’s really exciting. We are really just getting started with this.”
One of the team’s biggest challenges was ensuring the Reader experience was as fast and responsive as possible. This isn’t easy when you’re aggregating and storing more than 8 million feeds and doing all kinds of heavy lifting on the backend to make everything feel effortless. Thankfully, for the front-end guys, there’s no shortage of frameworks and tools to help patch together something that meets users’ not-always-patient demands.
“Jon Ferrer and Kevin Barnett have done a great job in making the site feel very responsive,” says Young. “One of our big goals for launch was to make the ‘All’ feed feel very fast for users. We wanted users to be able to scroll through page after page of their ‘All’ feed and have it feel very smooth. Jon is using some tricks in terms of loading and then showing the data to make sure the scrolling and pagination of the All feed feels smooth.”
LESS is a CSS preprocessor. It extends what’s possible with Cascading Style Sheets (CSS) by adding features such as variables, mixins, and functions, then compiles the result down to plain CSS, bringing a programmatic flavor to the code that powers how websites look and feel.
“Other than that, it’s pretty much just GitHub, Asana, Hipchat, and code editors,” says Young. “The old guys (like me) use Vi while the younger guys use fancy new tools like CodeKit and Sublime Text that I think pretty much write the code for you! Kids!”
[Image: Flickr user Johann Snyman]