Stanislav Shalunov, cofounder and CTO of connectivity startup Open Garden, was the chair of the Transport Working Group at the Internet2 consortium from 2000 to 2006, where he developed the Low Extra Delay Background Transport (LEDBAT) protocol to solve the problem of traffic congestion on the consortium’s network. Today, LEDBAT is used by companies across the world, including BitTorrent and Apple, to send large files quickly across the web, accounting for 13%-20% of all Internet traffic. I spoke to Shalunov over Skype about how he came up with LEDBAT, and how it works.
Many scientists at the time were using Internet2 for transporting amounts of data much larger than anything TCP had been used for. A canonical example is the Large Hadron Collider, but there were many other projects. The reason that Internet2 existed was to enable these very high-speed connections.
At the time, there was no LEDBAT yet. One of the core problems with TCP, which was the standard way to send data, is that you need a very low loss rate if you’re going to send fast. For example, if you are sending across the continent or between the East Coast and Europe at 10 gigabits per second, you need, roughly speaking, one loss per 90 minutes. These are rough numbers, but this is an extraordinary requirement for a network to meet. This makes everything very expensive and very hard to engineer. If you have the slightest imperfection in the way fiber is attached anywhere in the path, you’ll have a much higher loss rate. In fact, at the time, I calculated that this requirement bumped up against physical limits: it was within about one order of magnitude of the limit imposed by the fiber’s bit error rate.
It’s hard to make something go that fast, that close to physical limits. The reason TCP has this property is that it relies exclusively on loss for congestion indication. That’s what needs to tell the host that there is congestion in the network so that the host can slow down. The only way that TCP will listen is if you drop a packet. If you drop one packet per 90 minutes, you don’t tell the sender very much. For 90 minutes you are completely silent, then you tell them to slow down, but even then they don’t know by how much.
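The “one loss per 90 minutes” figure can be sanity-checked against the widely cited Mathis TCP throughput model, rate ≈ C·MSS/(RTT·√p). The packet size and round-trip time below are my assumptions for a typical transcontinental path, not figures from the interview:

```python
# Sanity check of the "one loss per 90 minutes" figure using the
# Mathis TCP throughput model: rate = C * MSS / (RTT * sqrt(p)).
# Packet size and RTT are assumed values, not from the interview.
MSS_BITS = 1500 * 8   # standard 1500-byte packets
RTT = 0.100           # 100 ms round-trip time, roughly transcontinental
RATE = 10e9           # the 10 Gbit/s transfer from the interview
C = 1.22              # constant from the Mathis model

# Solve the model for the loss probability p needed to sustain RATE.
p = (C * MSS_BITS / (RTT * RATE)) ** 2

packets_per_second = RATE / MSS_BITS
seconds_between_losses = 1 / (p * packets_per_second)

print(f"required loss rate: {p:.2e}")
print(f"time between losses: {seconds_between_losses / 60:.0f} minutes")
```

With these assumptions the model demands a loss rate of roughly 2×10⁻¹⁰, about one lost packet every hour and a half, which matches the figure above.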
It’s very hard to build high-speed transport protocols on a signal that sparse. It was obvious to me that we needed to look at delay as well as loss. When you start taking delay into account, you get information with every packet that you receive. That’s much more information. You may have 100,000 packets per second. Suddenly, you have plenty of information about the exact state of the network at any given time.
You can then very rapidly adapt the moment that something happens on the network. You can respond just right. You don’t have to hunt for the right rate, slowly increasing over the course of an hour only to find yourself needing to halve your rate. This was important.
The other thing that we realized while researching this was that it solves a different problem with TCP as well. Normally, for TCP to experience losses it must first cause the buffer to overflow. Every Internet bottleneck has some amount of buffer space. Sometimes people measure it in bytes, kilobytes, and megabytes, but that’s not the right way of measuring it. The right way of measuring it is in units of time, so that it scales with the speed of the link. These buffers, for the Internet to work with TCP, must be at least a few hundred milliseconds. But sometimes you find places where these buffers are much larger.
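To illustrate why time is the natural unit here: the same few hundred milliseconds of buffering corresponds to wildly different byte counts depending on link speed. The link speeds below are my own illustrative examples, not figures from the interview:

```python
# A buffer sized in time scales with the link; sized in bytes it does not.
# The example link speeds are illustrative, not from the interview.
BUFFER_SECONDS = 0.2  # a 200 ms buffer

for name, bits_per_second in [("1 Mbit/s uplink", 1e6),
                              ("100 Mbit/s cable link", 100e6),
                              ("10 Gbit/s backbone link", 10e9)]:
    buffer_bytes = bits_per_second * BUFFER_SECONDS / 8
    print(f"{name}: 200 ms of buffer = {buffer_bytes / 1e6:g} MB")
```

The same 200 ms is 25 kB on a slow uplink but hundreds of megabytes on a backbone link, which is why a byte count alone tells you nothing about the delay a full buffer inflicts.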
In fact, when we started looking, in the majority of cases these buffers could be as large as a few seconds. For TCP to experience losses, it first drives delays up to a few seconds. That’s a terrifying degradation of user experience. From the perspective of most users, at that point the network is inoperable.
If you asked a 5-year-old, he would easily tell you that the Internet isn’t working: you can’t load web pages. But if you asked a congestion control researcher, he would say that TCP is operating as designed. So that needed to be addressed, and delay-based congestion control allows us to address that as well.
Since you are responding to delay, you don’t need to wait until the buffer is full. You give yourself a budget of how much you intend to make the delays worse. You can decide that this budget is going to be 100 milliseconds, or 25 milliseconds, for example. If you use an even smaller number, like eight milliseconds, it will work fine for most environments. That’s not even noticeable for people playing real-time games. Even if you use 100 milliseconds, that’s not going to have a significant impact on somebody browsing the web.
If you use 25 milliseconds, for our Skype call, there would be no way for us to notice that something has changed if we added 25 milliseconds of latency to it. But if I ran a TCP connection right now, then from a practical perspective we wouldn’t be able to talk anymore. This video would go all squares.
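This delay budget is the heart of the control law that became LEDBAT (later specified in RFC 6817): estimate queuing delay as the current one-way delay minus the lowest delay ever observed, then grow the congestion window when you are under the target and shrink it when you are over. A minimal sketch with simplified window arithmetic; the class name and delay samples are mine, not from the interview:

```python
TARGET = 0.100  # the 100 ms delay budget from the interview
GAIN = 1.0      # scaling of the window adjustment
MSS = 1500      # packet size in bytes

class LedbatSketch:
    """Simplified LEDBAT-style controller; not a full RFC 6817 implementation."""

    def __init__(self):
        self.base_delay = float("inf")  # lowest one-way delay seen so far
        self.cwnd = 2 * MSS             # congestion window, in bytes

    def on_ack(self, one_way_delay, bytes_acked):
        # The minimum observed delay approximates the path's fixed
        # propagation delay; anything above it is queuing.
        self.base_delay = min(self.base_delay, one_way_delay)
        queuing_delay = one_way_delay - self.base_delay

        # Positive when under the budget (speed up), negative when over it.
        off_target = (TARGET - queuing_delay) / TARGET
        self.cwnd += GAIN * off_target * bytes_acked * MSS / self.cwnd
        self.cwnd = max(self.cwnd, MSS)  # keep at least one packet in flight

c = LedbatSketch()
c.on_ack(0.050, MSS)  # first sample sets the base delay
c.on_ack(0.050, MSS)  # no queuing: under budget, window grows
grown = c.cwnd
c.on_ack(0.250, MSS)  # 200 ms of queuing, double the budget: window shrinks
```

Because every acknowledged packet carries a delay sample, the controller adjusts continuously instead of staying silent for minutes waiting for a loss.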
At that point I started to realize that this was applicable way beyond high-speed transfer, that this was applicable in any situation where there was congestion on the network. That led us to look for other areas of applicability, and the most frequent congestion is experienced by peer-to-peer applications.
Peer-to-peer applications experience congestion because they send upstream, and uplinks are typically very narrow compared to downlinks. Applications that exercise the uplinks end up experiencing the vast majority of congestion on the Internet. The rest of the Internet tends to be engineered in such a way that when you experience congestion, you just upgrade the link, and congestion for much of the rest of the Internet is taken as a sign that you need to upgrade something.
For example, on a backbone network, when five-minute average utilization reaches 80%, most operators will upgrade it and not give it the chance to get to where it is congested, precisely because TCP does not cope with congestion so well.
LEDBAT, however, allows you to use 100% of the connection’s capacity without significant degradation in latency. In practice, if you upgrade a typical connection on the Internet at 80% utilization, you need to start thinking about upgrading at 50%. Then, after you upgrade it, it may be at 15%. So a typical connection on the Internet may only be utilized perhaps 50%. This basically gives you a free factor of two for how much traffic you can carry on those same connections.
That’s the system view of the world. At the application layer, the people hitting these queuing problems at the time, users of BitTorrent, for example, which was the most common peer-to-peer application, would basically resort to tricks like running BitTorrent at night. From their perspective, the Internet wasn’t very usable when BitTorrent was running. When Bram [Cohen, creator of BitTorrent] told me about it, and described this as the number one problem for BitTorrent users at the time, I was like, “Well, wait a second, I can help.”
I ended up at BitTorrent after my little company called Plicto was acquired by it, and I led the effort to integrate this congestion control protocol into BitTorrent. In BitTorrent’s implementations it’s called uTP. By now, uTP carries probably about 95% or more of BitTorrent traffic. There are some clients that don’t support it, but all the main ones do. It’s an open source project by now and people can take the code and use it in their implementation, so there are multiple interoperating implementations. This accounts for 95-plus percent of BitTorrent traffic, which is approximately, perhaps, 13% to 20% of total Internet traffic, depending on who you ask. The good data on BitTorrent’s share of Internet traffic is in Sandvine’s reports.
Apple also uses it for software updates. For a long time, they wanted to do software updates so that the downloads wouldn’t interfere with your web browsing and other activities. These are enormous downloads and they come from very fast servers. They use Akamai to distribute them. So, they can easily drown out your more desired Internet activity. There was interest at very senior levels of Apple to address this problem, but for a long time the engineers at Apple didn’t have an appropriate solution.
When I made LEDBAT public at the Internet Engineering Task Force, Apple looked at it and was like, “Oh, great. That’s what we’re going to do for software updates.”
[Image: Flickr user Hartwig HKD]