Blind spots in the Microsoft CDN study
This post is the second in a series examining the Microsoft CDN study comparing Akamai and Limelight. The first post discusses measurement: what the study does and doesn’t look at. Now, I want to build on that foundation to explain what the study misses.
In the meantime, Akamai has responded publicly. One of the points raised in their letter is the subject of my post — why the study does not provide a complete picture of the Akamai network, and why this matters.
The paper says that the researchers used two data sets (end-user IP addresses and webservers) to derive the list of DNS servers to use as vantage points. Webservers generally sit at the core or in the middle mile, so it’s the end-user IPs we’re really interested in, since they’re the ones that indicate how much broader, deeper reach matters. The study says that a reverse DNS lookup was used to find the authoritative nameserver for each IP, and that only the nameservers which answered recursive queries from outside (that is, the open-recursive ones) were used as vantage points.
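To make that selection step concrete, here is a minimal sketch of how one might check whether a nameserver looks open-recursive. This is my own illustration, not the study's code: the function names are made up, and treating "RA bit set and not REFUSED" as the test is a simplifying assumption. It uses only the Python standard library and the standard DNS header layout.

```python
import ipaddress
import socket
import struct


def reverse_name(ip: str) -> str:
    """Return the reverse-DNS (PTR) name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query with the RD (recursion desired) bit set."""
    flags = 0x0100  # RD = 1, everything else zero
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def offers_recursion(response: bytes) -> bool:
    """Assumed heuristic: a response with RA set and an RCODE other than REFUSED."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    is_response = bool(flags & 0x8000)          # QR bit
    recursion_available = bool(flags & 0x0080)  # RA bit
    refused = (flags & 0x000F) == 5             # RCODE 5 = REFUSED
    return is_response and recursion_available and not refused


def probe(server_ip: str, name: str, timeout: float = 2.0) -> bool:
    """Send one recursive query over UDP and report whether recursion is offered."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(512)
    return offers_recursion(response)
```

A server that times out, refuses, or clears the RA bit drops out of the candidate list. That is how a security-conscious operator ends up invisible to this kind of measurement.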
The King methodology, the DNS-based measurement technique the study relies on, dates back to 2002. Since that time, open-recursive DNS servers have become less common because they’re potentially a weapon in DDoS attacks, and open-recursive authoritative servers even more so because of the potential for cache poisoning attacks. So immediately, we know that the study’s data set is going to miss lots of vantage points owned by the security-conscious. Lack of a vantage point means that the study may be “blind” to users local to it, and indeed, it may miss some networks entirely.
Let’s take an example. I live in the Washington DC area; I’m on MegaPath DSL. A friend of mine, who lives a bit less than 20 miles away, is on Verizon FIOS.
Verizon FIOS customers have IP addresses that reverse-resolve to hostnames following a fixed naming scheme ending in verizon.net. The nameservers that are authoritative for verizon.net are not open-recursive. Moreover, the nameservers that Verizon automatically directs customers to, which are regional (for instance, DC-suburb customers are given nameservers in the ‘burbs of Reston, Virginia, plus one in Philadelphia), are not open-recursive either. So that tells us right off the bat that Verizon broadband customers are simply not measured by this study.
Let me say that again. This study almost certainly ignores one of the largest providers of broadband connectivity in the United States. They certainly can’t have used Verizon’s authoritative nameservers as a vantage point, and even if they had somehow added the Verizon resolvers manually to their list of servers to try, they couldn’t have tested from them, since they’re not open-recursive.
Of course, the study doesn’t truly ignore those users per se; they are probably close, in a broad network sense, to some vantage point that was used in the study. But that vantage point is almost certain to be in a different AS, i.e., on somebody else’s network, which means that the traffic had to cross a peering point, which is itself a bottleneck. So you’re not getting an accurate measure of their experience.
The original King paper (which describes the sort of DNS-based measurement used in the Microsoft study) asserts that the methodology is still reasonable for estimating end-user latency because, in its sample data, the distance from end-user clients to their name servers has a median of 4 hops, with about 20% of paths longer than 8 hops, and as many as 65-70% of them add 10 ms or less of latency. But that’s a significant number of hops, and it leaves at least 30-35% of clients more than 10 ms from their name server. That absolutely matters when the core of your research question is whether being at the edge makes a performance difference.
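For readers who haven't seen King, the core arithmetic as I understand it is a subtraction: time a recursive query that goes through a nameserver to the target, time a direct query to that nameserver, and take the difference as the nameserver-to-target latency. The sketch below is only an illustration of that idea. The function name, the use of a median over paired samples, and the sample numbers are all my own assumptions, not anything taken from the study.

```python
from statistics import median


def king_estimate_ms(rtt_via_resolver_ms, rtt_to_resolver_ms):
    """Estimate resolver-to-target latency from paired RTT samples (in ms).

    Each pair is (RTT of a recursive query through the resolver,
    RTT of a direct query to the resolver). Taking the median of the
    differences is an assumed smoothing choice.
    """
    if not rtt_via_resolver_ms or len(rtt_via_resolver_ms) != len(rtt_to_resolver_ms):
        raise ValueError("need equal-length, non-empty sample lists")
    differences = [
        via - direct
        for via, direct in zip(rtt_via_resolver_ms, rtt_to_resolver_ms)
    ]
    return median(differences)


# Hypothetical samples, for illustration only.
via_resolver = [41.0, 39.5, 44.0]
to_resolver = [25.0, 24.5, 26.0]
estimate = king_estimate_ms(via_resolver, to_resolver)
```

Note what the result describes: the path from the nameserver to the target. The leg from the end user to the nameserver never enters the calculation, which is exactly why the hop counts above matter.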
The problem can be summed up like this: Many customers are closer to an Akamai server than they are to their nameserver.
My friend and I, living less than 20 miles apart, get totally divergent results for our lookups of Akamai hosts. We’re likely served off completely different clusters. In fact, my ISP’s closest nameserver is 18 ms from me — and my closest Akamai server is 12 ms away.
It’s a near certainty that the study has complete blind spots: places where there is a local Akamai server but no visibility from a nearby open-recursive nameserver. Akamai has tremendous presence in ISP POPs, and there’s a high likelihood that a substantial percentage of their caches primarily serve customers of a given ISP. That’s why ISPs agree to host those servers for free and give away the bandwidth in those locations.
More critique and some conclusions to come.