Category Archives: Infrastructure

CDN overlays (and more on MediaMelon)

I was recently briefed by MediaMelon, a just-launched company offering a “video overlay network”. The implications of their technology are worth considering, even though I think the company itself is going to have a difficult road to travel. (MediaMelon has two customers thus far, and is angel-funded; it is entering an extremely tough, competitive market. I wish them luck, since their model essentially forces them to compete in the ever-more-cutthroat CDN price war, as their entire value proposition is tied up in lowering delivery costs.)

In brief, when a content provider publishes its video to MediaMelon, MediaMelon divides the video into small chunks, each of which is a separate file that can be delivered via HTTP, and relies upon the video player to re-assemble those chunks. This chunk-based delivery is conceptually identical to Move Networks’ streamlets. MediaMelon then publishes the content out to its CDN partners (currently Velocix plus an unannounced second partner). MediaMelon’s special sauce is that these chunks are then delivered from multiple sources: normally MediaMelon’s P2P network, with a fallback to its CDN partners. Since the video is in chunks, the source can switch from chunk to chunk. The video player also reports its performance to MediaMelon’s servers, allowing MediaMelon to draw conclusions about how best to serve content. As a delivery-focused company, MediaMelon has decided to leave the value-adds to its media platform partners, currently thePlatform.
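
As a rough illustration of the pattern (not MediaMelon’s actual client), here is a minimal Python sketch of per-chunk fetching with multi-source fallback; the source URLs and the chunk-naming scheme are hypothetical:

```python
# Minimal sketch of chunked, multi-source delivery; the endpoints and the
# chunk naming scheme are hypothetical, and this shows the general pattern
# rather than MediaMelon's actual implementation.
import urllib.request

# Sources in order of preference, e.g. a P2P-fronting gateway first,
# then a CDN partner as fallback (both hypothetical).
SOURCES = [
    "http://p2p.example.net/video123",
    "http://cdn1.example.net/video123",
]

def fetch_chunk(index, timeout=5):
    """Fetch one chunk, trying each source in turn."""
    for base in SOURCES:
        url = f"{base}/chunk-{index:05d}.bin"  # assumed chunk naming scheme
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except OSError:
            continue  # this source failed; try the next one for this chunk
    raise RuntimeError(f"all sources failed for chunk {index}")

def assemble(num_chunks):
    # The "player" re-assembles the video from independently fetched chunks;
    # because each chunk is fetched separately, the source can change
    # from chunk to chunk.
    return b"".join(fetch_chunk(i) for i in range(num_chunks))
```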

Whatever the challenges of their business model, though, the overlay model is interesting, and from a broader market perspective, MediaMelon’s technology highlights several things about video player capabilities that should be kept in mind:

  • You can carve up your video and let the player re-assemble it.
  • You can deliver using multiple sources, including P2P.
  • The player knows what kind of performance it’s getting, and can report it.

These three key things make it extremely clear that it is technically feasible to create a “neutral” CDN overlay network, without requiring the cooperation of the CDNs themselves. MediaMelon is halfway there. It just hasn’t put together all the pieces (the technical hurdles are actually nontrivial), and it is designed to work with partner CDNs rather than force them into competition.

Basically, a (non-anycast) CDN like Akamai or Limelight has a central engine that gathers network performance data and uses it to choose an individual CDN server based on what it believes is best for you (where “you” is defined by where your nameserver is). That individual CDN server then delivers the content to you.

What an overlay would have is a central engine that gathers performance data directly from the video player, and has a list of sources for a given piece of content (where that list includes multiple CDNs and maybe a P2P network). Based on historical and currently-reported performance data, it would direct the player to the source that delivers acceptable performance for the least cost. Dividing the content into chunks makes this easier, but isn’t strictly necessary. What you’d effectively have is a CDN-of-CDNs, with the overlay needing to own no infrastructure other than the routing processor.
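
To make that routing decision concrete, here is a toy sketch of the kind of logic such an overlay’s routing processor might run; the source names, costs, thresholds, and reported numbers are all invented for illustration:

```python
# Toy sketch of the "CDN-of-CDNs" routing decision described above: pick the
# cheapest source whose recently reported performance is acceptable.
# Source names, costs, and thresholds are hypothetical.
from statistics import mean

# Player-reported throughput samples (kbps) per (region, source).
reports = {
    ("us-east", "cdn-a"): [3200, 2900, 3100],
    ("us-east", "cdn-b"): [2100, 2300, 2200],
    ("us-east", "p2p"):   [1800, 2600, 900],
}

# Cost per GB delivered, per source (hypothetical).
cost_per_gb = {"cdn-a": 0.25, "cdn-b": 0.15, "p2p": 0.05}

def pick_source(region, min_kbps=2000):
    """Cheapest source meeting the performance floor for this region."""
    candidates = []
    for (r, source), samples in reports.items():
        if r == region and mean(samples) >= min_kbps:
            candidates.append(source)
    if not candidates:
        # Nothing meets the floor: fall back to the best performer, cost aside.
        return max(
            (s for (r, s) in reports if r == region),
            key=lambda s: mean(reports[(region, s)]),
        )
    return min(candidates, key=lambda s: cost_per_gb[s])

print(pick_source("us-east"))  # -> "cdn-b": cheaper than cdn-a, still above 2000 kbps
```

The interesting part is that nothing in this loop requires owning delivery infrastructure; it only needs the player-reported measurements and a price list.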

That is the next-generation CDN. If it were vendor-neutral, allowing the customer to choose whomever it wanted to work with, it would usher in an era of truly brutal price competition.

IDC’s take on cloud

IDC has recently published snippets of their cloud computing outlook on their blog; the data from the user survey is particularly interesting.

Software and thick vs. thin-slice computing

I’ve been thinking about the way that the economics of cloud computing infrastructure will impact the way people write applications.

Most of the cloud infrastructure providers out there offer virtual servers as a slice of some larger, physical server; Amazon EC2, GoGrid, Joyent, Terremark Enterprise Cloud, etc. all follow this model. This is in contrast to the abstracted cloud platforms provided by Google App Engine or Mosso, which offer arbitrary, unsliced amounts of compute.

The virtual server providers typically provide thin slices — often single cores with 1 to 2 GB of RAM. EC2’s largest available slices are 4 virtual cores plus 15 GB, or 8 virtual cores plus 7 GB, for about $720/month. Joyent’s largest slice is 8 cores with 32 GB, for about $3300/month (including some data transfer). But on the scale of today’s servers, these aren’t very thick slices of compute, and the prices don’t scale linearly — thin slices are much cheaper than thick slices for the same aggregate amount of compute.
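
A quick back-of-envelope calculation, using the rough list prices quoted above plus an assumed price for a typical thin slice, shows how steeply the per-unit cost climbs with slice thickness:

```python
# Back-of-envelope comparison; prices are the rough figures quoted above,
# except the thin-slice price, which is an assumed, illustrative number.
slices = [
    # (label, cores, ram_gb, usd_per_month)
    ("EC2 largest slice",    8,  7,  720),
    ("Joyent largest slice", 8, 32, 3300),
    ("Typical thin slice",   1,  2,   75),   # assumed single-core price
]

for label, cores, ram_gb, price in slices:
    print(f"{label}: ${price / cores:.0f} per core, "
          f"${price / ram_gb:.0f} per GB of RAM, per month")

# Eight of the assumed thin slices match the core count of the thick slices
# above (ignoring the RAM difference) for a fraction of the monthly price.
print(f"8 x thin slices: ${8 * 75}/month")
```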

The abstracted platforms are oriented around thin-slice compute as well, at least from the perspective of desired application behavior. You can see this in the limitations imposed by Google App Engine; they don’t want you to work with large blobs of data, nor do they want you consuming significant chunks of compute.

Now, in that context, contemplate this Intel article: “Kx – Software Which Uses Every Available Core”. In brief, Kx is a real-time database company; they process extremely large datasets, in-memory, parallelized across multiple cores. Their primary customers are financial services companies, who use it to do quantitative analysis on market data. It’s the kind of software whose efficiency increases with the thickness of the available slice of compute.

In the article, Intel laments the lack of software that truly takes advantage of multi-core architectures. But cloud economics are going to push people away from thick-sliced compute — away from apps that are most efficient when given more cores and more RAM. Cloud economics push people towards thin slices, and therefore applications whose performance does not suffer notably as the app gets shuffled from core to core (which hurts cache performance), or when limited to a low number of cores. So chances are that Intel is not going to get its wish.

The Microsoft CDN study

The Microsoft/NYU CDN study by Cheng Huang, Angela Wang, et al., no longer seems to be available. Perhaps it’s simply been temporarily withdrawn pending its presentation at the upcoming Internet Measurement Conference. You can still find it in Google’s cache, HTMLified, by searching for the title “Measuring and Evaluating Large-Scale CDNs”, though.

To sum it up briefly for those who missed reading it while it was readily available: Researchers at Microsoft and the Polytechnic Institute of New York University explored the performance of the Akamai and Limelight CDNs. Using a set of IP addresses gathered from end-user clients of the MSN video service and from web hosts in Windows Live search logs, the researchers built a set of vantage points out of the open-recursive DNS servers authoritative for the corresponding domains. They used these vantage points to chart the servers/clusters of the two CDNs. Then, using the King methodology, which measures the latency between DNS servers, they measured the performance of the two CDNs from the perspective of those vantage points. They also measured the availability of the servers. Finally, they drew some conclusions about the comparative performance of the CDNs and how to prioritize deployments of new locations.
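
For readers unfamiliar with King, here is a rough sketch of the idea using the dnspython library; the server addresses and names are placeholders, and real King measurements take care to use uncached names and to repeat the probes:

```python
# Rough sketch of a King-style measurement: estimate the latency between an
# open-recursive nameserver and some other nameserver by timing a recursive
# query the open-recursive server must forward, then subtracting the direct
# round-trip time to it. Requires dnspython; IPs and names are placeholders.
import time
import dns.flags
import dns.message
import dns.query

def rtt_ms(qname, server_ip, recurse=True, timeout=3.0):
    q = dns.message.make_query(qname, "A")
    if not recurse:
        q.flags &= ~dns.flags.RD  # non-recursive: answered without forwarding
    start = time.monotonic()
    dns.query.udp(q, server_ip, timeout=timeout)
    return (time.monotonic() - start) * 1000.0

def king_estimate_ms(open_recursive_ip, name_served_by_target):
    # Direct RTT to the open-recursive vantage point.
    direct = rtt_ms("example.com", open_recursive_ip, recurse=False)
    # Recursive lookup of a name whose authoritative server is the target:
    # the vantage point has to go ask the target, so the extra time roughly
    # equals the latency between the two nameservers.
    full = rtt_ms(name_served_by_target, open_recursive_ip, recurse=True)
    return full - direct
```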

Both Akamai and Limelight pointed to flaws in the study, and I’ve done a series of posts that critique the methodology and the conclusions.

For convenience, here are the links to my analysis:
What the Microsoft CDN study measures
Blind spots in the Microsoft CDN study
Availability and the Microsoft CDN study
Assessing CDN performance

Hopefully the full PDF of the study will return to public view soon. Despite its flaws, it’s still tremendously interesting and a worthwhile read.

MediaMelon and CDN overlays

MediaMelon has launched, with what they call their “video overlay network”.

I haven’t been briefed by the company yet (although I’ve just sent a request for a briefing), but from the press release and the website, it looks like what they’ve got is a client that utilizes multiple CDNs (and other data sources) to pull and assemble segments of video before the user watches the content. The company’s website mentions neither a board of directors nor a management team, though the press release mentions the CEO, Kumar Subramanian.

I’ll post more when I have some details about the company and their technology, but I’ll note that I think that software-based CDN overlay networks are going to be a rising trend. As the high-volume video providers increasingly commoditize their CDN purchases, the value-added services layer will move from CDN-provided and CDN-specific, to CDN-neutral software-only components.

When will Google App Engine be ready?

We’ve now hit the six-month mark on Google App Engine. And it’s still in beta. Few of the significant shortcomings that keep GAE from being production-ready for “real applications” have been addressed.

In an internal Gartner discussion this past summer, I wrote:

The restrictions of the GAE sandbox are such that people writing complex, commercial Web 2.0 applications are quickly going to run into things they need and can’t have. Using your own domain requires Google Apps. The ability to do network callouts is minimal, which means that integrating with anything that’s not on GAE ranges from limited to potentially impossible (and their URL fetcher can’t even do basic HTTP authentication). Everything has to be spawned via an HTTP request and all such requests must be short-lived, so you cannot run any persistent or cron-started background processes; this is a real killer since you cannot do any background maintenance. Datastore write performance is slow; so are large queries. The intent is that nothing you do is computationally expensive, and this is strictly enforced. You can’t do anything that accesses the filesystem. There’s a low limit to the total number of files allowed, and the largest possible file size is a mere 1 MB (and these limits are independent of the storage limit; you will be able to buy more storage, but it looks like you won’t be allowed to buy yourself out of limitations like these). And so on.

Presumably Google will lift at least some of these restrictions over time, but in the near term, it seems unlikely to me that Web 2.0 startups will make commitments to the platform. This is doubly true because Google is entirely in control of what the restrictions will be in the future. I would not want to be the CTO in the unpleasant position of having my business depend on the Web 2.0 app my company has written to the GAE framework, only to discover that Google had changed its mind and decided to enforce tighter restrictions that now prevented my app from working or scaling.

GAE, at least in the near term, suits apps that are highly self-contained, and very modest in scope. This will suit some Web 2.0 start-ups, but not many, in my opinion. GAE has gone for simplicity rather than power, at present, which is great if you are building things in your free time but not so great if you are hoping to be the next MySpace, or even 37Signals (Basecamp).

Add to that the issues about the future of Python. Python 3.0 — the theoretical future of Python — is very different from the 2.x branch. 3.0 support may take a while. So might support for the transition version, 2.6. The controversy over 3.0 has bifurcated the Python community at a time when GAE is actually helping to drive Python adoption, and it leaves developers wondering whether they ought to be thinking about GAE on 2.5 or GAE on 3.0 — or if they can make any kind of commitment to GAE at all with so much uncertainty.

These issues and more have been extensively explored by the blogosphere. The High Scalability blog’s aggregation of the most interesting posts is worth a look from anyone interested in the technical issues that people have found.

Google has been more forthcoming about the quotas and how to deal with them. I’ve made the assumption that quota limitations will eventually be replaced by paid units. The more serious limitations are the ones that are not clearly documented and have only recently come to light, like the offset limit and the fact that the 1 MB limit doesn’t just apply to files; it also applies to data structures.
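
For what it’s worth, the workaround people used at the time for the per-entity size ceiling was to chunk large blobs across multiple datastore entities; a sketch in the GAE Python (2.x-era) datastore API, purely as an illustration of the pattern:

```python
# Sketch of the chunking workaround for the ~1 MB per-entity limit, using the
# GAE Python runtime's google.appengine.ext.db API of that era; illustrative,
# not production code.
from google.appengine.ext import db

CHUNK_SIZE = 1000000  # stay just under the per-entity ceiling

class BlobChunk(db.Model):
    blob_name = db.StringProperty(required=True)
    index = db.IntegerProperty(required=True)
    data = db.BlobProperty()

def store_blob(name, payload):
    # Split a large byte string across multiple datastore entities.
    for i in range(0, len(payload), CHUNK_SIZE):
        BlobChunk(blob_name=name, index=i // CHUNK_SIZE,
                  data=db.Blob(payload[i:i + CHUNK_SIZE])).put()

def load_blob(name):
    chunks = BlobChunk.all().filter("blob_name =", name).order("index").fetch(1000)
    return "".join(chunk.data for chunk in chunks)
```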

As this beta progresses, it becomes less and less clear what Google intends to limit as an inherent part of the business goals (and perhaps technical limitations) of the platform, and what they’re simply constraining in order to prevent their currently-free infrastructure from being voraciously gobbled up.

At present, Google App Engine remains a toy. A cool toy, but not something you can run your business on. Amazon, on the other hand, proved from the very beginning that EC2 was not a toy. Google needs to start doing the same, because you can bet that when Microsoft releases their cloud, they will pay attention to making it business-ready from the start.

The nameserver as CDN vantage point

I was just thinking about the nameserver as a vantage point in the Microsoft CDN study, and I remembered that for the CDNs themselves, the nameserver is normally their point of reference for the end user.

When a content provider uses a CDN, they typically use a DNS CNAME to alias a hostname to a hostname of the CDN provider. For instance, www.nbc.com maps to www.nbc.com.edgesuite.net; the edgesuite.net domain is owned by Akamai. That means that when a DNS resolver goes to figure out the IP address of that hostname, it’s going to query the CDN’s DNS servers for the answer. The CDN’s DNS server looks at the IP address of the querying nameserver, and tries to return a server that is good for that location.

Notably, the CDN’s DNS server does not know the user’s actual IP. That information is not present in the DNS query (RFC 1035 specifies the structure of queries).

Therefore, which nameserver you use, and how close it is to where you actually sit on the network, determines how good the CDN’s answer is.
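
You can see this for yourself by asking the same question through different recursive resolvers and comparing the answers; a small dnspython sketch (the second resolver address is a placeholder for your own ISP’s resolver):

```python
# Compare the CDN edge addresses handed back when the same hostname is
# resolved through different recursive resolvers. Requires dnspython 2.x.
import dns.resolver

RESOLVERS = {
    "ultradns": "156.154.70.1",   # UltraDNS "DNS Advantage" public resolver
    "my-isp":   "198.51.100.53",  # placeholder: substitute your ISP's resolver
}

def edge_addresses(hostname, resolver_ip):
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [resolver_ip]  # force the query through this resolver
    answer = r.resolve(hostname, "A")
    return sorted(rr.address for rr in answer)

for label, ip in RESOLVERS.items():
    print(label, edge_addresses("www.nbc.com", ip))
```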

I did a little bit of testing, which produced some interesting results. I’m using a combination of traceroute and IP geolocation to figure out where things are.

At home, I have my servers configured to use the UltraDNS “DNS Advantage” free resolvers. They return their own ad server rather than NXDOMAIN, which is an annoyance, but they are also very fast, and the speed difference makes a noticeable dent in the amount of time that my mail server spends in (SpamAssassin-based) anti-spam processing. But I can also use the nameservers provided to me by MegaPath; these are open-recursive.

UltraDNS appears to use anycast. The DNS server that it picks for me seems to be in New York. And www.nbc.com ends up mapping to an Akamai server that’s in New York City, 12 ms away.

MegaPath does not. Using the MegaPath DNS server, which is in the Washington DC area, somewhere near me, www.nbc.com ends up mapping to a server that’s directly off the MegaPath network, but which is 18 ms away. (IP geolocation says it’s in DC, but there’s a 13 ms hop between two points in the traceroute, which is either an awfully slow router or, more likely, genuine distance.)

Now, let’s take my friend who lives about 20 miles from me and is on Verizon FIOS. Using Verizon’s DC-area nameserver, he gets the IP address of a server that seems to live off Comcast’s local network — and is a mere 6 ms from me.

For Limelight, I’m looking up www.dallascowboys.com. From UltraDNS in NYC, I’m getting a Limelight server that’s 14 ms away in NYC. Via MegaPath, I’m getting one in Atlanta, about 21 ms away. And asking my friend what IP address he gets off a Verizon lookup, I get a server here in Washington DC, 7 ms away.

Summing this up in a chart:

Resolver used     Ping to Akamai    Ping to Limelight
UltraDNS          12 ms             14 ms
MegaPath          18 ms             21 ms
Verizon           6 ms              7 ms

The fact that Verizon has local nameservers and the others don’t makes a big difference in the quality of a CDN’s guess as to which server it ought to be using. Here’s a callout to service providers: given the increasing amount of content, especially video, now served from CDNs, local DNS infrastructure really matters to you. Not only will it affect your end-user performance, but it will also affect how much traffic you’re backhauling across your network or across your peers.

On the surface, this might make an argument for server selection via anycast, which is used by some lower-cost CDNs. Since you can’t rely upon a user’s nameserver actually being close to them, it’s possible that the crude BGP metric could return better results than you’d expect. Anycast isn’t going to cut it if you’ve got lots of nodes, but for the many CDNs out there with a handful of nodes, it might not be that bad.

I went looking for other comparables. I was originally interested in Level 3, and dissected www.ageofconan.com (because there was a press release indicating an exclusive deal), but from that, discovered that Funcom actually uses CacheFly for the website. funcom.cachefly.net returns the same IP no matter where you look it up from (I tried it locally, and from servers I have access to in Colorado and California). But traceroute clearly shows the traffic going to different places, indicating an anycast implementation. Locally, I’ve got a CacheFly server a mere 6 ms away. From California, there’s also a local server, 13 ms away. Colorado, unfortunately, uses Chicago, a full 32 ms away. This doesn’t tell us much, though, beyond the fact that CacheFly has a limited footprint; we’d need to look at an anycast CDN with a large footprint to see whether it actually returns better results than the nameserver method does.

So here’s something for future researchers to explore: How well does resolver location correspond to user location? How much optimization is lost as a result? And how much better or worse would anycast be?

Assessing CDN performance

This is the fourth and probably final post in a series examining the Microsoft CDN study. The three previous posts covered measurement, the blind spots, and availability. This post wraps up with some conclusions.

The bottom line: The Microsoft study is very interesting reading, but it doesn’t provide any useful information about CDN performance in the real world.

The study’s conclusions are flawed to begin with, but what’s of real relevance to purchasers of CDN services is that even if the study’s conclusions were valid, its narrow focus on one element — one-time small-packet latency to the DNS servers and content servers — doesn’t accurately reflect the components of real-world CDN performance.

Cache hit ratios have a tremendous impact upon real-world CDNs. Moreover, the fallback mechanism on a cache miss is also important — does a miss require going back to the origin, or is there a middle tier? This will determine how much performance is impacted by a miss. The nature of your content and the CDN’s architecture will determine what those cache hit ratios look like, especially for long-tail content.

Throughput determines how quickly you get a file, and how well a CDN can sustain a bitrate for video. Throughput is affected by many factors, and can be increased through TCP/IP optimizations. Consistency of throughput also determines what your overall experience is; start-stop behavior caused by jittery performance can readily result in user frustration.
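
A toy model makes the point; the numbers below are purely illustrative, not measurements of any particular CDN:

```python
# Toy model of the two factors above; all numbers are illustrative.

def expected_latency_ms(hit_ratio, edge_ms, parent_ms=None, origin_ms=250):
    """Expected first-byte latency given a cache hit ratio.

    With a middle (parent) tier, a miss costs edge + parent; without one,
    a miss has to go all the way back to the origin.
    """
    miss_ms = edge_ms + (parent_ms if parent_ms is not None else origin_ms)
    return hit_ratio * edge_ms + (1 - hit_ratio) * miss_ms

# A mediocre hit ratio on long-tail content hurts far more without a middle tier:
print(expected_latency_ms(0.80, edge_ms=15))                 # ~65 ms, origin fallback
print(expected_latency_ms(0.80, edge_ms=15, parent_ms=40))   # ~23 ms, parent-tier fallback

def can_sustain(throughput_kbps, bitrate_kbps, headroom=1.2):
    """Can this throughput sustain a video bitrate with some safety margin?"""
    return throughput_kbps >= bitrate_kbps * headroom

print(can_sustain(2500, 2000))  # True: 25% headroom over a 2 Mbps stream
```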

More broadly, the problem is that any method of testing CDNs from anywhere other than real end-user vantage points at the edge of the network is flawed. Keynote and Gomez provide the best approximations on a day-to-day basis, but they’re only statistical samples. Gomez’s “Actual Experience” service uses an end-user panel, but that introduces uncontrolled variables into the mix if you’re trying to compare CDNs, and it’s still only sampling.

The holy grail of CDN measurement, of course, is seeing performance in real-time — knowing exactly what users are getting at any given instant from any particular geography. But even if a real-time analytics platform existed, you’d still have to try a bunch of different CDNs to know how they’d perform for your particular situation.

Bottom line: If you want to really test a CDN’s performance, and see what it will do for your content and your users, you’ve got to run a trial.

Then, once you’ve done your trials, you’ve got to look at the performance and the cost numbers, and ask yourself: what is the business value of performance? Does better performance drive real value for you? You need to measure more than just the raw performance — you need to look at time spent on your site, conversion rate, basket value, page views, ad views, or whatever it is that tells you how successful your site is. Then you can make an intelligent decision.

In the end, discussions of CDN architecture are academically interesting, and certainly of practical interest to engineers in the field, but if you’re buying CDN services, architecture is only relevant to you insofar as it affects the quality of the user experience. If you’re a buyer, don’t get dragged into the rathole of debating the merits of one architecture versus another. Look at real-world performance, and think short-term; CDN contract lengths are getting shorter and shorter, and if you’re a high-volume buyer, what you care about is performance right now and maybe over the next year.

Availability and the Microsoft CDN study

This post is the third in a series examining the Microsoft CDN study. My first post examined what was measured, and the second post looked at the blind spots created by the vantage-point discovery method they used. This time, I want to look at the availability and maintenance claims made by the study.

CDNs are inherently built for resilience. The whole point of a CDN is that individual servers can fail (or be taken offline for maintenance), with little impact on performance. Indeed, entire locations can fail, without affecting the availability of the whole.

If you’re a CDN, then the fewer nodes you have, the more impact the total failure of a node will have on your overall performance to end-users. However, the flip side of that is that megaPOP-architecture CDNs generally place their nodes in highly resilient facilities with extremely broad access to connectivity. The most likely scenario that takes out an entire such node is a power failure, which in such facilities generally requires a cascading chain of failure (but can go wrong at single critical points, as with the 365 Main outage of last year). By contrast, the closer you get to the edge, the higher the likelihood that you’re not in a particularly good facility and you’re getting connectivity from just one provider; failure is more probable but it also has less impact on performance.

Because the Microsoft study likely missed a significant number of Akamai server deployments, especially local deployments, it may underestimate Akamai’s single-server downtime, if you assume that such local servers are statistically more likely to be subject to failure.

I would expect, however, that most wider-scale CDN outages are related not to asset failure (facility or hardware), but to software errors. CDNs, especially large CDNs, are extraordinarily complex software systems. There are scaling challenges inherent in such systems, which is why CDNs often experience instability issues as part of their growing pains.

The problem with the Microsoft study of availability is that whether or not a particular server or set of servers responds to requests is not, by itself, a good proxy for the availability users actually experience. What is useful to know is the variance in performance based upon that availability, and what percentage of the time the CDN selects a content server that is actually unavailable or that is returning poor performance. The variance plays into that edge-vs-megaPOP question, and the selection indicates the quality of the CDN’s software algorithms as well as real-world performance. The Microsoft study doesn’t help us there.

Similarly, whether or not a particular server is in service does not indicate what the actual maintenance cost of the CDN is. Part of the core skillset of a CDN company is the ability to maintain very large amounts of hardware without using a lot of people. They could very readily have automated processes pulling servers out of service, and executing software updates and the like with little to no human intervention.

Next up: Some conclusions.

Blind spots in the Microsoft CDN study

This post is the second in a series examining the Microsoft CDN study comparing Akamai and Limelight. The first post discusses measurement: what the study does and doesn’t look at. Now, I want to build on that foundation to explain what the study misses.

In the meantime, Akamai has responded publicly. One of the points raised in their letter is the subject of this post — why the study does not provide a complete picture of the Akamai network, and why this matters.

The paper says that the researchers used two data sets — end-user IP addresses, as well as webservers — in order to derive the list of DNS servers to use as vantage points. Webservers are generally at the core or in the middle mile, so it’s the end-user IPs we’re really interested in, since they’re the ones that indicate the degree to which broader, deeper reach matters. The study says that reverse DNS lookups were used to obtain the authoritative nameservers for those IPs, and that the nameservers that answered open-recursive queries were kept as vantage points.

The King methodology dates back to 2002. Since that time, open-recursive DNS servers have become less common because they’re potentially a weapon in DDoS attacks, and open-recursive authoritatives even more so because of the potential for cache poisoning attacks. So immediately, we know that the study’s data set is going to miss lots of vantage points owned by the security-conscious. Lack of a vantage point means that the study may be “blind” to users local to it, and indeed, it may miss some networks entirely.
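
As I read the paper, the vantage-point derivation works roughly like the following sketch (dnspython again; the error handling and the registered-domain guess are deliberately crude):

```python
# Sketch of the vantage-point derivation as I read the paper: reverse-resolve
# a client IP, find the nameservers authoritative for the resulting domain,
# and keep only those that will answer recursive queries from strangers.
# Requires dnspython 2.x; error handling is minimal.
import dns.flags
import dns.message
import dns.query
import dns.resolver
import dns.reversename

def candidate_nameservers(client_ip):
    ptr = dns.resolver.resolve(dns.reversename.from_address(client_ip), "PTR")
    hostname = str(ptr[0]).rstrip(".")
    domain = ".".join(hostname.split(".")[-2:])   # crude registered-domain guess
    ns_names = [str(r.target) for r in dns.resolver.resolve(domain, "NS")]
    return [str(dns.resolver.resolve(name, "A")[0]) for name in ns_names]

def is_open_recursive(server_ip, probe_name="example.com", timeout=3.0):
    q = dns.message.make_query(probe_name, "A")   # recursion-desired is on by default
    try:
        resp = dns.query.udp(q, server_ip, timeout=timeout)
    except Exception:
        return False
    # An open-recursive server answers, with recursion available, for a name
    # it is not authoritative for.
    return bool(resp.flags & dns.flags.RA) and len(resp.answer) > 0
```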

Let’s take an example. I live in the Washington DC area; I’m on MegaPath DSL. A friend of mine, who lives a bit less than 20 miles away, is on Verizon FIOS.

Verizon FIOS customers have IP addresses that reverse-resolve to hostnames following a scheme that ends in verizon.net. The nameservers that are authoritative for verizon.net are not open-recursive. Moreover, the nameservers that Verizon automatically directs customers to, which are regional (for instance, DC-suburb customers are given nameservers in the ’burbs of Reston, Virginia, plus one in Philadelphia), are not open-recursive either. So that tells us right off the bat that Verizon broadband customers are simply not measured by this study.

Let me say that again. This study almost certainly ignores one of the largest providers of broadband connectivity in the United States. They certainly can’t have used Verizon’s authoritative nameservers as a vantage point, and even if they had somehow added the Verizon resolvers manually to their list of servers to try, they couldn’t have tested from them, since they’re not open-recursive.

Of course, the study doesn’t truly ignore those users per se — those users are probably close, in a broad network sense, to some vantage point that was used in the study. But note that it is almost certain to be cross-AS at that point, i.e., on somebody else’s network, which means that the traffic had to cross a peering point, which is itself a bottleneck. So right off the bat, you’re not getting an accurate measure of their experience.

The original King paper (which describes the sort of DNS-based measurement used in the Microsoft study) asserts that the methodology is still reasonable for estimating end-user latency because, in their sample data, the distance from end-user clients to their nameservers has a median of 4 hops, with about 20% longer than 8 hops, and as much as 65-70% of those distances account for 10 ms or less of latency. But that’s a significant number of hops and a depressingly low percentage of negligible-latency distances, which absolutely matters when the core of your research question is whether being at the edge makes a performance difference.

The problem can be summed up like this: Many customers are closer to an Akamai server than they are to their nameserver.

My friend and I, living less than 20 miles apart, get totally divergent results for our lookups of Akamai hosts. We’re likely served off completely different clusters. In fact, my ISP’s closest nameserver is 18 ms from me — and my closest Akamai server is 12 ms away.

It’s a near certainty that the study has complete blind spots — places where there’s no visibility from a proximate open-recursive nameserver, but a local Akamai server. Akamai has tremendous presence in ISP POPs, and there’s a high likelihood that a substantial percentage of their caches serve primarily customers of a given ISP — that’s why ISPs agree to host those servers for free and give away the bandwidth in those locations.

More critique and some conclusions to come.
