I am way behind on my news announcements, or I’d have posted on this earlier: Limelight has bought AcceloWeb.
Like its competitors Aptimize and Strangeloop Networks, AcceloWeb offers a software-based front-end optimization (FEO) solution. FEO is an emerging technology, and it is computationally expensive, far more so than the kind of network-based optimizations that you get in application delivery controllers (ADCs) like F5’s, or WAN optimization controllers (WOCs) like Riverbed’s. It is also complex, since FEO tries to rewrite the page without breaking any of its elements. That is especially hard to do with complex e-commerce sites, particularly those that aren’t following architectural best practices (or even good practices).
CDN and FEO services are highly complementary, since caching the optimized page elements obviously makes sense. Level 3 and Strangeloop recently partnered, with Level 3 offering Strangeloop’s technology as a service called CDN Site Optimizer, although it’s a “side by side” implementation in Level 3’s CDN POPs, not yet integrated with the Level 3 CDN. (Obviously, the next step in that partnership would be integration.)
The integration of network optimization and FEO is the most significant innovation in the optimization market in recent years. For Limelight, this is an important purchase, since it gets them into the acceleration game with a product that Akamai doesn’t offer. (Akamai only has a referral deal with Strangeloop.)
Gartner clients: My research note on improving Web performance (combining on-premise acceleration, CDN / ADN, and FEO for complete solutions) will be out soon!
It’s been mentioned to me that my “what are you hearing about from clients” posts are particularly interesting, so I’ll try to do a regular update of this sort. I have some limits on how much detail I can blog and stay within Gartner’s policies for analysts, so I can’t get too specific; if you want to drill into detail, you’ll need to make a client inquiry.
It’s shaping up into an extremely busy fall season, with people (IT users and vendors alike) sounding relatively optimistic about the future. If you attended Gartner’s High-Tech Forum (a free event we recently did for tech vendors in Silicon Valley), you saw that we showed a graph of inquiry trends, indicating that “cost” is a declining search term, and “cloud” has rapidly increased in popularity. We’re forecasting a slow recovery, but at least it’s a recovery.
This is budget and strategic planning time, so I’m spending a lot of time with people discussing their 2010 cloud deployment plans, as well as their two- and five-year cloud strategies. There’s some planning stuff going around data centers, hosting, and CDN services, too, but the longer-term the planning, the more likely it is that it’s going to involve cloud. (I posted on cloud inquiry trends previously.)
There’s certainly purchasing going on right now, though, and I’m talking to clients across the whole of the planning cycle (planning, shortlisting, RFP review, evaluating RFP responses, contract review, re-evaluating existing vendors, etc.). Because pretty much everything that I cover is a recurring service, I don’t see the end-of-year rush to finish spending 2009’s budget, but this is the time of year when people start to work on the contracts they want to go for as soon as 2010’s budget hits.
My colo inquiries this year have undergone an interesting shift towards local (and regional) data centers, rather than national players, reflecting a shift in colocation from being primarily an Internet-centric model, to being one where it’s simply another method by which businesses can get data center space. Based on the planning discussions I’m hearing, I expect this is going to be the prevailing trend going forward, as well.
People are still talking about hosting, and there are still plenty of managed hosting deals out there, but very rarely do I see a hosting deal now that doesn’t have a cloud discussion attached. If you’re a hoster and you can’t offer capacity on demand, most of my clients will now simply take you off the table. It’s an extra kick in the teeth if you’ve got an on-demand offering but it’s not yet integrated with your managed services and/or dedicated offerings; now you’re competing as if you were two providers instead of one.
The CDN wars continue unabated, and competitive bidding is increasingly the norm, even in small deals. Limelight Networks fired a salvo into the fray yesterday, with an update to their delivery platform that they’ve termed “XD”. The bottom line on that is improved performance at a baseline for all Limelight customers, plus a higher-performance tier and enhanced control and reporting for customers who are willing to pay for it. I’ll form an opinion on its impact once I see some real-world performance data.
All in all, things are really crazy busy. So busy, in fact, that I ended up letting a whole month go by without a blog post. I’ll try to get back into the habit of more frequent updates. There’s certainly no lack of interesting stuff to write about.
The judge cited Muniauction v. Thomson Corp. as the precedent for a judgment as a matter of law, which basically says that if you have a method claim in a patent that involves steps performed by multiple parties, you cannot claim direct infringement unless one party exercises control over the entire process.
I have not read the court filing yet, but based on the citation of precedent, it’s a good guess that because the CDN patent methods generally involve steps beyond the provider’s control, it falls under this citation. Unexpected, at least to me, and for those IP law watchers among you, rather fascinating, since in our increasingly federated, distributed, outsourced IT world, this would seem to raise a host of intellectual property issues for multi-party transactions, which are in some ways inherent to web services.
It is possible that I am going to turn out to be mildly wrong about something. I predicted that neither Amazon’s CloudFront CDN nor the comparable Rackspace/Limelight offering (Mosso Cloud Files) would really impact the mainstream CDN market. I am no longer as certain that’s going to be the case, as it appears that behavioral economics play into these decisions more than one might expect. The impact is subtle, but I think it’s there.
I’m not talking about the giant video deals, mind you; those guys already get prices well below those of the cloud CDNs. I’m talking about the classic bread-and-butter of the CDN market, the e-commerce and enterprise customers, significant B2B and B2C brands that have traditionally been Akamai loyalists, or have been scattered among smaller players like Mirror Image.
Simply put, the cloud CDNs put indirect pressure on mainstream CDN prices, and will absorb some new mainstream (enterprise but low-volume) clients, for a simple reason: Their pricing is transparent. $0.22/GB for Rackspace/Limelight. $0.20/GB for SoftLayer/Internap. $0.17/GB for Amazon CloudFront. And so on.
Transparent pricing forces people to rationalize what they’re buying. If I can buy Limelight service on zero commit for $0.22/GB, there’s a fair chance that I’m going to start wondering just what exactly Akamai is giving me that’s worth paying $2.50/GB for on a multi-TB commit. Now, the answer to that might be, “DSA Secure that speeds up my global e-commerce transactions and is invaluable to my business”, but that answer might also be, “The same old basic static caching I’ve been doing forever and have been blindly signing renewals for.” It is going to get me to wonder things like, “What are the actual competitive costs of the services I am using?” and, “What is the business value of what I’m buying?” It might not alter what people buy, but it will certainly alter their perception of value.
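To make that comparison concrete, here is a rough sketch of the monthly arithmetic, using the per-GB prices quoted above. The 5 TB monthly volume and the 1 TB = 1,000 GB conversion are assumptions for illustration, and the $2.50/GB Akamai figure is the hypothetical contract rate from the paragraph above, not a list price. Tiers, commits, and origin fees are ignored.

```python
# Per-GB prices (USD) quoted in the post. The Akamai entry is the
# illustrative contract rate from the text, not a published price.
PRICE_PER_GB = {
    "Rackspace/Limelight": 0.22,
    "SoftLayer/Internap": 0.20,
    "Amazon CloudFront": 0.17,
    "Akamai (illustrative)": 2.50,
}


def monthly_cost(volume_gb: float, price_per_gb: float) -> float:
    """Flat-rate delivery cost for one month, ignoring tiers and commits."""
    return volume_gb * price_per_gb


def cost_table(volume_gb: float) -> dict[str, float]:
    """Monthly cost per provider, rounded to cents."""
    return {
        name: round(monthly_cost(volume_gb, price), 2)
        for name, price in PRICE_PER_GB.items()
    }


if __name__ == "__main__":
    volume_gb = 5_000  # assumed volume: 5 TB per month
    for name, cost in cost_table(volume_gb).items():
        print(f"{name:24s} ${cost:>10,.2f}")
```

Seeing the numbers side by side is the whole point: the table does not say which service is worth buying, only how large a gap the extra features have to justify.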
Since grim October, businesses have really cared about what things cost and what benefit they’re getting out of them. Transparent pricing really amps up the scrutiny, as I’m discovering as I talk to clients about CDN services. And remember that people can be predictably irrational.
While I’m on the topic of cloud CDNs: There have been two recent sets of public performance measurements for Rackspace (Mosso) Cloud Files on Limelight. One is part of a review by Matthew Sacks, and the other is Rackspace’s own posting of Gomez metrics comparing Cloud Files with Amazon CloudFront. The Limelight performance is, unsurprisingly, overwhelmingly better.
What I haven’t seen yet is a direct performance comparison of regular Limelight and Rackspace+Limelight. The footprint appears to be the same, but differences in cache hit ratios (probable, given that stuff on Cloud Files will likely get fewer eyeballs) and the like will create performance differences on a practical level. I assume it creates no differences for testing purposes, though (i.e., the usual “put a 10k file on two CDNs”), unless Limelight prioritizes Cloud Files requests differently.
I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up: you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls; from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer, at just a hair under 6 cents per GB. The catch is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s; there are no East Coast delivery locations, for instance. It’s also routed via anycast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
Lack of logging is not particularly a big deal if you are just trying to offload static content in the form of images and the like — presumably in that scenario you have decent analytics based off hits to the base page or a tracking counter or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as settling billing disputes.
It’s official: Rackspace’s Cloud Files can now be distributed via Limelight’s CDN.
The Cloud Files CDN works exactly as announced: drop your files into Cloud Files, push a button, have them served over the Limelight CDN. It’s a flat rate of 22 cents per GB, without origin fees. I’ve discussed this deal before, including in the context of Amazon’s CloudFront, and my previous commentary stands.
What will be interesting to see is the degree to which Limelight preserves infrequently-accessed content in its edge caches, vs. Amazon, as that will make a significant impact on performance, as well as cost-effectiveness vs. CloudFront. Limelight has a superior footprint, superior peering relationships, and much more sophisticated content routing. For people who are just looking to add a little bit of effortless CDN, their ability to keep infrequently-accessed content fresh on their edge caches will determine just how much of a performance advantage they have — or don’t have — versus CloudFront.
A Little Testing
In the spirit of empirical testing, I’ve created a Cloud Files account, too, and am now delivering the images of the front page of a tiny website from a combination of CloudFront and Cloud Files (which is under Rackspace’s Mosso brand).
It looks like my Cloud Files are being served from the same Limelight footprint as other Limelight customers, at least to the extent that I get the exact same edge resolutions for http://www.dallascowboys.com as I do for cdn.cloudfiles.mosso.com.
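For anyone who wants to repeat that check, a minimal sketch follows. It resolves two hostnames through the local resolver and reports the IPv4 addresses they have in common. The function names are mine, and the resolver is passed in as a parameter so the comparison logic can be exercised without touching the network. Keep in mind that CDN DNS answers rotate and carry short TTLs, so a single lookup is weak evidence; repeating it a few times, from a few nameservers, is more convincing.

```python
import socket
from typing import Callable

Resolver = Callable[[str], set[str]]


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver hands back for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def shared_edges(
    host_a: str, host_b: str, resolver: Resolver = resolve_ipv4
) -> set[str]:
    """Addresses that both hostnames resolve to, as seen by this resolver."""
    return resolver(host_a) & resolver(host_b)


# Example (performs live DNS lookups, so results depend on where you run it):
#   shared_edges("www.dallascowboys.com", "cdn.cloudfiles.mosso.com")
```

A non-empty result suggests both names are being mapped onto the same edge servers from your vantage point; an empty one proves little, since the two lookups may simply have landed on different servers in the same cluster.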
The Cloud Files deployment is reasonably elegant, and Mosso provides a GUI and thus is more directly user-friendly. You create a container (folder), drop your files into it, and click a button to make the files public, making them automatically and instantly available via Limelight. (The Amazon process is slower, although still within minutes.)
However, in a move that’s distinctly inferior to the CloudFront implementation, the base URL for the files is http://cdn.cloudfiles.mosso.com/some-hash-value/yourfile, vs. the more elegant CloudFront http://some-hash-value.cloudfront.net/yourfile. You can CNAME the latter to some nicely in-your-domain host (images.yourcompany.com, let’s say), and in fact, if you’ve used another CDN or have simply been smart and delivered static content off its own hostname, you’d already have that set up. The Mosso method means that you’ve got to go through all your content and munge your URLs to point to the CDN.
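The URL munging itself is not hard, just tedious. As a rough sketch, assuming static content lives under a single URL prefix on your own site, something like the following would do it. The hostname www.example.com and the function name are hypothetical, and "some-hash-value" stands in for whatever hash Mosso assigns to your container.

```python
import re


def rewrite_static_urls(html: str, origin_prefix: str, cdn_prefix: str) -> str:
    """Swap origin_prefix for cdn_prefix at the start of src/href values."""
    pattern = re.compile(r"""((?:src|href)=["'])""" + re.escape(origin_prefix))
    # A function replacement avoids any backslash handling in cdn_prefix.
    return pattern.sub(lambda m: m.group(1) + cdn_prefix, html)


if __name__ == "__main__":
    page = '<img src="http://www.example.com/images/logo.png">'
    print(
        rewrite_static_urls(
            page,
            "http://www.example.com/images/",
            "http://cdn.cloudfiles.mosso.com/some-hash-value/",
        )
    )
```

This only touches absolute URLs in src and href attributes. Relative paths, URLs inside CSS, and anything assembled by JavaScript would need their own handling, which is exactly why a CNAME-based scheme is less work.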
Aside from being annoying, it makes the elegant fallback less elegant. Having the CNAME means that if for some reason CloudFront goes belly-up, I can trivially repoint the CNAME to my own server (and I can easily automate this task, monitoring CloudFront deliverability and swapping the DNS if it dies). I can’t do that as easily with Mosso (although I can CNAME cdn.cloudfiles.mosso.com to something in my domain, and set up a directory structure that uses the hash, or use a URL rewriting rule, and still get the same effect, so it is merely more awkward, not impossible).
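The monitor-and-swap automation could look roughly like the sketch below. It is only the decision half: a health probe and a rule for when to fail over. The function names, the three-failure threshold, and the example hostnames are all assumptions of mine, and the actual DNS update is left out because it depends entirely on your DNS provider.

```python
import urllib.error
import urllib.request


def cdn_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if fetching url returns a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def choose_cname_target(
    recent_checks: list[bool],
    cdn_target: str,
    origin_target: str,
    failures_to_trip: int = 3,
) -> str:
    """Fail over to the origin only after N consecutive failed checks."""
    tail = recent_checks[-failures_to_trip:]
    if len(tail) == failures_to_trip and not any(tail):
        return origin_target
    return cdn_target


if __name__ == "__main__":
    history = [True, True, False, False, False]  # hypothetical probe results
    print(
        choose_cname_target(
            history, "some-hash-value.cloudfront.net", "origin.example.com"
        )
    )
```

Requiring several consecutive failures keeps one dropped probe from flapping the record. The TTL on the CNAME then governs how quickly clients actually follow the swap, so a short TTL matters if you plan to rely on this.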
That might be in part a way for Limelight to differentiate this from its normal CDN offering, which has the modern and elegant deployment mechanism of: simply point your website hostname at Limelight and Limelight will do the rest. In that scheme, Limelight automatically figures out what to cache, and does the right thing. That seems unlikely, though. I hope the current scheme is just temporary until they get a CloudFront-like DNS-based implementation done.
I’ll have more to say once I’ve accumulated some log data, and add SimpleCDN into the mix.
Amazon’s previously-announced CDN service is now live. It’s called CloudFront. It was announced on the Amazon Web Services blog, and discussed in a blog post by Amazon’s CTO. The RightScale blog has a deeper look, too.
How CloudFront Works
The tasks use Amazon API calls, but like every other useful bit of AWS functionality, there are tools out there that can make it point-and-click, so this can readily be CDN for Dummies. There’s no content “publication”, just making sure that the URLs for your static content point to Amazon. Intelligently-designed sites already have static content segregated onto its own hostname, so it’s a simple switch for them; for everyone else, it’s just some text editing, and it’s the same method as just about every other CDN for the better part of the last decade.
In short, you’re basically using S3 as the origin server. The actual delivery is through one of 14 edge locations — in the particular case of these locations, a limited footprint, but not a bad one in these days of the megaPOP CDN.
How Pricing Works
Pricing for the service is a little more complex to understand. If your content is being served from the edge, it’s priced similarly to S3 data transfers (same price at 10 TB or less, and 1 cent less for higher tiers). However, before your content can get served off the edge, it has to get there. So if the edge has to go back to the origin to get the content (i.e., the content is not in cache or has expired from cache), you’ll also incur a normal S3 charge.
You can think of this as being similar to the way that Akamai charged a few years ago — you’d pay a bandwidth fee for delivering content to users, but you’d also pay something called an “administrative bandwidth fee”, which was charged for edge servers fetching from the origin. That essentially penalized you when there was a cache miss. On a big commercial CDN like Akamai, customers reasonably felt this was sort of unfair, competitors like Mirror Image hit back by not charging those fees, and by and large the fee disappeared from the market.
It makes sense for Amazon to charge for having to go back to the (S3) origin, though, because the open-to-all-comers nature of CloudFront means that they would naturally have a ton of customers who have long-tail content and therefore very low cache hit ratios. On a practical basis, however, quite a few people who try out CloudFront will probably discover that the cache miss ratio for their content is too high; since a cache miss also confers no performance benefit, the higher delivery cost won’t make sense, and they’ll go back to just using S3 itself for static delivery. In other words, the pricing scheme will make customers self-regulate, over time.
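A back-of-the-envelope model shows how that self-regulation plays out. The sketch below assumes that every byte served pays the edge price, that every byte missed also pays the S3 transfer price, and that both are $0.17/GB (the first-tier figure). It ignores per-request fees and volume tiers, and the function names are mine.

```python
def effective_price_per_gb(
    edge_price: float, origin_price: float, byte_miss_ratio: float
) -> float:
    """Blended delivery price when misses incur an extra origin fetch charge."""
    if not 0.0 <= byte_miss_ratio <= 1.0:
        raise ValueError("byte_miss_ratio must be between 0 and 1")
    return edge_price + byte_miss_ratio * origin_price


def premium_over_origin(
    edge_price: float, origin_price: float, byte_miss_ratio: float
) -> float:
    """Fractional extra cost versus serving everything straight from S3."""
    blended = effective_price_per_gb(edge_price, origin_price, byte_miss_ratio)
    return blended / origin_price - 1.0


if __name__ == "__main__":
    for miss in (0.05, 0.25, 0.50, 0.90):
        price = effective_price_per_gb(0.17, 0.17, miss)
        extra = premium_over_origin(0.17, 0.17, miss)
        print(f"miss ratio {miss:4.0%}: ${price:.3f}/GB ({extra:+.0%} vs. S3 alone)")
```

With equal edge and origin prices, the premium over plain S3 is simply the miss ratio. A site with a 5% miss ratio pays about 5% more for edge delivery; a long-tail site missing 90% of the time pays nearly double for very little performance gain.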
Performance and CDN Server Misdirection
Some starting fodder for the performance-minded: My preliminary testing shows that Amazon’s accuracy in choosing an appropriate node can be extremely poor. I’ve noted previously that the nameserver is the location reference, and so the CDN’s content-routing task is to choose a content server close to that location (which will also hopefully be close to the end-user).
I live in suburban DC. I tested using cdn.wolfire.com (a publicly-cited CloudFront customer), and checked some other CloudFront customers as well to make sure it wasn’t a domain-specific aberration. My local friend on Verizon FIOS, using a local DC nameserver, gets Amazon’s node in Northern Virginia (suburban DC), certainly the closest node to us. Using the UltraDNS nameserver in the New York City area, I also get the NoVa node — but from the perspective of that nameserver, Amazon’s Newark node should actually be closer. Using the OpenDNS nameserver in NoVa, Amazon tries to send me to its CDN node in Palo Alto (near San Francisco) — not even close. My ISP, MegaPath, has a local nameserver in DC; using that, Amazon sends me to Los Angeles.
That’s a serious set of errors. By ping time, the correct, NoVa node is a mere 5 ms away from me. The LA node is 71 ms from me. Palo Alto is even worse, at 81 ms. Both of those times are considerably worse than pure cross-country latency across a decent backbone. My ping to the test site’s origin (www.wolfire.com) is only 30 ms, making the CDN latency more than twice as high.
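For the record, here is the arithmetic behind those comparisons, as a small sketch using the ping times above. The dictionary layout and function name are mine; the numbers are the ones I measured.

```python
# Round-trip times in milliseconds, as measured from suburban DC.
RTT_MS = {
    "Northern Virginia": 5.0,
    "Los Angeles": 71.0,
    "Palo Alto": 81.0,
}
ORIGIN_RTT_MS = 30.0  # ping to www.wolfire.com


def misdirection_penalty_ms(rtt_by_node: dict[str, float], chosen: str) -> float:
    """Extra round-trip time paid versus the nearest measured node."""
    return rtt_by_node[chosen] - min(rtt_by_node.values())


if __name__ == "__main__":
    for node in RTT_MS:
        penalty = misdirection_penalty_ms(RTT_MS, node)
        vs_origin = RTT_MS[node] / ORIGIN_RTT_MS
        print(f"{node:18s} +{penalty:4.0f} ms vs. best, {vs_origin:.1f}x origin RTT")
```

A CDN node that is further away than the origin is worse than no CDN at all for that user, at least on latency, which is what makes the Los Angeles and Palo Alto answers so damaging.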
However, I have a certain degree of faith: Over time, Amazon has clearly illustrated that they learn from their mistakes, they fix them, and they improve on their products. I assume they’ll figure out proper redirection in time.
What It Means for the Market
Competitively, it seems like Rackspace’s Cloud Files plus Limelight may turn out to be the stronger offering. The price of Rackspace/Limelight is slightly higher, but apparently there’s no origin retrieval charge, and Limelight has a broader footprint and therefore probably better global performance (although there are many things that go into CDN performance beyond footprint). And while anyone can use CloudFront for the teensiest bit of delivery, the practical reality is that they’re not going to — the pricing scheme will make that irrational.
Some people will undoubtedly excitedly hype CloudFront as a game-changer. It’s not. It’s certainly yet another step towards having ubiquitous edge delivery of all popular static content, and the prices are attractive at low to moderate volumes, but high-volume customers can and will get steeper discounts from established players with bigger footprints and a richer feature set. It’s a logical move on Amazon’s part, and a useful product that’s going to find a large audience, but it’s not going to majorly shake up the CDN industry, other than to accelerate the doom of small undifferentiated providers who were already well on their way to rolling into the fiery pit of market irrelevance and insolvency.
(I can’t write a deeper market analysis on my blog. If you’re a Gartner client and want to know more, make an inquiry and I’d be happy to discuss it.)
Rackspace announced yesterday, as part of a general unveiling of its cloud strategy, a new partnership with Limelight Networks.
Under the new partnership, customers of Rackspace’s Cloud Files (formerly CloudFS) service — essentially, a competitor to Amazon S3 — will be able to choose to publish and deliver their files via Limelight’s CDN. Essentially, this will place Rackspace/Limelight in direct competition with Amazon’s forthcoming S3 CDN.
CDN delivery won’t cost Cloud Files customers any more than Rackspace’s normal bandwidth costs for Cloud Files. Currently, that’s $0.22/GB for the first 5 TB, scaling down to $0.15/GB for volumes above 50 TB. Amazon S3, by comparison, is $0.17/GB for the first 10 TB, down to $0.10/GB for volumes over 150 TB; we don’t yet know what its CDN upcharge, if any, will be. As another reference point, Internap resold via SoftLayer is $0.20/GB, so we can probably take that as a reasonable benchmark for the base entry cost of CDN services sold without any commit.
It’s a reasonably safe bet that Limelight’s CDN is going to deliver better performance than Amazon’s S3 CDN, given its broader footprint and peering relationships, so the usual question of, “What’s the business value of performance?” will apply.
It’s a smart move on Rackspace’s part, and an easy way into a CDN upsell strategy for its regular base of hosting customers, too. And it’s a good way for Limelight to pre-emptively compete against the Amazon S3 CDN.
The Microsoft/NYU CDN study by Cheng Huang, Angela Wang, et al., seems to no longer be available. Perhaps it’s simply been temporarily withdrawn pending its presentation at the upcoming Internet Measurement Conference. You can still find it in Google’s cache, HTMLified, by searching for the title “Measuring and Evaluating Large-Scale CDNs”, though.
To sum it up in brief for those who missed reading it while it was readily available: Researchers at Microsoft and the Polytechnic Institute of New York University explored the performance of the Akamai and Limelight CDNs. Using a set of IP addresses derived from end-user clients of the MSN video service, and web hosts in Windows Live search logs, the researchers derived a set of vantage points based on the open-recursive DNS servers authoritative for those domains. They used these vantage points to chart the servers/clusters of the two CDNs. Then, using the King methodology, which measures the latency between DNS servers, they measured the performance of the two CDNs from the perspective of the vantage points. They also measured the availability of the servers. Then, they drew some conclusions about the comparative performance of the CDNs and how to prioritize deployments of new locations.
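For readers unfamiliar with King, the core idea, as I understand it, is a subtraction: time a recursive query sent through nameserver A for a name that nameserver B is authoritative for, then subtract the time of a direct query to A. What is left approximates the round trip between A and B. The sketch below shows only that arithmetic on hypothetical timing samples; it does not issue any DNS queries, and using the median to damp outliers is my own choice rather than something taken from the study.

```python
from statistics import median


def king_estimate_ms(
    recursive_rtts_ms: list[float], direct_rtts_ms: list[float]
) -> float:
    """Estimate A-to-B latency as (client -> A -> B -> A -> client) minus (client -> A -> client)."""
    if not recursive_rtts_ms or not direct_rtts_ms:
        raise ValueError("need at least one sample of each kind")
    return median(recursive_rtts_ms) - median(direct_rtts_ms)


if __name__ == "__main__":
    # Hypothetical samples, in milliseconds.
    via_a_to_b = [40.0, 42.0, 41.0]
    direct_to_a = [10.0, 11.0, 10.0]
    print(f"estimated A<->B latency: {king_estimate_ms(via_a_to_b, direct_to_a):.1f} ms")
```

The estimate is only as good as the assumption that the nameservers sit near the hosts you actually care about.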
Hopefully the full PDF of the study will return to public view soon. Despite its flaws, it’s still tremendously interesting and a worthwhile read.
This is the fourth and probably final post in a series examining the Microsoft CDN study. The three previous posts covered measurement, the blind spots, and availability. This post wraps up with some conclusions.
The bottom line: The Microsoft study is very interesting reading, but it doesn’t provide any useful information about CDN performance in the real world.
The study’s conclusions are flawed to begin with, but what’s of real relevance to purchasers of CDN services is that even if the study’s conclusions were valid, its narrow focus on one element — one-time small-packet latency to the DNS servers and content servers — doesn’t accurately reflect the components of real-world CDN performance.
Cache hit ratios have a tremendous impact upon real-world CDNs. Moreover, the fallback mechanism on a cache miss is also important — does a miss require going back to the origin, or is there a middle tier? This will determine how much performance is impacted by a miss. The nature of your content and the CDN’s architecture will determine what those cache hit ratios look like, especially for long-tail content.
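A simple expected-value model illustrates why. Assume a hit costs one round trip to the edge, and a miss costs that plus a fetch from wherever the edge falls back to, whether a middle tier or the origin. All the numbers below are hypothetical, chosen only to show the shape of the effect.

```python
def expected_fetch_ms(hit_ratio: float, edge_ms: float, fallback_ms: float) -> float:
    """Mean fetch time when misses add a fallback fetch on top of the edge trip."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return edge_ms + (1.0 - hit_ratio) * fallback_ms


if __name__ == "__main__":
    edge_ms = 10.0          # hypothetical user-to-edge time
    origin_ms = 120.0       # hypothetical edge-to-origin fetch
    middle_tier_ms = 40.0   # hypothetical edge-to-middle-tier fetch
    for hit_ratio in (0.99, 0.90, 0.50):
        via_origin = expected_fetch_ms(hit_ratio, edge_ms, origin_ms)
        via_middle = expected_fetch_ms(hit_ratio, edge_ms, middle_tier_ms)
        print(
            f"hit ratio {hit_ratio:.0%}: {via_origin:5.1f} ms via origin, "
            f"{via_middle:5.1f} ms via middle tier"
        )
```

At a 99% hit ratio the fallback path barely matters; at 50%, which is plausible for long-tail content, it dominates. A latency-to-the-edge measurement captures only the first term.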
Throughput determines how quickly you get a file, and how well a CDN can sustain a bitrate for video. Throughput is affected by many factors, and can be increased through TCP/IP optimizations. Consistency of throughput also determines what your overall experience is; start-stop behavior caused by jittery performance can readily result in user frustration.
More broadly, the problem is that any method of testing CDNs from anything other than the edge of the network, using real end-user points, is flawed. Keynote and Gomez provide the best approximations on a day to day basis, but they’re only statistical samples. Gomez’s “Actual Experience” service uses an end-user panel, but that introduces uncontrolled variables into the mix if you’re trying to compare CDNs, and it’s still only sampling.
The holy grail of CDN measurement, of course, is seeing performance in real-time — knowing exactly what users are getting at any given instant from any particular geography. But even if a real-time analytics platform existed, you’d still have to try a bunch of different CDNs to know how they’d perform for your particular situation.
Bottom line: If you want to really test a CDN’s performance, and see what it will do for your content and your users, you’ve got to run a trial.
Then, once you’ve done your trials, you’ve got to look at the performance and the cost numbers, and then ask yourself: What is the business value of performance to me? Does better performance drive real value for you? You need to measure more than just the raw performance — you need to look at time spent on your site, conversion rate, basket value, page views, ad views, or whatever it is that tells you how successful your site is. Then you can make an intelligent decision.
In the end, discussions of CDN architecture are academically interesting, and certainly of practical interest to engineers in the field, but if you’re buying CDN services, architecture is only relevant to you insofar as it affects the quality of the user experience. If you’re a buyer, don’t get dragged into the rathole that is debating the merits of one architecture versus another. Look at real-world performance, and think short-term; CDN contract lengths are getting shorter and shorter, and if you’re a high-volume buyer, what you care about is performance right now and maybe in the next year.