Blog Archives
Amazon’s CloudFront CDN
Amazon’s previously-announced CDN service is now live. It’s called CloudFront. It was announced on the Amazon Web Services blog, and discussed in a blog post by Amazon’s CTO. The RightScale blog has a deeper look, too.
How CloudFront Works
Basically, to use the CloudFront CDN, you drop your static objects (your static HTML, images, JavaScript libraries, etc.) into an S3 bucket. (A bucket is essentially the conceptual equivalent of a folder.) Then, you register that bucket with CloudFront. Once you’ve done this, you get back a hostname that you use as the base of the URL to your static objects. The hostname looks to be a hash in the cloudfront.net domain; I expect CloudFront customers will normally create a CNAME (alias) in their own domain that points to it.
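For the mechanically inclined, here’s roughly what that looks like with the boto library’s CloudFront support. This is a minimal sketch, assuming boto’s API shape; the bucket name and CNAME are hypothetical.

    import boto
    from boto.cloudfront.origin import S3Origin

    cf = boto.connect_cloudfront()  # credentials come from the environment

    # Register an existing S3 bucket as the origin for a new distribution.
    distribution = cf.create_distribution(
        origin=S3Origin('my-static-assets.s3.amazonaws.com'),
        enabled=True,
        cnames=['static.example.com'],  # the alias in your own domain
        comment='Static assets')

    # The hostname CloudFront hands back, e.g. d1234abcd5678.cloudfront.net;
    # point your CNAME at this and use it as the base URL for static objects.
    print(distribution.domain_name)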
The tasks use Amazon API calls, but like every other useful bit of AWS functionality, there are tools out there that can make it point-and-click, so this can readily be CDN for Dummies. There’s no content “publication” step, just making sure that the URLs for your static content point to Amazon. Intelligently designed sites already have static content segregated onto its own hostname, so it’s a simple switch for them; for everyone else, it’s just some text editing, and it’s the same method just about every other CDN has used for the better part of the last decade.
In short, you’re basically using S3 as the origin server, with actual delivery through one of 14 edge locations — a limited footprint, but not a bad one in these days of the megaPOP CDN.
How Pricing Works
Pricing for the service is a little more complex to understand. If your content is being served from the edge, it’s priced similarly to S3 data transfers (same price at 10 TB or less, and 1 cent less for higher tiers). However, before your content can get served off the edge, it has to get there. So if the edge has to go back to the origin to get the content (i.e., the content is not in cache or has expired from cache), you’ll also incur a normal S3 charge.
You can think of this as being similar to the way that Akamai charged a few years ago — you’d pay a bandwidth fee for delivering content to users, but you’d also pay something called an “administrative bandwidth fee”, which was charged for edge servers fetching from the origin. That essentially penalized you when there was a cache miss. On a big commercial CDN like Akamai, customers reasonably felt this was sort of unfair, competitors like Mirror Image hit back by not charging those fees, and by and large the fee disappeared from the market.
It makes sense for Amazon to charge for having to go back to the (S3) origin, though, because the open-to-all-comers nature of CloudFront means that they would naturally have a ton of customers who have long-tail content and therefore very low cache hit ratios. On a practical basis, however, quite a few people who try out CloudFront will probably discover that the cache miss ratio for their content is too high; since a request served via a cache miss gets no performance benefit, the higher delivery cost won’t make sense, and they’ll go back to just using S3 itself for static delivery. In other words, the pricing scheme will make customers self-regulate over time.
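To put numbers on that self-regulation, here’s a back-of-the-envelope model of the blended per-GB cost as a function of cache hit ratio. It’s a sketch: I’m using the first-tier rates implied above ($0.17/GB, the same as S3’s first transfer tier) as stand-ins for both charges, and real bills will vary by tier.

    # Assumed first-tier rates: ~$0.17/GB for edge delivery, plus a normal
    # S3 transfer charge of ~$0.17/GB for each origin fetch on a cache miss.
    EDGE_PER_GB = 0.17
    ORIGIN_FETCH_PER_GB = 0.17

    def blended_cost_per_gb(cache_hit_ratio):
        # Every GB is delivered from the edge; misses also pay the origin fetch.
        return EDGE_PER_GB + (1.0 - cache_hit_ratio) * ORIGIN_FETCH_PER_GB

    for hit_ratio in (0.95, 0.50, 0.05):
        print(f"{hit_ratio:.0%} hits -> ${blended_cost_per_gb(hit_ratio):.3f}/GB")

At a 95% hit ratio the miss charge is noise; with long-tail content at a 5% hit ratio, you’d be paying nearly double the plain S3 rate for no performance benefit.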
Performance and CDN Server Misdirection
Some starting fodder for the performance-minded: my preliminary testing shows that Amazon’s accuracy in choosing an appropriate node can be extremely poor. I’ve noted previously that the end-user’s nameserver is the CDN’s location reference, so the content-routing task is to choose a content server close to that nameserver (which will also, hopefully, be close to the end-user).
I live in suburban DC. I tested using cdn.wolfire.com (a publicly-cited CloudFront customer), and checked some other CloudFront customers as well to make sure it wasn’t a domain-specific aberration. My local friend on Verizon FiOS, using a local DC nameserver, gets Amazon’s node in Northern Virginia (suburban DC), certainly the closest node to us. Using the UltraDNS nameserver in the New York City area, I also get the NoVa node — but from the perspective of that nameserver, Amazon’s Newark node should actually be closer. Using the OpenDNS nameserver in NoVa, Amazon tries to send me to its CDN node in Palo Alto (near San Francisco) — not even close. My ISP, MegaPath, has a local nameserver in DC; using that, Amazon sends me to Los Angeles.
That’s a serious set of errors. By ping time, the correct, NoVa node is a mere 5 ms away from me. The LA node is 71 ms from me. Palo Alto is even worse, at 81 ms. Both of those times are considerably worse than pure cross-country latency across a decent backbone. My ping to the test site’s origin (www.wolfire.com) is only 30 ms, making the CDN latency more than twice as high.
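If you want to reproduce the test, the method is straightforward: ask different recursive nameservers for the CDN hostname, see which edge IPs come back, and then ping or geolocate them. A sketch assuming the dnspython package; the first resolver IP is a placeholder, while the other two are the public OpenDNS and UltraDNS resolvers.

    import dns.resolver  # dnspython 2.x

    RESOLVERS = {
        'my ISP resolver': '198.51.100.1',   # placeholder address
        'OpenDNS':         '208.67.222.222',
        'UltraDNS':        '156.154.70.1',
    }

    for label, ip in RESOLVERS.items():
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ip]
        answers = resolver.resolve('cdn.wolfire.com', 'A')
        print(label, '->', [a.to_text() for a in answers])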
However, I have a certain degree of faith: Over time, Amazon has clearly illustrated that they learn from their mistakes, they fix them, and they improve on their products. I assume they’ll figure out proper redirection in time.
What It Means for the Market
Competitively, it seems like Rackspace’s Cloud Files plus Limelight may turn out to be the stronger offering. The price of Rackspace/Limelight is slightly higher, but apparently there’s no origin retrieval charge, and Limelight has a broader footprint and therefore probably better global performance (although there are many things that go into CDN performance beyond footprint). And while anyone can use CloudFront for the teensiest bit of delivery, the practical reality is that they’re not going to — the pricing scheme will make that irrational.
Some people will undoubtedly excitedly hype CloudFront as a game-changer. It’s not. It’s certainly yet another step towards having ubiquitous edge delivery of all popular static content, and the prices are attractive at low to moderate volumes, but high-volume customers can and will get steeper discounts from established players with bigger footprints and a richer feature set. It’s a logical move on Amazon’s part, and a useful product that’s going to find a large audience, but it’s not going to majorly shake up the CDN industry, other than to accelerate the doom of small undifferentiated providers who were already well on their way to rolling into the fiery pit of market irrelevance and insolvency.
(I can’t write a deeper market analysis on my blog. If you’re a Gartner client and want to know more, make an inquiry and I’d be happy to discuss it.)
What Rackspace’s cloud moves mean
Last week, Rackspace made a bunch of announcements about its cloud strategy. I wrote previously about its deal with Limelight; now I want to contemplate its two acquisitions, Jungle Disk and Slicehost. (I have been focused on writing research notes in the last week, or I would have done this sooner…)
Jungle Disk provides online storage via Amazon S3. Its real strength is its easy-to-use interface: you can make your Jungle Disk storage look like a network drive, it has automated backup into the cloud, and there are premium features like Web-based (WebDAV) access. Files are stored encrypted. You pay for the software, then pay the S3 charges; there’s a monthly recurring fee only if you get their “plus” service. The S3 account is yours, so if you decide to dump Jungle Disk, you can keep using your storage.
The Jungle Disk acquisition looks like a straightforward feature addition — it’s a value-add for Rackspace’s Cloud Files offering, and Rackspace has said that Jungle Disk will be offering storage on both platforms. It’s a popular brand in the S3 backup space, and it’s a scrappy little self-funded start-up.
I suspect Rackspace likes scrappy little self-funded start-ups. The other acquisition, Slicehost, is also one. At this point, outright buying smart and ambitious entrepreneurial engineers with cloud experience is not a bad plan for Rackspace, whose growth has already resulted in plenty of hiring challenges.
Slicehost is a cloud hosting company. They offer unmanaged Linux instances on a Xen-based platform; their intellectual property comes in the form of their toolset. What’s interesting about this acquisition is that this kind of “server rental” — for $X per month, you can get server hardware (except this time it’s virtual rather than physical) — is actually akin to Rackspace’s old ServerBeach business (sold to Peer 1 back in 2004), not to Rackspace’s current managed hosting business.
Rackspace got out of the ServerBeach model because it was fundamentally different from their “fanatical support” desires, and because it has much less attractive returns on invested capital. The rental business offers a commodity at low prices, where you hope that nobody calls you because that’s going to eat your margin on the deal; you are ultimately just shoving hardware at the customer. What Rackspace’s managed hosting customers pay for is to have their hands held. The Slicehost model is the opposite of that.
Cloud infrastructure providers hope, of course, that they’ll be able to offer enough integrated value-adds on top of the raw compute to earn higher margins and gain greater stickiness. It’s clear that Rackspace wants to be a direct competitor to Amazon (and companies like Joyent). Now the question is exactly how they’re going to reconcile that model with the fanatical-support model, not to mention their ROIC model.
Amazon EC2 comes out of beta
Amazon made a flurry of EC2 announcements today.
First off, EC2 is now out of beta, which means that there’s now a service-level agreement. It’s a 99.95% SLA, where downtime is defined as the unavailability of two or more Availability Zones, within the same region, in which you are running instances (your running instances have no external connectivity and you can’t launch new instances that do). Since EC2 only has one region right now, for practical purposes that means “I have disconnected instances in at least two zones.” That pretty much implies that Amazon thinks that if you care enough to want an SLA, you ought to care enough to be running your instances in at least two zones.
Note that the 99.95% SLA is at least as good as what you’d get out of a typical dedicated hosting provider for an HA/load-balanced solution. (Non-HA dedicated solutions usually get you an SLA in the 99.50-99.75% range.) Hosting SLAs are typically derived primarily from the probability of hardware failure, in conjunction with facility failure, which suggests that Amazon’s SLA is probably a mathematically realistic one. I’d expect that any catastrophic failures would be rooted in the EC2 software itself, as with the July S3 outage.
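For reference, here’s what those SLA levels actually permit in a 720-hour month:

    # Monthly downtime budgets implied by common hosting SLA levels.
    HOURS_PER_MONTH = 720  # 30-day month

    for sla in (0.9995, 0.9975, 0.9950):
        allowed_minutes = (1 - sla) * HOURS_PER_MONTH * 60
        print(f"{sla:.2%} -> about {allowed_minutes:.0f} minutes of downtime allowed")

That’s roughly 22 minutes a month at 99.95%, versus 108 to 216 minutes at typical non-HA dedicated-hosting levels.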
Second, the previously-announced Windows and Microsoft SQL Server AMIs are going into beta. These instances are more expensive than the Linux ones — from a price differential of $0.10 for Linux vs. $0.125 for Windows on the small instances, up to a whopping $0.80 for Linux vs. $1.20 for Windows on the largest high-CPU instance. That’s the difference between $72 and $90, or $576 and $864, over a month of full-time running. On a percentage basis, this is broadly consistent with the price differential between Windows and Linux VPS hosting.
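The monthly figures fall straight out of the hourly rates, assuming a 720-hour month:

    HOURS = 24 * 30  # a 30-day month of full-time running

    for name, linux, windows in (('small instance', 0.10, 0.125),
                                 ('largest high-CPU instance', 0.80, 1.20)):
        print(f"{name}: Linux ${linux * HOURS:.0f}/mo vs. Windows ${windows * HOURS:.0f}/mo")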
Third, Amazon announced plans to offer a management console, monitoring, load balancing, and automatic scaling. That’s going to put it in direct competition with vendors who offer EC2 overlays, like RightScale. That is not going to come as a surprise to those vendors, most of whom intend to be cloud-agnostic, with their value-add being a single, consistent interface across multiple clouds. So in some ways, Amazon’s new services, which will also be directly supported in the API, will actually make life easier for those vendors — it just raises the bar for the value-added features they need to offer.
The management console is a welcome addition, as anyone who has ever attempted to provision through the API and its wrapper scripts will undoubtedly attest. It’s always been an unnecessary level of pain, and the management console doesn’t need to do much of anything to be an improvement over that. People have been managing their own EC2 monitoring just fine, but having Amazon’s view, integrated into the management console, will be a nice plus. (But monitoring itself is an enabling technology for other services; see below.)
There’s never really been a great way to load-balance on EC2. DNS round-robin is crude, and running a load-balancing proxy creates a single point of failure. Native, smart load-balancing would be a boon; here’s a place where Amazon could deliver some great value-adds that are worth paying extra for.
Automatic scaling has been one of the key missing pieces of EC2. Efforts like Scalr have attempted to address it, and it’s going to be interesting to see how sophisticated Amazon’s native offering will be.
Note that three of these new EC2 elements go together. Implicit in both load-balancing and automatic scaling is the need to be able to monitor instances. The more complete the instrumentation, the smarter the load-balancing and scaling decisions can be.
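Here’s a toy sketch of that linkage: threshold-based scaling driven entirely by a monitoring feed. Every function here is a hypothetical stand-in, not a real EC2 call, and a real implementation would smooth the metric over a window rather than react to single samples.

    import random

    def get_cpu_utilization(instance_id):
        # Stand-in for real instrumentation of a running instance.
        return random.uniform(0.0, 1.0)

    def launch_instance():
        # Stand-in for provisioning; returns a fake instance ID.
        return f"i-{random.randrange(16**8):08x}"

    SCALE_UP_AT = 0.75     # add capacity above this average CPU
    SCALE_DOWN_AT = 0.25   # shed capacity below this
    MIN_INSTANCES = 2      # never drop below the HA floor

    def rebalance(pool):
        avg = sum(get_cpu_utilization(i) for i in pool) / len(pool)
        if avg > SCALE_UP_AT:
            pool.append(launch_instance())
        elif avg < SCALE_DOWN_AT and len(pool) > MIN_INSTANCES:
            pool.pop()  # terminate the surplus instance
        return pool

    pool = [launch_instance() for _ in range(MIN_INSTANCES)]
    for _ in range(5):
        pool = rebalance(pool)
        print(len(pool), 'instances')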
For a glimpse at the way Amazon is thinking about the interlinkages, check out the Amazon CTO’s blog post on Amazon’s efficiency principles.
Heavy experiments with Amazon
Scott Penberthy of online video provider Heavy has an interesting blog post about trying to replace Rackspace and Akamai with Amazon web services — substituting S3 for Rackspace SAN storage, and direct delivery out of S3 for Akamai CDN services. Not surprisingly, the S3 performance fell well below Akamai performance, but they managed to achieve significant storage cost savings.
Who hosts Warhammer Online?
With the recent launch of EA/Mythic’s Warhammer Online MMORPG comes my usual curiosity about who’s providing the infrastructure.
Mythic has stated publicly that all of the US game servers are located in Virginia, near Mythic’s offices. A couple of traceroutes seem to indicate that they’re in Verizon, almost certainly in colocation (managed hosting is rare for MMOGs), with purely Verizon connectivity to the Internet. The webservers, on the other hand, look to be split between Verizon and ThePlanet in Dallas. FileBurst (a single-location download hosting service) is used to serve images and cinematics.
During the beta, Mythic used BitTorrent to serve files. With the advent of full release, it doesn’t appear that they’re depending on peer-to-peer any longer — unlike Blizzard, for instance, which uses public P2P in the form of BitTorrent for its World of Warcraft updates, trading lower cost for much higher levels of user frustration. MMO updates are probably an ideal case for P2P file distribution — Solid State Networks, a P2P CDN, has done well by that — and with hybrid CDNs (those combining a traditional distributed model with P2P) becoming more commonplace, I’d expect to see that model more often.
However, I’m not keen on either single data center locations or single-homing, for anything that wants to be reliable. I also believe that gaming — a performance-sensitive application — really ought to run in a multi-homed environment. My favorite “why you should use multiple ISPs, even if you’re using a premium ISP that you love” anecdote to my clients is an observation I made while playing World of Warcraft a few years ago. WoW originally used just AT&T’s network (in AT&T colocation). Latency was excellent — most of the time. Occasionally, you’d get a couple of seconds of network burp, where latency would spike hugely. If you’re websurfing, this doesn’t really impact your experience. If you’re playing an online game, you can end up dead. When WoW switched to Internap for the network piece (remaining in AT&T colo), overall latencies went up — but the latencies were still well below the threshold of problematic performance, and more importantly, the latencies were rock-solidly in a narrow window of variability. (This is the same reason multi-homed CDNs with lots of route diversity deliver better consistency of user experience than single-carrier CDNs.)
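To put numbers on that intuition, compare two made-up latency profiles (in milliseconds): one fast on average but prone to burps, one slower but steady.

    import statistics

    bursty = [25, 24, 26, 25, 24, 2200, 25, 26, 24, 25]  # great typically, awful spikes
    steady = [55, 56, 54, 57, 55, 56, 54, 55, 57, 56]    # slower, but rock-solid

    for label, samples in (('single-path, bursty', bursty),
                           ('multi-homed, steady', steady)):
        print(f"{label}: median {statistics.median(samples)} ms, "
              f"worst {max(samples)} ms")

The bursty link wins on the median but loses on the only number a gamer feels: the worst case.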
Companies like FileBurst, by the way, are going to be squarely in the crosshairs of the forthcoming Amazon CDN. FileBurst will do 5 TB of delivery at $0.80 per GB — $3,985/month. At the low end, they’ll do 100 GB or less at $1/GB. The first 100 MB of storage is free; after that it’s $2/MB. They’ve got a delivery infrastructure at the Equinix IBX in Ashburn (Northern Virginia, near DC) and extensive peering, but any other footprint is vague (they say they have a six-location CDN service, but it’s not clear whether it’s theirs or if they’re reselling).
If Amazon’s CDN pricing is anything like the S3 pricing, they’ll blow the doors off those prices. S3 is $0.15 per GB-month for space and $0.17/GB for the first 10 TB of data transfer. So delivering 5 TB worth of content out of a 1 GB store would cost me $5,785/month with FileBurst, and about $850 with Amazon S3. Even if the CDN premium on data transfer is, say, 100%, that’d still be only $1,700 with Amazon.
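Spelled out, the arithmetic looks like this (using the prices quoted above):

    GB_DELIVERED = 5000  # 5 TB
    STORE_MB = 1000      # a 1 GB store

    # FileBurst: $3,985 for 5 TB of delivery at this tier, plus storage at
    # $2/MB after the first free 100 MB.
    fileburst = 3985 + (STORE_MB - 100) * 2
    print('FileBurst:', fileburst)  # $5,785/month

    # Amazon S3: $0.17/GB transfer (first 10 TB tier), $0.15/GB-month storage.
    s3 = GB_DELIVERED * 0.17 + (STORE_MB / 1000.0) * 0.15
    print('S3: about', round(s3))  # ~$850/month

    # Even with a hypothetical 100% CDN premium on the transfer rate:
    print('CDN at 2x S3 transfer:', round(GB_DELIVERED * 0.17 * 2))  # $1,700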
Amazon has a key cloud trait — elasticity, basically defined as the ability to scale to zero (or near-zero) as easily as scaling to bogglosity. It’s that bottom end that’s really going to give them the potential to wipe out the zillion little CDNs that primarily have low-volume customers.
Oracle in the cloud… sort of
Today’s keynote at Oracle World mentioned that Oracle’s coming to Amazon’s EC2 cloud.
The bottom line is that you can now get some Oracle products, including the Oracle 11g database software, bundled as AMIs (Amazon machine images) for EC2 — i.e., ready-to-deploy — and you can license these products to run in the cloud. Any sysadmin who has ever personally gone through the pain of trying to install an Oracle database from scratch knows how frustrating it can be; I’m curious how much the task has or hasn’t been simplified by the ready-to-run AMIs.
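For reference, launching any prebuilt AMI is only a few lines with the boto library. This is a sketch; the AMI ID is a placeholder rather than a real Oracle image, and licensing is its own exercise.

    import boto

    ec2 = boto.connect_ec2()  # credentials come from the environment

    # Launch one instance of a prebuilt image; 'ami-00000000' is a placeholder.
    reservation = ec2.run_instances('ami-00000000',
                                    instance_type='m1.large',
                                    key_name='my-keypair')
    instance = reservation.instances[0]
    print(instance.id, instance.state)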
On the plus side, this is going to address the needs of those companies who simply want to move apps into the cloud, without changing much if anything about their architecture and middleware. And it might make a convenient development and testing platform.
But simply putting a database on cloud infrastructure doesn’t make it a cloud database. Without that crucial distinction, what are the compelling economics or business value-add? It’s cool, but I’m having difficulty thinking of circumstances under which I would tell a client: yes, you should host your production Oracle database on EC2, rather than getting a flexible utility hosting contract with someone like Terremark, AT&T, or Savvis.
Amazon gets into the CDN business
Unsurprisingly, Amazon is getting into the CDN business. (They’re taking notification sign-ups but it’s still in private beta.)
Content delivery is a natural complement to S3 and EC2. There’s already been use and abuse of S3 as a “ghetto CDN”, and at least one commercial hosting provider (Voxel) already offers a productized S3-based CDN. If you’re an EC2 or S3 customer, chances are high that you’ve got significant static-content traffic suited to CDN delivery. Amazon is just gluing together the logical pieces, and as you’d expect, your content on their CDN will reside in S3.
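For reference, the “ghetto CDN” pattern is just publishing world-readable objects straight out of S3. A sketch assuming the boto library; the bucket and file names are hypothetical.

    import boto

    s3 = boto.connect_s3()
    bucket = s3.create_bucket('my-static-assets')

    key = bucket.new_key('img/logo.png')
    key.set_contents_from_filename('logo.png')
    key.set_acl('public-read')  # make it world-readable

    # Now servable at http://my-static-assets.s3.amazonaws.com/img/logo.png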
Basic content delivery services can practically be thought of as nothing more than value-added bandwidth (or value-added storage, if you want to think of it that way). Chances are very high that every major carrier, not to mention every major provider of distributed computing services (i.e., infrastructure clouds), is going to end up in the CDN business sooner or later.
GigaOm and Dan Rayburn have more details about the announcement, and come to similar conclusions: Despite how badly the stock market is beating up on Akamai in the wake of this announcement, this really has very little impact on them. I concur with that bottom line.
I noted last year that the CDN market has bifurcated. Amazon’s new offering is going to squarely target the commoditized portion of the market. Of the existing CDNs, it will impact Level 3 and the smaller no-frills CDNs the most. It will probably also have a minor impact on Limelight (which has a significant percentage of commodity CDN traffic), but basically negligible impact upon Akamai, whose customer base is tilting more and more to the high end of this business.
Just like EC2 and S3 have, this new Amazon service is also going to create a market for overlay value-add companies — people who provide easier-to-use interfaces, analytics, and so on, over the Amazon offering. I’d expect to see some of the existing overlay companies provide management toolsets for the new service, and it will probably prompt some hosters to offer CDN services built on top of the Amazon platform.
Amazon’s entry, combining an elastic model with what at this point can reasonably be considered proven scalable infrastructure expertise, constitutes further market expansion, and supports my fundamental belief that CDNs are increasingly going to entirely dominate the front-end webserving tier. Delivery is becoming so cheap for the masses that there’s very little reason to bother with your own front-end infrastructure.