Category Archives: Infrastructure

Amazon’s CloudFront CDN

Amazon’s previously-announced CDN service is now live. It’s called CloudFront. It was announced on the Amazon Web Services blog, and discussed in a blog post by Amazon’s CTO. The RightScale blog has a deeper look, too.

How CloudFront Works

Basically, to use the CloudFront CDN, you drop your static objects (your static HTML, images, JavaScript libraries, etc.) into an S3 bucket. (A bucket is essentially the conceptual equivalent of a folder.) Then, you register that bucket with CloudFront. Once you’ve done this, you get back a hostname that you use as the base of the URL for your static objects. The hostname looks to be a hash in the cloudfront.net domain; I expect CloudFront customers will normally create a CNAME (alias) in their own domain that points to it.
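
To make that workflow concrete, here is a minimal sketch of the publish-to-S3, serve-via-CloudFront pattern. It uses boto3 (a much later AWS SDK than existed when CloudFront launched), and the bucket name, CloudFront hostname, and file paths are hypothetical placeholders, not anything Amazon hands you.

```python
# Minimal sketch (assumptions: boto3 installed and credentials configured;
# "example-static-assets" and the cloudfront.net hostname are placeholders).
import boto3

s3 = boto3.client("s3")

BUCKET = "example-static-assets"           # the S3 bucket registered with CloudFront
CDN_HOST = "d1234abcd5678.cloudfront.net"  # the hostname CloudFront hands back

def publish(local_path: str, key: str, content_type: str) -> str:
    """Upload a static object to the origin bucket and return its CDN URL."""
    s3.upload_file(
        local_path,
        BUCKET,
        key,
        ExtraArgs={"ACL": "public-read", "ContentType": content_type},
    )
    # Delivery goes through the CloudFront hostname (or a CNAME in your own
    # domain pointing at it), not through the S3 bucket URL.
    return f"http://{CDN_HOST}/{key}"

print(publish("logo.png", "img/logo.png", "image/png"))
```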

These setup tasks are done via Amazon API calls, but like every other useful bit of AWS functionality, there are tools out there that can make it point-and-click, so this can readily be CDN for Dummies. There’s no content “publication” step, just making sure that the URLs for your static content point to Amazon. Intelligently-designed sites already have static content segregated onto its own hostname, so it’s a simple switch for them; for everyone else, it’s just some text editing, and it’s the same method just about every other CDN has used for the better part of the last decade.

In short, you’re basically using S3 as the origin server. The actual delivery is through one of 14 edge locations — a limited footprint, but not a bad one in these days of the megaPOP CDN.

How Pricing Works

Pricing for the service is a little more complex to understand. If your content is being served from the edge, it’s priced similarly to S3 data transfers (same price at 10 TB or less, and 1 cent less for higher tiers). However, before your content can get served off the edge, it has to get there. So if the edge has to go back to the origin to get the content (i.e., the content is not in cache or has expired from cache), you’ll also incur a normal S3 charge.

You can think of this as being similar to the way that Akamai charged a few years ago — you’d pay a bandwidth fee for delivering content to users, but you’d also pay something called an “administrative bandwidth fee”, which was charged for edge servers fetching from the origin. That essentially penalized you when there was a cache miss. On a big commercial CDN like Akamai, customers reasonably felt this was sort of unfair, competitors like Mirror Image hit back by not charging those fees, and by and large the fee disappeared from the market.

It makes sense for Amazon to charge for having to go back to the (S3) origin, though, because the open-to-all-comers nature of CloudFront means that they would naturally have a ton of customers who have long-tail content and therefore very low cache hit ratios. On a practical basis, however, quite a few people who try out CloudFront will probably discover that the cache miss ratio for their content is too high; since a cache miss also confers no performance benefit, the higher delivery cost won’t make sense, and they’ll go back to just using S3 itself for static delivery. In other words, the pricing scheme will make customers self-regulate, over time.
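
A back-of-the-envelope sketch of that self-regulation argument: the per-GB rates below are illustrative placeholders roughly in line with the first-tier prices discussed above, not Amazon's actual tiered rate card, and request charges are ignored. The point is only the shape of the math: as the cache hit ratio falls, you increasingly pay for delivery twice.

```python
# Illustrative cost model: prices are placeholders, not Amazon's rate card.
def cloudfront_monthly_cost(gb_delivered: float, hit_ratio: float,
                            edge_price: float = 0.17,     # assumed $/GB delivered from the edge
                            origin_price: float = 0.17):  # assumed $/GB fetched from S3 on a miss
    miss_gb = gb_delivered * (1.0 - hit_ratio)
    return gb_delivered * edge_price + miss_gb * origin_price

def s3_only_monthly_cost(gb_delivered: float, s3_price: float = 0.17):
    return gb_delivered * s3_price

for hit_ratio in (0.95, 0.50, 0.05):
    cdn = cloudfront_monthly_cost(1000, hit_ratio)
    s3 = s3_only_monthly_cost(1000)
    print(f"hit ratio {hit_ratio:.0%}: CloudFront ${cdn:.2f} vs. plain S3 ${s3:.2f}")
```

At a very low hit ratio you pay nearly double the plain-S3 price for content that is mostly being fetched from the origin anyway, which is exactly the case where the CDN confers no performance benefit.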

Performance and CDN Server Misdirection

Some starting fodder for the performance-minded: My preliminary testing shows that Amazon’s accuracy in choosing an appropriate node can be extremely poor. I’ve noted previously that a CDN uses the requesting nameserver as its location reference, and so its content-routing task is to choose a content server close to that nameserver (which will also, hopefully, be close to the end-user).

I live in suburban DC. I tested using cdn.wolfire.com (a publicly-cited CloudFront customer), and checked some other CloudFront customers as well to make sure it wasn’t a domain-specific aberration. My local friend on Verizon FIOS, using a local DC nameserver, gets Amazon’s node in Northern Virginia (suburban DC), certainly the closest node to us. Using the UltraDNS nameserver in the New York City area, I also get the NoVa node — but from the perspective of that nameserver, Amazon’s Newark node should actually be closer. Using the OpenDNS nameserver in NoVa, Amazon tries to send me to its CDN node in Palo Alto (near San Francisco) — not even close. My ISP, MegaPath, has a local nameserver in DC; using that, Amazon sends me to Los Angeles.

That’s a serious set of errors. By ping time, the correct, NoVa node is a mere 5 ms away from me. The LA node is 71 ms from me. Palo Alto is even worse, at 81 ms. Both of those times are considerably worse than pure cross-country latency across a decent backbone. My ping to the test site’s origin (www.wolfire.com) is only 30 ms, making the CDN latency more than twice as high.
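
For anyone who wants to reproduce this kind of test, here is a rough sketch using the dnspython library (assumed installed; resolve() is the dnspython 2.x call): resolve the CDN hostname against several recursive nameservers and compare the edge IPs each one returns. The first resolver IP is a placeholder; substitute whichever nameservers you want to test.

```python
# Sketch of the nameserver test (assumes dnspython is installed; the first
# resolver IP is a placeholder, the others are public OpenDNS/UltraDNS servers).
import dns.resolver

CDN_HOST = "cdn.wolfire.com"
RESOLVERS = {
    "local ISP resolver": "192.0.2.53",      # placeholder: substitute your ISP's nameserver
    "OpenDNS":            "208.67.222.222",
    "UltraDNS":           "156.154.70.1",
}

for label, ip in RESOLVERS.items():
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ip]
    try:
        answers = resolver.resolve(CDN_HOST, "A")   # .query() on older dnspython
        edges = ", ".join(rdata.address for rdata in answers)
        print(f"{label:20s} -> {edges}")
    except Exception as exc:
        print(f"{label:20s} -> lookup failed: {exc}")
```

Pinging the returned addresses from your own connection then gives you the latency comparison described above.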

However, I have a certain degree of faith: Over time, Amazon has clearly illustrated that they learn from their mistakes, they fix them, and they improve on their products. I assume they’ll figure out proper redirection in time.

What It Means for the Market

Competitively, it seems like Rackspace’s Cloud Files plus Limelight may turn out to be the stronger offering. The price of Rackspace/Limelight is slightly higher, but apparently there’s no origin retrieval charge, and Limelight has a broader footprint and therefore probably better global performance (although there are many things that go into CDN performance beyond footprint). And while anyone can use CloudFront for the teensiest bit of delivery, the practical reality is that they’re not going to — the pricing scheme will make that irrational.

Some people will undoubtedly hype CloudFront as a game-changer. It’s not. It’s certainly yet another step towards having ubiquitous edge delivery of all popular static content, and the prices are attractive at low to moderate volumes, but high-volume customers can and will get steeper discounts from established players with bigger footprints and a richer feature set. It’s a logical move on Amazon’s part, and a useful product that’s going to find a large audience, but it’s not going to majorly shake up the CDN industry, other than to accelerate the doom of small undifferentiated providers who were already well on their way to rolling into the fiery pit of market irrelevance and insolvency.

(I can’t write a deeper market analysis on my blog. If you’re a Gartner client and want to know more, make an inquiry and I’d be happy to discuss it.)

Cloud “overcapacity”

Investors keep gnawing at the issue of data center overcapacity in the colo market, but it doesn’t look like many people are thinking about potential overcapacity in the cloud infrastructure market and what that means in a medium-term, economic-downturn scenario.

Many of the emerging cloud infrastructure service providers (“cloud hosters”) are building substantial chunks of capacity in order to accommodate a bunch of Web 2.0 start-ups with unpredictable and sometimes wildly variable capacity needs. (The now-classic example is Animoto.)

If this sounds naggingly familiar, it should. In the ’90s, colo providers built out tons of capacity in anticipation of the growth of the dot-coms. That growth was real enough until VCs stopped pouring money into companies that had no real revenues, causing the whole house of cards to collapse — the dot-coms folded and took many colocation companies with them. (The story is more complex than that, but that’s a succinct enough summary.)

Amazon can presumably afford to have gargantuan capacity reserves in EC2, because Amazon’s got heavily seasonal traffic that leaves it with gobs of spare capacity post-Christmas, meaning that anything it can do to get revenue from that pile of hardware is gravy. As long as EC2 keeps growing, every holiday season it can pile on more hardware and then release it thereafter to be absorbed into the EC2 pool.

Other cloud providers don’t have that luxury. And renting unmanaged hardware at commodity prices has ugly returns on invested capital, and the risk profile is much higher than it is with data center leasing or colo. Most cloud providers are trying not to overbuy, but running out of capacity also has its risks, especially when it looks like the heavens will be raining failed start-ups as they run out of cash.

Does architecture matter?

A friend of mine, upon reading my post on the cloud skills shift, commented that he thought that the role of the IT systems architect was actually diminishing in the enterprise. (He’s an architect at a media company.)

His reasoning was simple: Hardware has become so cheap that IT managers no longer want to spend staff time on performance tuning, finding just the right configuration, or getting sizing and capacity planning exactly right.

Put another way: Is it cheaper to have a senior operations engineer on staff, or is it cheaper to just buy more hardware?

The one-size-fits-all nature of cloud may very well indicate the latter, for organizations for whom cutting-edge technology expertise does not drive competitive advantage.

The discipline of cloud

Cloud forces configuration management discipline.

As we shift more and more towards provisioning from images, rather than building operating systems from scratch, installing packages, and configuring everything, we move towards the holistic build becoming the norm — essentially, the virtual appliance. Tools companies like rPath and Elastra are taking slices of what should probably be part of broader run-book automation (RBA) solutions that embrace the cloud.
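
As an illustration of how far image-based provisioning collapses the build process, here is a minimal sketch using boto3 (an SDK that postdates this post); the AMI ID, key pair, security group, and instance type are hypothetical placeholders.

```python
# Minimal sketch with boto3; all identifiers below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # a pre-built appliance image
    InstanceType="m1.small",                    # placeholder instance type
    KeyName="ops-keypair",                      # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
# The whole build is baked into the image, so there is no per-host package
# install or configuration step here.
print(f"Launched {instance_id} from a golden image.")
```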

It represents a big shift in thinking for the enterprise. Dot-coms have lived for years in a world where cloning is the provisioning norm, because they’ve got horizontally-scalable apps for which they build servers by the pallet-load. Enterprises mostly haven’t made that shift yet, because most of what the enterprise does is still the one-off application, one for which, if you’re lucky, someone will deliver a server in a couple of weeks, and if you’re not, sometime in the next nine months. In the dot-com world, it is not acceptable for the gestation of an operational environment to take as long as the gestation of a human.

And that means that the enterprise is going to have to get out of doing the one-off, building machines from scratch, and letting app developers change things on the fly.

What Rackspace’s cloud moves mean

Last week, Rackspace made a bunch of announcements about its cloud strategy. I wrote previously about its deal with Limelight; now I want to contemplate its two acquisitions, Jungle Disk and Slicehost. (I have been focused on writing research notes in the last week, or I would have done this sooner…)

Jungle Disk provides online storage, via Amazon S3. Its real strength is in its easy-to-use interface; you can make your Jungle Disk storage look like a network drive, it has automated backup into the cloud, and there are premium features like Web-based (WebDAV) access. Files are stored encrypted. You pay for their software, then pay the S3 charges; there’s only a monthly recurring fee from them if you get their “plus” service. The S3 account is yours, so if you decide to dump Jungle Disk, you can keep using your storage.

The Jungle Disk acquisition looks like a straightforward feature addition — it’s a value-add for Rackspace’s Cloud Files offering, and Rackspace has said that Jungle Disk will be offering storage on both platforms. It’s a popular brand in the S3 backup space, and it’s a scrappy little self-funded start-up.

I suspect Rackspace likes scrappy little self-funded start-ups. The other acquisition, Slicehost, is also one. At this point, outright buying smart and ambitious entrepreneurial engineers with cloud experience is not a bad plan for Rackspace, whose growth has already resulted in plenty of hiring challenges.

Slicehost is a cloud hosting company. They offer unmanaged Linux instances on a Xen-based platform; their intellectual property comes in the form of their toolset. What’s interesting about this acquisition is that this kind of “server rental” — for $X per month, you can get server hardware (except this time it’s virtual rather than physical) — is actually akin to Rackspace’s old ServerBeach business (sold to Peer 1 back in 2004), not to Rackspace’s current managed hosting business.

Rackspace got out of the ServerBeach model because it was fundamentally different from their “fanatical support” desires, and because it has much less attractive returns on invested capital. The rental business offers a commodity at low prices, where you hope that nobody calls you because that’s going to eat your margin on the deal; you are ultimately just shoving hardware at the customer. What Rackspace’s managed hosting customers pay for is to have their hands held. The Slicehost model is the opposite of that.

Cloud infrastructure providers hope, of course, that they’ll be able to offer enough integrated value-adds on top of the raw compute to earn higher margins and gain greater stickiness. It’s clear that Rackspace wants to be a direct competitor to Amazon (and companies like Joyent). Now the question is exactly how they’re going to reconcile that model with the fanatical support model, not to mention their ROIC model.

Cloud risks and organizational culture

I’ve been working on a note about Amazon EC2, and pondering how different the Web operations culture of Silicon Valley is from that of the typical enterprise IT organization.

Silicon Valley’s prevailing Ops culture is about speed. There’s a desperate sense of urgency that seems to prevail there, a relentless expectation that you can be the Next Big Thing, if only you can get there fast enough. Or, alternatively, you are the Current Big Thing, and it is all you can do to keep up with your growth, or at least not have the Out Of Resources truck run right over you.

Enterprise IT culture tends to be about risk mitigation. It is about taking your time, being thorough, and making the right decisions and ensuring that nothing bad happens as the result of them.

To techgeeks at start-ups in the Valley (and I mean absolutely no disparagement by this, as I was one, and perhaps still would be, if I hadn’t become an analyst), the promise and usefulness of cloud computing is obvious. The question is not if; it is when — when can I buy a cloud that has the particular features I need to make my life easier? But: Simplify my architecture? Solve my scaling problems and improve my availability? Give me infrastructure the instant I need it, and charge me only when I get it? I want it right now. I wanted it yesterday, I wanted it last year. Got a couple of problems? Hey, everyone makes mistakes; just don’t make them twice. If I’d done it myself, I’d have made mistakes too; anyone would have. We all know this is hard. No SLA? Just fix it as quickly as you can, and let me know what went wrong. It’s not like I’m expecting you to go to Tahiti while my infrastructure burns; I know you’ll try your best. Sure, it’s risky, but heck, my whole business is a risk! No guts, no glory!

Your typical enterprise IT guy is aghast at that attitude. He does not have the problem of waking up one morning and discovering that his sleepy little Facebook app has suddenly gotten the attention of teenyboppers world-wide and now he needs a few hundred or a few thousand servers right this minute, while he prays that his application actually scales in a somewhat linear fashion. He’s not dealing with technology he’s built himself that might or might not work. He isn’t pushing the limits and having to call the vendor to report an obscure bug in the operating system. He isn’t being asked to justify his spending to the board of directors. He lives in a world of known things — budgets worked out a year in advance, relatively predictable customer growth, structured application development cycles stretched out over months, technology solutions that are thoroughly supported by vendors. And so he wants to try to avoid introducing unknowns and risks into his environment.

Despite eight years at Gartner, advising clients that are mostly fairly conservative in their technology decisions, I still find myself wanting to think in early-adopter mode. In trying to write for our clients, I’m finding it hard to shift from that mode. It’s not that I’m not skeptical about the cloud vendors (and I’m trying to be hands-on with as many platforms as I can, so I can get some first-hand understanding and a reality check). It’s that I am by nature rooted in that world that doesn’t care as much about risk. I am interested in reasonable risk versus the safest course of action.

Realistically, enterprises are going to adopt cloud infrastructure in a very different way and at a very different pace than fast-moving technology start-ups. At the moment, few enterprises are compelled towards that transformation in the way that the Web 2.0 start-ups are — their existing solutions are good enough, so what’s going to make them move? All the strengths of cloud infrastructure — massive scalability, cost-efficient variable capacity, Internet-readiness — are things that most enterprises don’t care about that much.

That’s the decision framework I’m trying to work out next.

I am actively interested in cloud infrastructure adoption stories, especially from “traditional” enterprises who have made the leap, even in an experimental way. If you’ve got an experience to share, using EC2, Joyent, Mosso, EngineYard, Terremark’s Infinistructure, etc., I’d love to hear it, either in a comment on my blog or via email at lydia dot leong at gartner dot com.

The Microsoft hybrid-P2P CDN study

I noted previously that the Microsoft CDN study, titled “Measuring and Evaluating Large-Scale CDNs”, had disappeared. Now its lead author, Microsoft researcher Cheng Huang, has updated his home page to note that the study has been withdrawn.

Also removed from his home page, but still available from one of his collaborators, is a study from earlier this year, “Understanding Hybrid CDN-P2P: Why Limelight Needs Its Own Red Swoosh”. I assume the link was removed because it extensively details the CDN discovery methodology also used in the more recent Microsoft CDN study, so if you missed reading the study while it was available, you might want to read this slightly older paper for the details.

I just read the P2P study, which reveals something that I conjectured in my earlier analysis of the study’s blind spots: the visibility into Verizon was almost non-existent. The P2P study asserts that Akamai is present in just four locations inside Verizon’s network. This seems improbable. Verizon is Akamai’s most significant carrier reseller and one of its largest enterprise-focused resellers. It is also one of the largest broadband networks in the United States, and is a significant global network service provider. It was also a close partner of Netli, who inked a deal making Verizon its primary source of bandwidth; I would expect that even though Akamai integrated Netli into its network after acquiring it, it would have kept any strategic points of presence in Verizon’s network. One would have expected that the researchers would have wondered what the chances were that a close partner wouldn’t have substantial Akamai footprint, especially when their chart of Limelight indicated 10 Verizon locations. (Remember that the charting methodology is much less accurate for a deep-footprint CDN.)

The researchers then go on to explore the effects of hybrid P2P using those Verizon nodes (along with AT&T, which also looks like an incomplete discovery). Unfortunately, they don’t tell us much of value about peer-assisted offload; the real world has made it amply clear that actual P2P effectiveness depends tremendously on the nature of your content and your audience.

The methodological flaws deeply undermine the hybrid-P2P paper’s conclusions. But like the other study, it is an interesting read.

Amazon EC2 comes out of beta

Amazon made a flurry of EC2 announcements today.

First off, EC2 is now out of beta, which means that there’s now a service-level agreement. It’s a 99.95% SLA, where downtime is defined as a state in which two or more Availability Zones within the same region, in which you are running instances, are unavailable (your running instances have no external connectivity and you can’t launch new instances that do). Since EC2 only has one region right now, for practical purposes, that means “I have disconnected instances in at least two zones”. That pretty much implies that Amazon thinks that if you care enough to want an SLA, you ought to care enough to be running your instances in at least two zones.

Note that the 99.95% SLA is at least as good as what you’d get out of a typical dedicated hosting provider for an HA/load-balanced solution. (Non-HA dedicated solutions usually get you an SLA in the 99.50-99.75% range.) Hosting SLAs are typically derived primarily from the probability of hardware failure, in conjunction with facility failure, and thus should be broadly realistic. This suggests that Amazon’s SLA is probably a mathematically realistic one. I’d expect that catastrophic failures would be rooted in the EC2 software itself, as with the July S3 outage.
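
For reference, here is the quick arithmetic on what those SLA tiers actually permit, assuming a 30-day month:

```python
# Allowed downtime per month at common SLA tiers, assuming a 30-day month.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

for sla in (0.9995, 0.9975, 0.9950):
    allowed = MINUTES_PER_MONTH * (1 - sla)
    print(f"{sla:.2%} uptime -> about {allowed:.1f} minutes of allowed downtime per month")
```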

Second, the previously-announced Windows and Microsoft SQL Server AMIs are going into beta. These instances are more expensive than the Linux ones — from a price differential of $0.10 for Linux vs. $0.125 for Windows on the small instances, up to a whopping $0.80 for Linux vs. $1.20 for Windows on the largest high-CPU instance. That’s the difference between $72 and $90, or $576 and $864, over a month of full-time running. On a percentage basis, this is broadly consistent with the price differential between Windows and Linux VPS hosting.
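
Those monthly figures come straight from the hourly rates, assuming a 720-hour month of full-time running:

```python
# The monthly figures above, recomputed from the hourly rates (720-hour month).
HOURS_PER_MONTH = 720

rates = {
    "small, Linux":           0.10,
    "small, Windows":         0.125,
    "high-CPU XL, Linux":     0.80,
    "high-CPU XL, Windows":   1.20,
}

for label, hourly in rates.items():
    print(f"{label:22s} ${hourly:.3f}/hr -> ${hourly * HOURS_PER_MONTH:.0f}/month")
```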

Third, Amazon announced plans to offer a management console, monitoring, load balancing, and automatic scaling. That’s going to put it in direct competition with vendors who offer EC2 overlays, like RightScale. That is not going to come as a surprise to those vendors, most of whom intend to be cloud-agnostic, with their value-add being a single consistent interface across multiple clouds. So in some ways, Amazon’s new services, which will also be directly supported in the API, will actually make life easier for those vendors — it just raises the bar for what value-added features they need.

The management console is a welcome addition, as anyone who has ever attempted to provision through the API and its wrapper scripts will undoubtedly attest. It’s always been an unnecessary level of pain, and the management console doesn’t need to do much of anything to be an improvement over that. People have been managing their own EC2 monitoring just fine, but having Amazon’s view, integrated into the management console, will be a nice plus. (But monitoring itself is an enabling technology for other services; see below.)

There’s never really been a great way to load-balance on EC2. DNS round-robin is crude, and running a load-balancing proxy creates a single point of failure. Native, smart load-balancing would be a boon; here’s a place where Amazon could deliver some great value-adds that are worth paying extra for.

Automatic scaling has been one of the key missing pieces of EC2. Efforts like Scalr have been an attempt to address it, and it’s going to be interesting to see how sophisticated the Amazon native offering will be.

Note that three of these new EC2 elements go together. Implicit in both load-balancing and automatic scaling is the need to be able to monitor instances. The more complete the instrumentation, the smarter the load-balancing and scaling decisions can be.
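
As a toy illustration of why instrumentation is the enabling piece, here is a naive threshold-based scaling loop; this is not Amazon's forthcoming service, just the shape of the control loop, and get_average_cpu() and the scaling actions are stand-ins for whatever monitoring feed and provisioning API you actually have.

```python
# Toy control loop: not Amazon's service, just the shape of the decision logic.
import random

def get_average_cpu() -> float:
    """Stand-in for a real monitoring feed (e.g., average CPU across the pool)."""
    return random.uniform(10, 95)

def autoscale_step(instance_count: int, scale_up_at: float = 75.0,
                   scale_down_at: float = 25.0, minimum: int = 2) -> int:
    cpu = get_average_cpu()
    if cpu > scale_up_at:
        instance_count += 1    # would call the provisioning API here
    elif cpu < scale_down_at and instance_count > minimum:
        instance_count -= 1    # would terminate an instance here
    print(f"avg CPU {cpu:4.1f}% -> pool size {instance_count}")
    return instance_count

pool = 2
for _ in range(5):
    pool = autoscale_step(pool)
```

The better the instrumentation feeding that decision, the smarter both the scaling and the load-balancing choices can be.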

For a glimpse at the way Amazon is thinking about the interlinkages, check out the Amazon CTO’s blog post on Amazon’s efficiency principles.

Rackspace’s deal with Limelight

Rackspace announced yesterday, as part of a general unveiling of its cloud strategy, a new partnership with Limelight Networks.

Under the new partnership, customers of Rackspace’s Cloud Files (formerly CloudFS) service — essentially, a competitor to Amazon S3 — will be able to choose to publish and deliver their files via Limelight’s CDN. Essentially, this will place Rackspace/Limelight in direct competition with Amazon’s forthcoming S3 CDN.

CDN delivery won’t cost Cloud Files customers any more than Rackspace’s normal bandwidth costs for Cloud Files. Currently, that’s $0.22/GB for the first 5 TB, scaling down to $0.15/GB for volumes above 50 TB. Amazon S3, by comparison, is $0.17/GB for the first 10 TB, down to $0.10/GB for volumes over 150 TB; we don’t yet know what its CDN upcharge, if any, will be. As another reference point, Internap resold via SoftLayer is $0.20/GB, so we can probably take that as a reasonable benchmark for the base entry cost of CDN services sold without any commit.

It’s a reasonably safe bet that Limelight’s CDN is going to deliver better performance than Amazon’s S3 CDN, given its broader footprint and peering relationships, so the usual question of, “What’s the business value of performance?” will apply.

It’s a smart move on Rackspace’s part, and an easy way into a CDN upsell strategy for its regular base of hosting customers, too. And it’s a good way for Limelight to pre-emptively compete against the Amazon S3 CDN.

Rackspace buys itself some cloud

Rackspace’s cloud event resulted in a very significant announcement: the acquisition of Slicehost and Jungle Disk. There’s also an announced Limelight partnership (it’s unclear at the moment what this means, as the two companies already have a relationship), and a Sonian partnership to offer email archiving to Rackspace’s Mailtrust hosted email business.

My gut reaction: Very interesting moves. Signals an intent to be much more aggressive in the cloud space than I think most people were expecting.
