Monthly Archives: October 2008
The discipline of cloud
Cloud forces configuration management discipline.
As we shift more and more towards provisioning from images, rather than building operating systems from scratch, installing packages, and configuring everything by hand, the holistic build becomes the norm: essentially, the virtual appliance. Tools companies like rPath and Elastra are taking slices of what should probably be part of broader run-book automation (RBA) solutions that embrace the cloud.
It represents a big shift in thinking for the enterprise. Dot-coms have long treated cloning as the provisioning norm, because they've got horizontally-scalable apps for which they build servers by the pallet-load. Enterprises mostly haven't made that shift yet, because most of what the enterprise does is still the one-off application: if you're lucky, you'll get a server delivered for it in a couple of weeks, and if you're not lucky, you'll get it sometime in the next nine months. In the dot-com world, it is not acceptable for gestating an operational environment to take as long as gestating a human.
And that means that the enterprise is going to have to get out of doing the one-off, building machines from scratch, and letting app developers change things on the fly.
What Rackspace’s cloud moves mean
Last week, Rackspace made a bunch of announcements about its cloud strategy. I wrote previously about its deal with Limelight; now I want to contemplate its two acquisitions, Jungle Disk and Slicehost. (I have been focused on writing research notes in the last week, or I would have done this sooner…)
Jungle Disk provides online storage, via Amazon S3. Its real strength is its easy-to-use interface: you can make your Jungle Disk storage look like a network drive, it has automated backup into the cloud, and there are premium features like Web-based (WebDAV) access. Files are stored encrypted. You pay for the software, then pay the S3 charges; there's a monthly recurring fee from Jungle Disk only if you get their "plus" service. The S3 account is yours, so if you decide to dump Jungle Disk, you can keep using your storage.
The Jungle Disk acquisition looks like a straightforward feature addition — it’s a value-add for Rackspace’s Cloud Files offering, and Rackspace has said that Jungle Disk will be offering storage on both platforms. It’s a popular brand in the S3 backup space, and it’s a scrappy little self-funded start-up.
I suspect Rackspace likes scrappy little self-funded start-ups. The other acquisition, Slicehost, is also one. At this point, outright buying smart and ambitious entrepreneurial engineers with cloud experience is not a bad plan for Rackspace, whose growth has already resulted in plenty of hiring challenges.
Slicehost is a cloud hosting company. They offer unmanaged Linux instances on a Xen-based platform; their intellectual property comes in the form of their toolset. What’s interesting about this acquisition is that this kind of “server rental” — for $X per month, you can get server hardware (except this time it’s virtual rather than physical) — is actually akin to Rackspace’s old ServerBeach business (sold to Peer 1 back in 2004), not to Rackspace’s current managed hosting business.
Rackspace got out of the ServerBeach model because it was fundamentally different from their "fanatical support" ambitions, and because it had much less attractive returns on invested capital. The rental business offers a commodity at low prices; you hope that nobody calls you, because support calls eat your margin on the deal, and you are ultimately just shoving hardware at the customer. What Rackspace's managed hosting customers pay for is to have their hands held. The Slicehost model is the opposite of that.
Cloud infrastructure providers hope, of course, that they'll be able to offer enough integrated value-adds on top of the raw compute to earn higher margins and gain greater stickiness. It's clear that Rackspace wants to be a direct competitor to Amazon (and companies like Joyent). Now the question is exactly how they're going to reconcile that model with the fanatical support model, not to mention their ROIC model.
Cloud risks and organizational culture
I’ve been working on a note about Amazon EC2, and pondering how different the Web operations culture of Silicon Valley is from that of the typical enterprise IT organization.
Silicon Valley’s prevailing Ops culture is about speed. There’s a desperate sense of urgency that seems to prevail there, a relentless expectation that you can be the Next Big Thing, if only you can get there fast enough. Or, alternatively, you are the Current Big Thing, and it is all you can do to keep up with your growth, or at least not have the Out Of Resources truck run right over you.
Enterprise IT culture tends to be about risk mitigation. It is about taking your time, being thorough, and making the right decisions and ensuring that nothing bad happens as the result of them.
To techgeeks at start-ups in the Valley (and I mean absolutely no disparagement by this, as I was one, and perhaps still would be, if I hadn’t become an analyst), the promise and usefulness of cloud computing is obvious. The question is not if; it is when — when can I buy a cloud that has the particular features I need to make my life easier? But: Simplify my architecture? Solve my scaling problems and improve my availability? Give me infrastructure the instant I need it, and charge me only when I get it? I want it right now. I wanted it yesterday, I wanted it last year. Got a couple of problems? Hey, everyone makes mistakes; just don’t make them twice. If I’d done it myself, I’d have made mistakes too; anyone would have. We all know this is hard. No SLA? Just fix it as quickly as you can, and let me know what went wrong. It’s not like I’m expecting you to go to Tahiti while my infrastructure burns; I know you’ll try your best. Sure, it’s risky, but heck, my whole business is a risk! No guts, no glory!
Your typical enterprise IT guy is aghast at that attitude. He does not have the problem of waking up one morning and discovering that his sleepy little Facebook app has suddenly gotten the attention of teenyboppers world-wide, and that he now needs a few hundred or a few thousand servers right this minute, while he prays that his application actually scales in a somewhat linear fashion. He's not dealing with technology he's built himself that might or might not work. He isn't pushing the limits and having to call the vendor to report an obscure bug in the operating system. He isn't being asked to justify his spending to the board of directors. He lives in a world of known things: budgets worked out a year in advance, relatively predictable customer growth, structured application development cycles stretched out over months, technology solutions that are thoroughly supported by vendors. And so he wants to avoid introducing unknowns and risks into his environment.
Despite eight years at Gartner, advising clients that are mostly fairly conservative in their technology decisions, I still find myself wanting to think in early-adopter mode. In trying to write for our clients, I’m finding it hard to shift from that mode. It’s not that I’m not skeptical about the cloud vendors (and I’m trying to be hands-on with as many platforms as I can, so I can get some first-hand understanding and a reality check). It’s that I am by nature rooted in that world that doesn’t care as much about risk. I am interested in reasonable risk versus the safest course of action.
Realistically, enterprises are going to adopt cloud infrastructure in a very different way and at a very different pace than fast-moving technology start-ups. At the moment, few enterprises are compelled towards that transformation in the way that the Web 2.0 start-ups are — their existing solutions are good enough, so what’s going to make them move? All the strengths of cloud infrastructure — massive scalability, cost-efficient variable capacity, Internet-readiness — are things that most enterprises don’t care about that much.
That’s the decision framework I’m trying to work out next.
I am actively interested in cloud infrastructure adoption stories, especially from “traditional” enterprises who have made the leap, even in an experimental way. If you’ve got an experience to share, using EC2, Joyent, Mosso, EngineYard, Terremark’s Infinistructure, etc., I’d love to hear it, either in a comment on my blog or via email at lydia dot leong at gartner dot com.
The Microsoft hybrid-P2P CDN study
I noted previously that the Microsoft CDN study, titled “Measuring and Evaluating Large-Scale CDNs”, had disappeared. Now its lead author, Microsoft researcher Cheng Huang, has updated his home page to note that the study has been withdrawn.
Also removed from his home page, but still available from one of his collaborators, is a study from earlier this year, “Understanding Hybrid CDN-P2P: Why Limelight Needs Its Own Red Swoosh”. I assume the link was removed because it extensively details the CDN discovery methodology also used in the more recent Microsoft CDN study, so if you missed reading that study while it was available, you might want to read this slightly older paper for the details.
I just read the P2P study, which reveals something that I conjectured in my earlier analysis of the study's blind spots: the visibility into Verizon was almost non-existent. The P2P study asserts that Akamai is present in just four locations inside Verizon's network. This seems improbable. Verizon is Akamai's most significant carrier reseller and one of its largest enterprise-focused resellers. It is also one of the largest broadband networks in the United States, and a significant global network service provider. It was also a close partner of Netli, which inked a deal making Verizon its primary source of bandwidth; I would expect that even though Akamai integrated Netli into its network after acquiring it, it would have kept any strategic points of presence in Verizon's network. One would have expected the researchers to wonder what the chances were that such a close partner wouldn't have a substantial Akamai footprint, especially when their chart of Limelight indicated 10 Verizon locations. (Remember that the charting methodology is much less accurate for a deep-footprint CDN.)
The researchers then go on to explore the effects of hybrid P2P using those Verizon nodes (along with AT&T, which also looks like an incomplete discovery). Unfortunately, they don’t tell us much of value about peer-assisted offload; the real world has made it amply clear that actual P2P effectiveness depends tremendously on the nature of your content and your audience.
The methodological flaws leave the hybrid-P2P paper's conclusions fundamentally unsound. But like the other study, it is an interesting read.
Amazon EC2 comes out of beta
Amazon made a flurry of EC2 announcements today.
First off, EC2 is now out of beta, which means that there's now a service-level agreement. It's a 99.95% SLA, where downtime is defined as the unavailability of two or more Availability Zones, within the same region, in which you are running instances (your running instances have no external connectivity, and you can't launch new instances that do). Since EC2 only has one region right now, for practical purposes, that means "I have disconnected instances in at least two zones". That pretty much implies that Amazon thinks that if you care enough to want an SLA, you ought to care enough to be running your instances in at least two zones.
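For context, a 99.95% monthly SLA translates into a surprisingly small downtime allowance. A quick back-of-the-envelope calculation (assuming a 30-day month; this is just arithmetic, not anything from Amazon's terms):

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime per period that still meets the SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

# 99.95% over a 30-day month allows roughly 21.6 minutes of downtime;
# a typical non-HA hosting SLA of 99.50% allows about 216 minutes.
print(round(allowed_downtime_minutes(99.95), 1))   # 21.6
print(round(allowed_downtime_minutes(99.50), 1))   # 216.0
```

In other words, the SLA only pays out if a fairly substantial multi-zone event occurs.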
Note that the 99.95% SLA is at least as good as what you'd get out of a typical dedicated hosting provider for an HA/load-balanced solution. (Non-HA dedicated solutions usually get you an SLA in the 99.50-99.75% range.) Hosting SLAs are typically derived from the probability of hardware failure, in conjunction with facility failure, and thus tend to reflect real failure math. This suggests that Amazon's SLA is probably mathematically realistic as well. I'd expect that catastrophic failures would be rooted in the EC2 software itself, as with the July S3 outage.
Second, the previously-announced Windows and Microsoft SQL Server AMIs are going into beta. These instances are more expensive than the Linux ones, from a price differential of $0.10 for Linux vs. $0.125 for Windows on the small instances, up to a whopping $0.80 for Linux vs. $1.20 for Windows on the largest high-CPU instance. That's the difference between $72 and $90, or $576 and $864, over a 720-hour month of full-time running. On a percentage basis, this is broadly consistent with the price differential between Windows and Linux VPS hosting.
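For the curious, the monthly figures are just the hourly rates multiplied out over a 720-hour month:

```python
def monthly_cost(hourly_rate: float, hours: int = 720) -> float:
    """Cost of running one instance full-time for a 720-hour month."""
    return hourly_rate * hours

# Small instances: Linux vs. Windows
print(monthly_cost(0.10), monthly_cost(0.125))   # 72.0 90.0
# Largest high-CPU instances: Linux vs. Windows
print(monthly_cost(0.80), monthly_cost(1.20))    # 576.0 864.0
```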
Third, Amazon announced plans to offer a management console, monitoring, load balancing, and automatic scaling. That’s going to put it in direct competition with vendors who offer EC2 overlays, like Rightscale. That is not going to come as a surprise to those vendors, most of whom intend to be cloud-agnostic, with their value-add being providing a single consistent interface across multiple clouds. So in some ways, Amazon’s new services, which will also be directly API supported, will actually make life easier for those vendors — it just raises the bar for what value-added features they need.
The management console is a welcome addition, as anyone who has ever attempted to provision through the API and its wrapper scripts will undoubtedly attest. It’s always been an unnecessary level of pain, and the management console doesn’t need to do much of anything to be an improvement over that. People have been managing their own EC2 monitoring just fine, but having Amazon’s view, integrated into the management console, will be a nice plus. (But monitoring itself is an enabling technology for other services; see below.)
There’s never really been a great way to load-balance on EC2. DNS round-robin is crude, and running a load-balancing proxy creates a single point of failure. Native, smart load-balancing would be a boon; here’s a place where Amazon could deliver some great value-adds that are worth paying extra for.
Automatic scaling has been one of the key missing pieces of EC2. Efforts like Scalr have been an attempt to address it, and it’s going to be interesting to see how sophisticated the Amazon native offering will be.
Note that three of these new EC2 elements go together. Implicit in both load-balancing and automatic scaling is the need to be able to monitor instances. The more complete the instrumentation, the smarter the load-balancing and scaling decisions can be.
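To make the dependency concrete, here's a toy threshold-based scaling rule of the sort an autoscaler might apply. The metric (average fleet CPU) and the thresholds are arbitrary choices for illustration, not anything Amazon has announced:

```python
def scaling_decision(cpu_utilizations, high=0.75, low=0.25):
    """Decide whether to add or remove instances based on average CPU
    across the fleet; returns +1 (scale out), -1 (scale in), or 0."""
    avg = sum(cpu_utilizations) / len(cpu_utilizations)
    if avg > high:
        return +1   # fleet is hot: launch another instance
    if avg < low and len(cpu_utilizations) > 1:
        return -1   # fleet is idle: terminate one (keep at least one)
    return 0

print(scaling_decision([0.9, 0.8, 0.85]))   # 1 (average is 0.85)
```

Real policies get smarter from richer instrumentation: request latency, queue depth, and so on, which is exactly why monitoring is the enabling piece.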
For a glimpse at the way Amazon is thinking about the interlinkages, check out Amazon CTO’s blog post on Amazon’s efficiency principles.
Rackspace’s deal with Limelight
Rackspace announced yesterday, as part of a general unveiling of its cloud strategy, a new partnership with Limelight Networks.
Under the new partnership, customers of Rackspace's Cloud Files (formerly CloudFS) service — essentially, a competitor to Amazon S3 — will be able to choose to publish and deliver their files via Limelight's CDN. In effect, this places Rackspace/Limelight in direct competition with Amazon's forthcoming S3 CDN.
CDN delivery won’t cost Cloud Files customers any more than Rackspace’s normal bandwidth costs for Cloud Files. Currently, that’s $0.22/GB for the first 5 TB, scaling down to $0.15/GB for volumes above 50 TB. Amazon S3, by comparison, is $0.17/GB for the first 10 TB, down to $0.10/GB for volumes over 150 TB; we don’t yet know what its CDN upcharge, if any, will be. As another reference point, Internap resold via SoftLayer is $0.20/GB, so we can probably take that as a reasonable benchmark for the base entry cost of CDN services sold without any commit.
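For readers who want to run their own comparisons: tiered bandwidth pricing like this is typically graduated, with each tier's rate applying only to the gigabytes that fall within it. A sketch, where only the first and last rates come from Rackspace's published pricing and the middle tier is an invented placeholder:

```python
# Graduated tiers: (upper bound in GB, price per GB). The first and last
# rates match the published Cloud Files pricing; the middle tier is a
# hypothetical placeholder for illustration.
TIERS = [
    (5_000, 0.22),         # first 5 TB at $0.22/GB
    (50_000, 0.18),        # hypothetical middle tier
    (float("inf"), 0.15),  # above 50 TB at $0.15/GB
]

def bandwidth_cost(gb: float, tiers=TIERS) -> float:
    """Total cost when each tier's rate applies only within that tier."""
    cost, used = 0.0, 0.0
    for bound, rate in tiers:
        in_tier = max(0.0, min(gb, bound) - used)
        cost += in_tier * rate
        used = bound
        if gb <= bound:
            break
    return cost

print(round(bandwidth_cost(3_000), 2))   # 660.0 (all in the first tier)
```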
It’s a reasonably safe bet that Limelight’s CDN is going to deliver better performance than Amazon’s S3 CDN, given its broader footprint and peering relationships, so the usual question of, “What’s the business value of performance?” will apply.
It’s a smart move on Rackspace’s part, and an easy way into a CDN upsell strategy for its regular base of hosting customers, too. And it’s a good way for Limelight to pre-emptively compete against the Amazon S3 CDN.
Domain names and Kentucky gambling
Last month, the state of Kentucky issued a seizure order for 141 domain names that it claimed were being used in connection with illegal gambling. (Full text of the order here.)
It’s a remarkable order. It asserts that probable cause exists to believe that the domain names are being used in connection with illegal gambling (despite the fact that some are parked domains, which would clearly indicate otherwise), and that as such, Kentucky is entitled to require the registrars to immediately transfer the registration for those domains to Kentucky or some other entity that it designates.
WebProNews has published statements from Governor Steve Beshear and his deputy communications director Jill Midkiff. The governor essentially claimed that illegal online gambling harms Kentucky’s legal gambling businesses, particularly the lottery and horse racing. But regardless of why it was done, it’s still a chilling potential precedent.
Yesterday, the judge in the case denied a dismissal, setting a forfeiture hearing for next month. He also stated that the sites would have 30 days to voluntarily block access by Kentucky users to avoid further legal problems. MarkMonitor (a provider of managed domain name and brand protection solutions) has posted the full text of the opinion, along with the key relevant questions raised by this case.
This case gets right to the heart of the question, "Who controls the Internet?" If Kentucky succeeds, it will fundamentally change our understanding of jurisdiction with regard to domain names, with broad ramifications both within the United States and internationally.
Rackspace buys itself some cloud
Rackspace’s cloud event resulted in a very significant announcement: the acquisition of Slicehost and Jungle Disk. There’s also an announced Limelight partnership (unknown at the moment what this means, as the two companies already have a relationship), and a Sonian partnership to offer email archiving to Rackspace’s Mailtrust hosted email business.
My gut reaction: Very interesting moves. Signals an intent to be much more aggressive in the cloud space than I think most people were expecting.
Akamai expands its advertising solutions
Akamai made an advertising-related announcement today, introducing something it calls Advertising Decision Solutions, and stating that it has agreed to acquire acerno for $95 million in cash.
acerno (which seems to belong to the e.e. cummings school of brand naming) is a small retailer-focused advertising network, but the reason Akamai acquired it is that it operates a data cooperative in which retailers share shopping data. That data in turn is used to build a predictive model: if a customer bought X, they're likely also shopping for Y and Z, and therefore you might want to show them related ads.
Although Akamai states they’ll continue to operate the acerno business, don’t expect them to really push that ad network; Akamai knows where its bread is buttered and isn’t going to risk competing with the established large ad networks, which number amongst Akamai’s most significant customers. Instead, Akamai intends to use the acerno data and its data cooperative model to enhance the advertising-related capabilities that it offers to its customers.
This complements the Advertising Decision Solutions announcement. Basically, it appears that Akamai is going to begin to exploit its treasure-trove of user-behavior data, and to take advantage of the fact that it delivers content on behalf of publishers as well as ad networks. That position lets it insert elements into the delivery, such as cookies, enabling communication between cooperating Akamai customers without those customers having to manually set up such cooperation with their various partners.
This expansion of Akamai’s product portfolio is a smart move. With the cost of delivery dropping through the floor, Akamai needs new, high-value, high-margin services to offer to customers, as well as services that tie customers more closely to Akamai, creating a stickiness that will make customers more reluctant to switch providers to obtain lower costs. Note, however, that Akamai already dominates the online retail space; the new service probably won’t make much of a difference in a retail customer’s decision about whether or not to purchase Akamai services. It will, however, help them defend and grow their ad network customers, and help them maintain a hold on core website delivery for the media and entertainment space. (This is true even in the face of video delivery moving to low-cost CDNs, since you don’t need to deliver the site and the video from the same CDN.)
I think this move signals that we’re going to see Akamai move into adjacent markets where it can leverage its distributed computing platform, its aggregated data (whether about users, content, systems, or networks), or its customer ecosystem. Because these kinds of services will tend to be decoupled from the actual price of bit delivery, they should also help Akamai broaden its revenue streams.
CDN overlays (and more on MediaMelon)
I was recently briefed by MediaMelon, a just-launched CDN offering a “video overlay network”. The implications of their technology are worth considering, even though I think the company itself is going to have a difficult road to travel. (MediaMelon has two customers thus far, and is angel-funded; it is entering an extremely tough, competitive market. I wish them luck, since their model essentially forces them to compete in the ever-more-cutthroat CDN price war, as their entire value proposition is tied up in lowering delivery costs.)
In brief, when a content provider publishes its video to MediaMelon, MediaMelon divides the video into small chunks, each of which is a separate file that can be delivered via HTTP, and relies upon the video player to re-assemble those chunks. This chunk-based delivery is conceptually identical to Move Networks streamlets. MediaMelon then publishes the content out to its CDN partners (currently Velocix plus an unannounced second partner). MediaMelon’s special sauce is that these chunks are then delivered via multiple sources. This is normally MediaMelon’s P2P network, with a fallback to MediaMelon’s CDN partners. Since the video is in chunks, the source can switch from chunk to chunk. The video player also reports its performance to MediaMelon’s servers, allowing MediaMelon to draw conclusions about how to serve content. As a delivery-focused company, MediaMelon has decided to leave the value-adds to its media platform partners, currently thePlatform.
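The chunk-by-chunk source switching is easy to visualize in code. This is a toy illustration of the general pattern (request each chunk from the preferred source, fall back on failure, reassemble in order), not MediaMelon's actual implementation:

```python
def fetch_chunk(chunk_id, sources):
    """Try each source in preference order; return (source_name, data)
    for the first that succeeds. Each source is a callable that either
    returns the chunk's bytes or raises on failure."""
    for name, fetch in sources:
        try:
            return name, fetch(chunk_id)
        except Exception:
            continue  # this source failed; fall back to the next one
    raise RuntimeError(f"no source could deliver chunk {chunk_id}")

def play(num_chunks, sources):
    """Reassemble the video by fetching chunks in order; any chunk may
    come from a different source than the chunk before it."""
    return b"".join(fetch_chunk(i, sources)[1] for i in range(num_chunks))
```

In practice the player would also report per-chunk timings back to a central server, which is what lets the operator tune the preference order over time.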
Whatever the challenges of their business model, though, the overlay model is interesting, and from a broader market perspective, MediaMelon’s technology highlights several things about video player capabilities that should be kept in mind:
- You can carve up your video and let the player re-assemble it.
- You can deliver using multiple sources, including P2P.
- The player knows what kind of performance it’s getting, and can report it.
These three key things make it extremely clear that it is technically feasible to create a “neutral” CDN overlay network, without requiring the cooperation of the CDNs themselves. MediaMelon is halfway there. It just hasn’t put together all the pieces (the technical hurdles are actually nontrivial), and it is designed to work with partner CDNs rather than force them into competition.
Basically, what a (non-anycast) CDN like Akamai or Limelight does is maintain a central engine that gathers network performance data, which it uses to choose an individual CDN server based on what it believes is best for you (where "you" is defined by where your nameserver is). That individual CDN server then delivers the content to you.
What an overlay would have is a central engine that gathers performance data directly from the video player, and has a list of sources for a given piece of content (where that list includes multiple CDNs and maybe a P2P network). Based on historical and currently-reported performance data, it would direct the player to the source that delivers acceptable performance for the least cost. Dividing the content into chunks makes this easier, but isn’t strictly necessary. What you’d effectively have is a CDN-of-CDNs, with the overlay needing to own no infrastructure other than the routing processor.
That is the next-generation CDN. If it were vendor-neutral, allowing the customer to choose whomever it wanted to work with, it would usher in an era of truly brutal price competition.
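The routing decision at the heart of such an overlay boils down to: among the sources whose measured performance clears an acceptability bar, pick the cheapest. A minimal sketch, with invented source names, prices, and throughput numbers:

```python
def choose_source(sources, min_throughput_mbps):
    """Pick the cheapest source whose reported throughput is acceptable.
    `sources` maps name -> (price_per_gb, measured_throughput_mbps)."""
    acceptable = [
        (price, name)
        for name, (price, tput) in sources.items()
        if tput >= min_throughput_mbps
    ]
    if not acceptable:
        return None  # nothing meets the bar; the caller could relax it
    return min(acceptable)[1]  # cheapest acceptable source

# Hypothetical numbers: the P2P swarm is cheapest but currently too
# slow, so the router falls back to the cheaper of the two CDNs.
sources = {
    "p2p":   (0.02, 0.8),
    "cdn-a": (0.15, 4.0),
    "cdn-b": (0.20, 5.0),
}
print(choose_source(sources, min_throughput_mbps=2.0))   # cdn-a
```

The hard parts, of course, are gathering trustworthy performance data at scale and keeping the source list fresh; the decision logic itself is the easy bit.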