Although this has been long-rumored, and then was formally mentioned in VMware’s recent investor day, VMware has only just formally announced the vCloud Hybrid Service (vCHS), which is VMware’s foray into the public cloud IaaS market.
VMware has previously had a strategy of being an arms dealer to service providers who wanted to offer cloud IaaS. In addition to the substantial ecosystem of providers who use VMware virtualization as part of various types of IT outsourcing offerings, VMware also signed up a lot of vCloud Powered partners, each of which offered what was essentially vCloud Director (vCD) as a service. It also certified a number of the larger providers as vCloud Datacenter Service Providers; each such provider needed to meet criteria for reliability, security, interoperability, and so forth. In theory, this was a sound channel strategy. In practice, it didn’t work.
Today, not long after its recent acquisition of Enstratius,
If you’re a service provider interested in participating in the Cloud IaaS Magic Quadrant process (see the call for vendors), I’d like to recommend a number of my previous blog posts.
Foundational Gartner research notes on cloud IaaS. Recommended reading to understand our thinking on the market.
Having cloud-enabled technology != Having a cloud. Critical for understanding what we do and don’t consider cloud IaaS to be.
AR contacts for a Magic Quadrant should read everything. An explanation of why it’s critical to read every word of every communication received during the MQ process.
The process of a Magic Quadrant. Understanding a little bit about how MQs get put together.
Vendors, Magic Quadrants, and client status. Appropriate use of communications channels during the MQ process.
The art of the customer reference. Tips on how to choose reference customers.
It’s that time of the year again, a little bit early — we’re trying to refresh the Cloud IaaS Magic Quadrant on a nine-month cycle rather than a yearly cycle, reflecting the faster pace of the market.
A pre-qualification survey, intended to gather quantitative metrics and information about each provider’s service, will be going out very soon.
If you are a cloud IaaS provider, and you did not receive the 2012 survey, and you would like to receive the 2013 survey, please email Michele dot Severance at Gartner dot com to request to be added to the contact list. You must be authorized to speak for your company. Please note we cannot work with PR firms for the Magic Quadrant; if you are a PR agency and you think that your client should be participating, you should get in touch with your client and have your client contact Michele.
If you did receive the 2012 survey, you should be receiving email from Michele within the next few days, requesting that you confirm that you’re the right contact or passing it on to the correct contact to do so.
If you’re unsure whether you’re a cloud IaaS provider by this MQ’s definitions, consider the following:
- Are you selling a service? (That means you’re not selling hardware or software.)
- Are you offering compute, storage, and network resources? (You can’t be, say, just a cloud storage provider.)
- Is your offering fully standardized? (It’s identical for every customer, not a reference architecture that you customize.)
- Can customers self-service? (Once approved as customers, they can go to your portal and push buttons to immediately, with zero human intervention, obtain/destroy/configure/manage their infrastructure resources. Managed services can be optional.)
- Can you meter by the hour? (You can either sell by the hour, or you can offer monthly capacity where usage is metered hourly. Having to take a VM for a full month is hosting, not IaaS.)
- Do you have at least one multi-tenant cloud IaaS offering? (Customers must share a capacity pool for the offering to be considered multi-tenant.)
- Do you consider your competition to be offerings such as
Every time there’s been a major Amazon outage, someone always says something like, “Regular Web hosters and colocation companies don’t have outages!” I saw an article in my Twitter stream today, and finally decided that the topic deserves a blog post. (The article seemed rather linkbait-ish, so I’m not going to link it.)
It is an absolute myth that you will not have downtime in colocation or Web hosting. It is also a complete myth that you won’t have downtime in cloud IaaS run by traditional Web hosting or data center outsourcing providers.
The typical managed hosting customer experiences roughly one outage a year. This figure comes from thirteen years of asking Gartner clients, day in and day out, about their operational track record. These outages are typically related to hardware failure, although sometimes they are related to service provider network outages (often caused by device misconfiguration, which can obliterate any equipment or circuit redundancy). Some customers are lucky enough to never experience any outages over the course of a given contract (usually two to three years for complex managed hosting), but this is actually fairly rare, because most customers aren’t architected to be resilient to all but the most trivial of infrastructure failures. (Woe betide the customer who has a serious hardware failure on a database server.) The “one outage a year” figure does not include any outages that the customer might have caused himself through application failure.
The typical colocation facility in the US is built to Tier III standards, with a mathematical expected availability of about 99.98%. In Europe, colocation facilities are often built to Tier II standards intead, for an expected availability of about 99.75%. Many colocation facilities do indeed manage to go for many years without an outage. So do many enterprise data centers — including Tier I facilities that have no redundancy whatsoever. The mathematics of the situation don’t say that you will have an outage — these are merely probabilities over the long term. Moreover, there will be an additional percentage of error that is caused by humans. Single-data-center kings who proudly proclaim that their one data center has never had an outage have gotten lucky.
The amount of publicity that a data center outage gets is directly related to its tenant constituency. The outage at the 365 Main colocation facility in San Francisco a few years back was widely publicized, for instance, because that facility happened to house a lot of Internet properties, including ones directly associated with online publications. There have been significant outages at many other colocation faciliities over the years, though, that were never noted in the press — I’ve found out about them because they were mentioned by end-user clients, or because the vendor disclosed them.
Amazon outages — and indeed, more broadly, outages at large-scale providers like Google — get plenty of press because of their mass effects, and the fact that they tend to impact large Internet properties, making the press aware that there’s a problem.
Small cloud providers often have brief outages — and long maintenance windows, and sometimes lengthy maintenance downtimes. You’re rolling the dice wherever you go. Don’t assume that just because you haven’t read about an outage in the press, it hasn’t occurred. Whether you decide on managed hosting, dedicated hosting, colocation, or cloud IaaS, you want to know a provider’s track record — their actual availability over a multi-year period, not excluding maintenance windows. Especially for global businesses with 24×7 uptime requirements, it’s not okay to be down at 5 am Eastern, which is prime-time in both Europe and Asia.
Sure, there are plenty of reasons to worry about availability in the cloud, especially the possibility of lengthy outages made worse by the fundamental complexity that underlies many of these infrastructures. But you shouldn’t buy into the myth that your local Web hoster or colocation provider necessarily has better odds of availability, especially if you have a non-redundant architecture.
I corresponded with some members of the HP cloud team in email, and then colleagues and I spoke with HP on the phone, after my last blog post called, “Cloud IaaS SLAs can be Meaningless“. HP provided some useful clarifications, which I’ll detail below, but I haven’t changed my fundamental opinion, although arguably the nuances make the HP SLA slightly better than the AWS SLA.
The most significant difference between the SLAs is that the HP’s SLA is intended to cover a single-instance failure, where you can’t replace that single instance; AWS requires that all of your instances in at least two AZs be unavailable. HP requires that you try to re-launch that instance in a different AZ, but a failure of that launch attempt in any of the other AZs in the region will be considered downtime. You do not need to be running in two AZs all the time in order to get the SLA; for the purposes of the SLA clause requiring two AZs, the launch attempt into a second AZ counts.
HP begins counting downtime when, post-instance-failure, you make the launch API call that is destined to fail — downtime begins to accrue 6 minutes after you make that unsuccessful API call. (To be clear, the clock starts when you issue the API call, not when the call has actually failed, from what I understand.) When the downtime clock stops is unclear, though — it stops when the customer has managed to successfully re-launch a replacement instance, but there’s no clarity regarding the customer’s responsibility for retry intervals.
(In discussion with HP, I raised the issue of this potentially resulting in customers hammering the control plane with requests in mass outages, along with intervals when the control plane might have degraded response and some calls succeed while others fail, etc. — i.e., the unclear determination of when downtime ends, and whether customers trying to fulfill SLA responsibilities contribute to making an outage worse. HP was unable to provide a clear answer to this, other than to discuss future plans for greater monitoring transparency, and automation.)
I’ve read an awful lot of SLAs over the years — cloud IaaS SLAs, as well as SLAs for a whole bunch of other types of services, cloud and non-cloud. The best SLAs are plain-language comprehensible. The best don’t even need examples for illustration, although it can be useful to illustrate anything more complicated. Both HP and AWS sin in this regard, and frankly, many providers who have good SLAs still force you through a tangle of verbiage to figure out what they intend. Moreover, most customers are fundamentally interested in solution SLAs — “is my stuff working”, regardless of what elements have failed. Even in the world of cloud-native architecture, this matters — one just has to look at the impact of EBS and ELB issues in previous AWS outages to see why.
Gartner will soon be starting the process of updating our Magic Quadrant for Managed Hosting, currently targeted for publication in Q1 of 2013. This is the update to the Magic Quadrant for Managed Hosting that was published in March 2012 of this year; a free reprint is available. If you consider yourself to be an enterprise-class managed hosting provider, capable of providing fully-managed services for complex, mission-critical websites, this is your Magic Quadrant. (Note that this is a distinct market from data center outsourcing.)
The previous Magic Quadrant was global. However, because regional requirements differ, and many excellent managed hosting providers are not global, we have decided to replace the global Magic Quadrant with three regional Magic Quadrants — one each for North America, Pan-Europe, and Asia-Pacific, published in that order. Each MQ will have its own inclusion criteria and evaluation criteria.
I will be leading the overall global effort, and Gartner’s analysts that cover Managed Hosting will be doing these MQs as a global team, although each regional MQ will have a region-specific lead author. We are going to do a single global data collection effort and set of briefings, though, to reduce the level of effort needed by the service provider AR teams.
Doug Toombs will be the lead author for the North American MQ and will be assisting me in running the global effort. If you are not already following his blog or his Twitter (@DougToombs), I strongly encourage you to do so.
We will imminently be kicking off the process for this set of MQs. If you were not on last year’s Magic Quadrant for Managed Hosting, and you would like to receive a pre-qualification survey, please contact Doug Toombs at Douglas dot Toombs at Gartner dot com. Please note that we allow any service provider to participate in the survey process; reception of a survey does not indicate in any way that we feel that your company is qualified to be in the MQ.
In infrastructure services, the purpose of an SLA (or, for that matter, the liability clause in the contract) is not “give the customer back money to compensate for the customer’s losses that resulted from this downtime”. Rather, the monetary guarantees involved are an expression of shared risk. They represent a vote of confidence — how sure is the provider of its ability to deliver to the SLA, and how much money is the provider willing to bet on that? At scale, there are plenty of good, logical reasons to fear the financial impact of mass outages — the nature of many cloud IaaS architectures create a possibility of mass failure that only rarely occurs in other services like managed hosting or data center outsourcing. IaaS, like traditional infrastructure services, is vulnerable to catastrophes in a data center, but it is additionally vulnerable to logical and control-plane errors.
Unfortunately, cloud IaaS SLAs can readily be structured to make it unlikely that you’ll ever see a penny of money back — greatly reducing the provider’s financial risks in the event of an outage.
Amazon Web Services (AWS) is the poster-child for cloud IaaS, but the
Many people confuse “using a hardware and software stack that potentially enables a cloud” with “cloud infrastructure as a service”. Analyst firms haven’t necessarily done a good job with drawing the distinction, either — there are plenty of (hopefully non-Gartner) analysts who use “IaaS” interchangeably to describe the technology stack and the service itself.
The technology stack — typically an integrated system (such as a Vblock) plus a cloud management platform (such as vCloud Director), although it could be anything, like whitebox servers + Nexenta storage + Arista switches + OpenStack — is what Gartner is now dubbing “cloud-enabled system infrastructure” (CESI). This is admittedly naming-by-committee, but it parallels our use of “cloud-enabled application platform” which is the technology-stack corollary for PaaS.
Why is it important to make a distinction between CESI and IaaS? Because while CESI can be used to deliver IaaS, there are many services that can be delivered on top of a CESI that are not IaaS. Gartner’s cloud definitions are pretty strict, and one of the cores of our IaaS definition is that IaaS is self-service — you can optionally layer managed services on top, but the customer has to have full control to obtain and remove resources in a fully self-service, fully-automated way. Not only are many of the services that can be delivered on CESI not IaaS, they’re not actually cloud services — cloud requires not only self-service, but also scalability, elasticity, and metering by use, for instance.
Why does this distinction matter? Because there are a significant number of service providers in the market today who offer a service on top of a CESI and call it “cloud IaaS”, when it’s neither cloud nor IaaS, and it misleads customers into thinking that they are getting the technical and/or business benefits of going to the cloud and going to IaaS.
I’ve written two research notes that I’m hoping will cut through some of the market confusion. (Links are Gartner clients only, sorry.)
The first, Technology Overview for Cloud-Enabled System Infrastructure, is an overview of how we define CESI and how it is not merely a virtualization environment — it encompasses compute, storage, and network capabilities; it is automated, scalable, elastic, near-real-time on-demand; and it exposes self-service interfaces. It discusses the range of cloud-enabled infrastructure services from those that are merely cloud-facilitated (using the CESI as an efficient infrastructure platform, without exposing it to the customer via self-service), to those that are fully cloud-native (CESI fully exposed to customer via self-service that is routinely used), and gives examples of the services along this spectrum; for instance, IBM’s SCE+ is cloud-enabled data center outsourcing, not IaaS, in our parlance. Finally, it disusses the distinction between global-class CESI architectures (think Amazon Web Services) and enterprise-class architectures (think Vblock + vCD), and how to build a CESI.
The second, Don’t Be Fooled By Offerings Falsely Masquerading as Cloud Infrastructure as a Service, is a note that is intended to help IT buyers figure out what they’re really buying when they are assessing something that the vendor is claiming is IaaS. It focuses on a handful of principles: Beware of common “cloudwashing” claims; ensure that what you’re buying delivers promised technical and business benefits; identify your requirements and look at competing providers before purchasing a solution that requires a contractual commitment; favor providers who have better cloud infrastructure roadmaps.
The four most common “cloudwashing” claims for infrastructure services:
- “It uses virtualization, so it’s a cloud.”
- “It’s being delivered on a Vblock, so it’s a cloud.”
- “We use vCloud Director, so it’s a cloud.”
- “You can buy it through a portal, so it’s a cloud.”
I apologize for introducing more jargon into an already jargon-heavy corner of the market, but I think these distinctions are critical: You can have a technology stack that potentially enables you to deliver cloud IaaS… and use it to deliver something that is neither cloud nor IaaS. Cloud infrastructure as a technology platform is separate and distinct from cloud infrastructure as a service; the technical construct and the service construct are two different things.
I’ve been reading the social media reactions to my recent note on OpenStack, “Don’t Let OpenStack Hype Distort Your Selection of a Cloud Management Platform in 2012” (that’s a client link; a free public reprint without the executive summary is also available), and wanted to respond to some comments that are more centered on the research process and publication process than on the report itself. So, here are a number of general assertions:
Gartner doesn’t do commissioned research. Ever. Repeat: Gartner, unlike almost every other analyst firm, doesn’t do commissioned resesarch — ever. Most analyst firms will do “joint research” or “commissioned whitepapers” or the like — research where a vendor is paying for the note to be written. About a decade ago, Gartner stopped this practice, because management felt it could be seen as compromising neutrality. No vendor paid for that OpenStack note to be written, directly or indirectly. Considering that most of the world’s largest IT vendors are significant participants in OpenStack, and plenty of small ones are as well, and a bunch of them are undoubtedly unhappy about the publication of that note, if Gartner’s interests were oriented around vendors, we certainly wouldn’t have published research practically guaranteed to upset a lot of vendors.
Gartner earns the overwhelming majority of its revenue from IT buyers. About 80% of Gartner’s revenues come from our IT buyer clients (we call them “end-users”). We don’t shill for vendors, ever, because our bread-and-butter comes from IT buyers, who trust us for neutral advice. Analysts are interested in helping our end-user clients make the technology decisions that are best for their business. We also want to help vendors (including those who are involved with free or commercial open-source) succeed in better serving end-users — which often means that we will be critical of vendor efforts. Our clients are asking about OpenStack, and every example of hype in that note comes directly from client interactions. I wrote that note because the volume of OpenStack queries from clients was ramping up, and we needed written research to address it.
Gartner analysts are not compensated on commissions of any sort. Many other analyst firms have incentives for analysts that are tied to revenue or publicity — be quoted in the press X times, sell reports, sell consulting, sell strategy days with vendors, etc. Gartner doesn’t do any of that, and hasn’t for about a decade. Our job, as Gartner analysts, is to try to offer the best advice we can to our clients. Sometimes, of course, we will be wrong, but we try hard. It’s not an academic exercise; our end-user clients have business outcomes that ride on their technology decisions.
Gartner doesn’t dislike open source. As a collective entity, Gartner tends to be cautious in its stances, as our end-user clients tend to be mid-market and enterprise IT executives who are fairly risk-averse; our analysis of all solutions, including OSS, tends to be from that perspective. But we write extensively about open source; we have analysts devoted to the topic, plus everyone covers the OSS relevant to their own coverage. We consider OSS a business strategy like anything else. In fact, we’ve been particularly vocal about how we feel that cloud is driving OSS adoption across a broad spectrum of solutions, and advocates that an IT organization’s adoption of cloud is a great time to consider replacing proprietary tech with OSS. (You’ll note that a whole section of the report predicts OpenStack’s eventual success, by the way, so it’s not like this a prediction of doom, just an observation of present stumbling-blocks on the road to maturity.)
Gartner research notes are Gartner opinion, not an individual analyst’s opinion. Everything that Gartner publishes as a research note (excluding things like blog posts, which we consider off-the-clock, personal time and not a corporate activity) undergoes a peer review process. While notes do slip through the cracks (i.e., get published without sufficiently broad or deep review), our internal processes require analysts to get review from everyone who touches a coverage area. My OpenStack note was very broadly peer reviewed — by other analysts who cover cloud infrastructure, cloud management platforms, and open source software, as well as a bunch of related areas that OpenStack touches. (As a result of that review, the original note almost quadrupled in size, split into one note on OSS CMPs in general, and one note on OpenStack itself.) I also asked for our Ombudsman’s office, which deals with vendor complaints, to review the note to make sure that it seemed fair, balanced, and free of inflammatory language, and they (and my manager) also asked questions about potentially controversial sections, in order to ensure they were backed by facts. Among other things, these processes are intended to ensure that individual analyst bias is eliminated to as large an extent as possible. That process is part of why Gartner’s opinions often sound highly conservative, but when we take a stance, it is usually neither casual nor one analyst’s opinion.
The publication of this note was not a shock to the vendors involved. Most of the vendors were aware that this note was coming; it was a work in progress over the summer. Rackspace, as the owner of OpenStack at the time that this was placed in the publication pipeline, was entitled to a formal review and discussion prior to its publication (as we do for any research that covers a vendor’s product in a substantive way). I had spoken to many of the other vendors in advance of its publication, letting them know it was coming (although since it was pre-Foundation they did not have advance review). The evolving OpenStack opinions of myself and other Gartner analysts have long been known to the vendors.
It would have been easier not to write anything. I have been closely following OpenStack since its inception, and I have worked with many of the OpenStack vendors since the early days of the project. I have a genuine interest in seeing them, and OpenStack, succeed, and I hope that the people that I and other analysts have dealt with know that. Many individuals have confided in me, and other Gartner analysts, about the difficulties they were having with the OpenStack effort. We value these relationships, and the trust they represent, and we want to see these people and their companies succeed. I was acutely careful to not betray any individual confidences when writing that report, ensuring that any concerns surfaced by the vendors had been said by multiple people and organizations, so that there would be no tracebacks. I am aware, however, that I aired everyone’s collective dirty laundry in public. I hope that making the conversation public will help the community rally around some collective goals that will make OpenStack mainstream adoption possible. (I think Rackspace’s open letter implicitly acknowledges the issues that I raised, and I highly encourage paying attention to its principles.)
You will see other Gartner analysts be more active in OpenStack coverage. I originally picked up OpenStack coverage because I have covered Rackspace for the last decade, and in its early days it was mostly a CMP for service providers. Enterprise adoption has begun, and so its primary home for coverage is going to be our CMP analysts (folks like Alessandro Perilli, Donna Scott, and Ronni Colville), although those of us who cover cloud IaaS (especially myself, Kyle Hilgendorf, and Doug Toombs) will continue to provide coverage from a service provider perspective. Indeed, our coverage of OSS CMPs (CloudStack, Eucalyptus, OpenNebula, etc.) has been ramping up substantially of late. We’re early in the market, and you can expect to see us track the maturation of these solutions.