What’s cloud IaaS really about?
As expected, the Magic Quadrant for Cloud IaaS and Web Hosting is stirring up much of the same debate that was raised with the publication of the 2009 MQ.
Derrick Harris over at GigaOM thinks we got it wrong. He writes: Cloud IaaS is about letting users get what they need, when they need it and, ideally, with a credit card. It doesn’t require requisitioning servers from the IT department, signing a contract for any predefined time period or paying for services beyond the computing resources.
Fundamentally, I dispute Derrick’s assertion of what cloud IaaS is about. I think the things he cites above are cool, and represent a critical shake-up in thinking about IT access, but it’s not ultimately what the whole cloud IaaS market is about. And our research note is targeted at Gartner’s clients — generally IT management and architects at mid-sized businesses and enterprises, along with technology start-ups of all sizes (but generally ones that are large enough to have either funding or revenue).
Infrastructure without a contract? Convenient initially, but as the relationship becomes more significant, usually not preferable. In fact, most businesses like to be able to negotiate contract terms. (For that matter, Amazon does customized Enterprise Agreements with its larger customers.) Businesses love not having to commit to capacity, but the whole market is shifting its business models pretty quickly to adapt to that desire.
Infrastructure without involving traditional IT operations? Great, but someone’s still got to manage the infrastructure — shoving it in the cloud does not remove the need for operations, maintenance, patch management, security, governance, budgeting, etc. Gartner’s clients generally don’t want random application developers plunking down a credit card and just buying stuff willy-nilly. Empower developers with self-provisioning, sure — but provisioning raw infrastructure is the easy and cheap part, in the grand scheme of things.
Paying for services beyond the computing resources? Sure, some people love to self-manage their infrastructure. But really, what most people want to do is to only worry about their application. Their real dream is that cloud IaaS provides not just compute capacity, but secure compute capacity — which generally requires handling routine chores like patch management, and dealing with anti-virus and security event monitoring and such. In other words, they want to eliminate their junior sysadmins. They’re not looking for managed hosting per se; they’re looking to get magic, hassle-free compute resources.
I obviously recognize Amazon’s contributions to the market. The MQ entry on Amazon begins with: Amazon is a thought leader; it is extraordinarily innovative, exceptionally agile and very responsive to the market. It has the richest cloud IaaS product portfolio, and is constantly expanding its service offerings and reducing its prices. But I think Amazon represents an aspect of a broad market.
Cloud IaaS is complicated by the diversity of use cases for it. Our clients are also looking for specific guidance on just the "pure cloud", self-provisioned "virtual data center" services, so we're doing two upcoming vendor ratings to address that need — a Critical Capabilities note that is focused solely on feature sets, and a mid-year Magic Quadrant that will be purely focused on this.
I could talk at length about what our clients are really looking for and what they’re thinking with respect to cloud IaaS, which is a pretty complicated and interesting tangle, but I figure I really ought to write a research note for that… and get back to my holiday vacation for now.
Qualifying for the next Cloud IaaS Magic Quadrant
Now that the Magic Quadrant for Cloud Infrastructure as a Service and Web Hosting, 2010 has been published, we’re going to be getting started on the mid-year update almost immediately (in February). The mid-year version will be cloud-only, specifically the self-provisioned “virtual data center” segment of the market.
Since I have been deluged with questions about what it takes to be included (and there's been some interesting FUD on Quora), I thought I'd explain in public.
For many years now, Ted Chamberlin and I have done this Magic Quadrant using criteria that are very black-and-white; anyone should be able to look at them like a checklist. Those criteria are pretty simple:
- You are required to have certain services, which we try to define as clearly as possible.
- There’s a minimum revenue requirement.
- There’s a requirement to demonstrate global presence, either through data centers in particular geographies, or a certain amount of revenue derived from outside your home region.
If you meet those criteria, you’re in. If you don’t meet those criteria, no amount of begging will get you in. It has nothing to do with whether or not you are a client. It doesn’t even have anything to do with whether or not our clients ask about you, or whether we think you’re worthy; in inquiry, we routinely recommend some providers who don’t qualify for the MQ but who compete successfully against included vendors.
Because we routinely recommend vendors who aren't on the MQ, and we're obviously interested in the market as a whole, we welcome briefings from all vendors who believe they serve Gartner's end-user client base (mid-sized businesses to large enterprises, plus technology companies and tech-heavy businesses of all sizes), regardless of whether they qualify for inclusion. We also track the lower end of the market, so we do look at the vendors who serve small businesses; vendors in that segment are similarly welcome to brief us, though there we're primarily interested in market-share leaders and anyone doing something clearly differentiated.
Analysts at Gartner choose what briefings they want to take, regardless of whether or not a vendor is a client (our system for briefing requests doesn’t even tell analysts the vendor’s client status). You are welcome to brief us as frequently as you have something interesting to say.
Gartner is hiring!
Gartner is hiring cloud experts! We’re going to be hiring two analysts who are based in Europe. They’ll cover cloud computing and networking — specifically, they’ll be European counterparts to myself and Ted Chamberlin. That means we’re looking for people who know cloud IaaS, hosting and colocation, and, if possible, have some background in networking as well.
These are Research Director roles, so we’re looking for people who are pretty senior — currently a director or VP, probably, or equivalent, and therefore likely 10+ years of experience (though people with really intensive start-up experience might work out, too, with less).
If you’re interested, drop me a message on LinkedIn.
Cloud adoption and polling
I'm pondering my poll results from the Gartner data center conference, and trying to understand the discontinuities. I spoke at two sessions at the conference. One was higher level and more strategic, called "Is Amazon or VMware the Future of Your Data Center?" The other was very focused and practical, called "Getting Real with Cloud Infrastructure Services". The second session was in the very last slot, and therefore you had to really want to be there, I suppose. The poll sample size of the second session was about half that of the first. My polling questions were similar but not identical, which is the source of the difficulty in interpreting the differences in results.
I normally ask a demographic question at the beginning of my session polls, about how much physical server infrastructure the audience members run in their data centers. This lets me cross-tabulate the poll results by demographic, with the expectation that those who run bigger data centers behave differently than those who run smaller data centers. Demographics for both sessions were essentially identical, with about a third of the audience under 250 physical servers, a third between 250 and 1000, and a third with more than 1000. I do not have the cross-tabbed results back yet, unfortunately, but I suspect they won’t explain my problematic results.
In my first session, 33% of the audience’s organizations had used Amazon EC2, and 33% had used a cloud IaaS provider other than Amazon. (The question explicitly excluded internal clouds.) I mentioned the denial syndrome in a previous blog post, and I was careful to note in my reading of the polling questions that I meant any use, not just sanctioned use — the buckets were very specific. The main difference in Amazon vs. non-Amazon was that more of the use of Amazon was informal (14% vs. 9%) and there was less production application usage (8% vs. 12%).
In my second session, 13% of respondents had used Amazon, and 6% had used a non-Amazon cloud IaaS. I am not sure whether I should attribute this vast difference to the fact that I did not emphasize the “any use”, or simply because this session drew a very different sort of attendee, perhaps one who was farther back on the adoption curve and wanting to learn more basic material, than the first session.
The two audiences also skewed extremely differently when asked what mattered to them in choosing a provider (choose the top 3 from a list of options). I phrased the questions differently, though. In the first session, it was about "things that matter"; in the second session, it was "the provider who is best-in-class at this thing". The results diverged most radically on customer service. It was overwhelmingly the most heavily weighted item in the first session ("excellent customer service, responsive and proactive in meeting my needs"), but by far the least important in the second session (where I emphasized "best in class customer service" and not "good enough customer service").
Things like this are why I generally do not like to cite conference keypad polls in my research, preferring instead to rely on formal primary research that’s been demographically weighted and where there are enough questions to tease out what’s going on in the respondent’s head. (I do love polls for being able to tailor my talk, on the fly, to the audience, though.)
EdgeCast joins the ADN fray
EdgeCast has announced the beta of its new application delivery network service. For those of you who are CDN-watchers, that means it’s leaping into the fray with Akamai (Dynamic Site Accelerator and Web Application Accelerator, bundles where DSA is B2C and WAA is B2B), Cotendo’s DSA, and CDNetworks’s Dynamic Web Acceleration. EdgeCast’s technology is another TCP optimization approach, of the variety commonly used in application delivery controllers (like F5’s Web Application Accelerator) and WAN optimization controllers (like Riverbed).
EdgeCast has been gaining traction with my client base over the last few weeks — it's appearing on a lot more shortlists. This appears to be the direct result of aggressive SEO-based marketing.
What's important about this launch: EdgeCast isn't just a standalone CDN. It is also a CDN partner to many carriers, and it derives an increasingly large percentage of its revenue from selling its software. That means that EdgeCast introducing ADN services potentially has ripple effects on the ecosystem, in terms of spreading ADN technology more widely.
Just enough privacy
For a while now, I’ve been talking to Gartner clients about what concerns keep them off public cloud infrastructure, and the diminishing differences between private and public cloud from service providers. I’ve been testing a thesis with our clients for some time, and I’ve been talking to people here at Gartner’s data center conference about it, as well.
That thesis is this: People will share a data center network, as long as there is reasonable isolation of their traffic, and they are able to get private non-Internet connectivity, and there is a performance SLA. People will share storage, as long as there is reasonable assurance that nobody else can get at their data, which can be handled via encryption of the storage at rest and in flight, perhaps in conjunction with other logical separation mechanisms, and again, there needs to be a performance SLA. But people are worried about hypervisor security, and don’t want to share compute. Therefore, you can meet most requirements for private cloud functionality by offering temporarily dedicated compute resources.
Affinity rules in provisioning can address this very easily. Simply put, the service provider could potentially maintain a general pool of public cloud compute capacity — but set a rule for 'pseudo-private cloud' customers that says that if a VM is provisioned on a particular physical server for customer X, then that physical server can only be used to provision more VMs for customer X. (Once those VMs are de-provisioned, the hardware becomes part of the general pool again.) For a typical customer who has a reasonable number of VMs (most non-startups have dozens, usually hundreds, of VMs), the wasted capacity is minimal, especially if live VM migration techniques are used to optimize the utilization of the physical servers — and therefore the additional price uplift for this should be modest.
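To make the affinity rule concrete, here's a minimal sketch of such a placement policy. This is a toy illustration, not any provider's actual scheduler; all class and method names (`AffinityScheduler`, `place`, `remove`) are hypothetical, and real schedulers would also account for CPU/memory shapes, live migration, and zones.

```python
from collections import defaultdict

class AffinityScheduler:
    """Toy placement policy for the single-tenant affinity rule: once a host
    runs a VM for a pseudo-private-cloud customer, only that customer may
    place further VMs on it, until the host is fully drained."""

    def __init__(self, hosts, slots_per_host=8):
        self.free_slots = {h: slots_per_host for h in hosts}
        self.owner = {}              # host -> customer currently pinned to it
        self.vms = defaultdict(set)  # host -> VM IDs running there

    def place(self, customer, vm_id):
        # Prefer hosts already pinned to this customer, then untouched hosts
        # from the general pool.
        candidates = [h for h, o in self.owner.items()
                      if o == customer and self.free_slots[h] > 0]
        candidates += [h for h in self.free_slots
                       if h not in self.owner and self.free_slots[h] > 0]
        if not candidates:
            raise RuntimeError("no capacity for this customer")
        host = candidates[0]
        self.owner[host] = customer   # host is now dedicated to this customer
        self.free_slots[host] -= 1
        self.vms[host].add(vm_id)
        return host

    def remove(self, host, vm_id):
        self.vms[host].discard(vm_id)
        self.free_slots[host] += 1
        if not self.vms[host]:
            del self.owner[host]      # drained host returns to the shared pool
```

The key property is the last line of `remove`: dedication is temporary, so the provider keeps public-cloud economics while each host is single-tenant at any given moment.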
That gets you public cloud compute scale, while still assuaging customer fears about server security. (Interestingly, Amazon salespeople sometimes tell prospects that you can use Amazon as a private cloud — you just have to use only the largest instances, which eat the resources of the full physical server.)
Observations from the Gartner data center conference
I’m at Gartner’s Data Center Conference this week, and I’m finding it to be an interesting contrast to our recent Application Architecture, Development, and Integration Summit.
AADI’s primary attendees are enterprise architects and other people who hold leadership roles in applications development. The data center conference’s primary attendees are IT operations directors and others with leadership roles in the data center. Both have significant CIO attendance, especially the data center conference. Attendees at the data center conference, especially, skew heavily towards larger enterprises and those who otherwise have big data centers, so when you see polling results from the conference, keep the bias of the attendees in mind. (Those of you who read my blog regularly: I cite survey data — formal field research, demographically weighted, etc. — differently than conference polling data, as the latter is non-scientific.)
At AADI, the embrace of the public cloud was enthusiastic, and if you asked people what they were doing, they would happily tell you about their experiments with Amazon and whatnot. At this conference, the embrace of the public cloud is far more tentative. In fact, my conversations not-infrequently go like this:
Me: Are you doing any public cloud infrastructure now?
Them: No, we’re just thinking we should do a private cloud ourselves.
Me: Nobody in your company is doing anything on Amazon or a similar vendor?
Them: Oh, yeah, we have a thing there, but that’s not really our direction.
That is not “No, we’re not doing anything on the public cloud”. That’s, “Yes, we’re using the public cloud but we’re in denial about it”.
Lots of unease here about Amazon, which is not particularly surprising. That was true at AADI as well, but people were much more measured there — they had specific concerns, and ways they were addressing, or living with, those concerns. Here the concerns are more strident, particularly around security and SLAs.
Feedback from folks using the various VMware-based public cloud providers seems to be consistently positive — people seem to uniformly be happy with the services themselves and are getting the benefits they hoped to get, and are comfortable. Terremark seems to be the most common vendor for this, by a significant margin. Some Savvis, too. And Verizon customers seem to have talked to Verizon about CaaS, at least. (This reflects my normal inquiry trends, as well.)
What does the cloud mean to you?
My Magic Quadrant for Cloud Infrastructure as a Service and Web Hosting is done. The last week has been spent in discussion with service providers over their positioning and the positioning of their competitors and the whys and wherefores and whatnots. That has proven to be remarkably interesting this year, because it’s been full of angry indignation by providers claiming diametrically opposed things about the market.
Gartner gathers its data about what people want in two ways — from primary research surveys, and, often more importantly, from client inquiry, the IT organizations who are actually planning to buy things or better yet are actually buying things. I currently see a very large number of data points — a dozen or more conversations of this sort a day, much of it focused on buying cloud IaaS.
And so when a provider tells me, “Nobody in the market wants to buy X!”, I generally have a good base from which to judge whether or not that’s true, particularly since I’ve got an entire team of colleagues here looking at cloud stuff. It’s never that those customers don’t exist; it’s that the provider’s positioning has essentially guaranteed that they don’t see the deals outside their tunnel vision service.
The top common fallacy, overwhelmingly, is that enterprises don't want to buy from Amazon. I've blogged previously about how wrong this is, but at some point in the future, I'm going to have to devote a post (or even a research note) to why this is one of the greatest, and most dangerous, delusions that a cloud provider can have. If you offer cloud IaaS, or heck, you're a data-center-related business, and you think you don't compete with Amazon, you are almost certainly wrong. Yes, even if your customers are purely enterprise — especially if your customers are large enterprises.
The fact of the matter is that the people out there are looking at different slices of cloud IaaS, but they are still slices of the same market. This requires enough examination that I’m actually going to write a research note instead of just blogging about it, but in summary, my thinking goes like this (crudely segmented, saving the refined thinking for a research note):
There are customers who want self-managed IaaS. They are confident and comfortable managing their infrastructure on their own. They want someone to provide them with the closest thing they can get to bare metal, good tools to control things (or an API they can use to write their own tools), and then they’ll make decisions about what they’re comfortable trusting to this environment.
There are customers who want lightly-managed IaaS, which I often think of as “give me raw infrastructure, but don’t let me get hacked” — which is to say, OS management (specifically patch management) and managed security. They’re happy managing their own applications, but would like someone to do all the duties they typically entrust to their junior sysadmins.
There are customers who want complex management, who really want soup-to-nuts operations, possibly also including application management.
And then in each of these segments, you can divide customers into those with a single application (which may have multiple components and be highly complex, potentially), and those who have a whole range of stuff that encompasses more general data center needs. That drives different customer behaviors and different service requirements.
Claiming that there’s no “real” enterprise market for self-managed is just as delusional as claiming there’s no market for complex management. They’re different use cases in the same market, and customers often start out confused about where they fall along this spectrum, and many customers will eventually need solutions all along this spectrum.
Now, there’s absolutely an argument to be made that the self-managed and lightly-managed segments together represent an especially important segment of the market, where a high degree of innovation is taking place. It means that I’m writing some targeted research — selection notes, a Critical Capabilities rating of individual services, probably a Magic Quadrant that focuses specifically on this next year. But the whole spectrum is part of the cloud IaaS adoption phenomenon, and any individual segment isn’t representative of the total market evolution.
Designing to fail
Cloud-savvy application architects don’t do things the same way that they’re done in the traditional enterprise.
Cloud applications assume failure. That is, well-architected cloud applications assume that just about anything can fail. Servers fail. Storage fails. Networks fail. Other application components fail. Cloud applications are designed to be resilient to failure, and they are designed to be robust at the application level rather than at the infrastructure level.
Enterprises, for the most part, design for infrastructure robustness. They build expensive data centers with redundant components. They buy expensive servers with dual-everything in case a component fails. They buy expensive storage and mirror their disks. And then whatever hardware they buy, they need two of. All so the application never has to deal with the failure of the underlying infrastructure.
The cloud philosophy is generally that you buy dirt-cheap things and expect they’ll fail. Since you’re scaling out anyway, you expect to have a bunch of boxes, so that any box failing is not an issue. You protect against data center failure by being in multiple data centers.
Cloud applications assume variable performance. Well-architected cloud applications don't assume that anything is going to complete in a certain amount of time. The application has to deal with network latencies that might be random, storage latencies that might be random, and compute latencies that might be random. The principle of the distributed application of this sort is that just about anything that you're talking to can mysteriously drop off the face of the Earth at any point in time, or at least not get back to you for a while.
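One standard pattern for coping with both assumptions — dependencies that fail and dependencies that respond slowly or not at all — is to wrap every remote call in bounded retries with jittered backoff. A minimal sketch (the function name and parameters are illustrative; in practice you'd also set per-request timeouts in whatever client library you're using):

```python
import random
import time

def resilient_call(op, attempts=4, base_backoff=0.05):
    """Retry a flaky remote operation with jittered exponential backoff,
    rather than assuming it always succeeds or returns promptly.
    `op` is any zero-argument callable that may raise."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # budget exhausted; let the caller degrade gracefully
            # Full jitter: sleep a random slice of an exponentially growing
            # window, so retries from many clients don't synchronize.
            time.sleep(random.uniform(0, base_backoff * (2 ** attempt)))
```

The point is that failure handling lives in the application, not the infrastructure: the caller decides how long to wait, how often to retry, and what "give up" means for the user.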
Here’s where it gets funkier. Even most cloud-savvy architects don’t build applications this way today. This is why people howl about Amazon’s storage back-end for EBS, for instance — they’re used to consistent and reliable storage performance, and EBS isn’t built that way, and most applications are built with the assumption that seemingly local standard I/O is functionally local and therefore is totally reliable and high-performance. This is why people twitch about VM-to-VM latencies, although at least here there’s usually some application robustness (since people are more likely to architect with network issues in mind). This is the kind of problem things like Node.js were created to solve (don’t block on anything, and assume anything can fail), but it’s also a type of thinking that’s brand-new to most application architects.
Performance is actually where the real problems occur when moving applications to the cloud. Most businesses who are moving existing apps can deal with the infrastructure issues — and indeed, many cloud providers (generally the VMware-based ones) use clustering and live migration and so forth to present users with a reliable infrastructure layer. But most existing traditional enterprise apps don’t deal well with variable performance, and that’s a problem that will be much trickier to solve.
Amazon, ISO 27001, and a correction
FlyingPenguin has posted a good critique of my earlier post about Amazon’s ISO 27001 certification.
Here’s a succinct correction:
To quote Wikipedia, ISO 27001 requires that management:
- Systematically examine the organization’s information security risks, taking account of the threats, vulnerabilities and impacts;
- Design and implement a coherent and comprehensive suite of information security controls and/or other forms of risk treatment (such as risk avoidance or risk transfer) to address those risks that are deemed unacceptable; and
- Adopt an overarching management process to ensure that the information security controls continue to meet the organization’s information security needs on an ongoing basis.
ISO 27002, which details the security best practices, is not required to be used in conjunction with 27001, although this is customary. I forgot this when I wrote my post (when I was reading docs written by my colleagues on our security team, which specifically recommend the 27001 approach, in the context of 27002).
In other words: 27002 is prescriptive in its controls; 27001 is not that specific.
So FlyingPenguin is right — without the 27002, we have no idea what security controls Amazon has actually implemented.