Gartner’s Magic Quadrant for Cloud Infrastructure as a Service, 2014, has just been released (see the client-only interactive version, or the free reprint). If you’re a Gartner client, you can also view the related charts, which summarize the offerings, features, and data center locations in a convenient table format. (The charts are unfortunately less readable than they could be, as our publication system doesn’t allow comments in Excel spreadsheets. Sorry.)
We’re continuing to update this Magic Quadrant every nine months, since the market is moving so quickly. There have been significant changes in vendor positions since the August 2013 Magic Quadrant (the free reprint has expired, but the graphic is floating around, and Gartner clients can use the “History” tab in the online Magic Quadrant tool, which allows you to compare 2012, 2013, and 2014 interactively).
We’ve observed, over the last nine months, a major shift in Gartner’s client base: the desire to make strategic bets on cloud IaaS providers. In general, this reduces the number of significant suppliers to an organization to just one or two (whereas many organizations had as many as four), with the overwhelming bulk of the workloads going to one provider. It also means that clients are interested in knowing not just who is winning right now, but who is going to be the winner in five or even ten years. That’s really the lens that this Magic Quadrant should be viewed through: Who has what it takes to convince the customer that they can both serve current needs and sustain market leadership over the long term?
Our clients have, since September of 2013 (which seemed to mark a change in Microsoft’s go-to-market approach for Azure), consistently viewed this as an AWS vs. Microsoft battle, with AWS continuing to win the vast majority of business but Microsoft making significant inroads, especially with later-adopter customers. In recent weeks since the big price drops, lots of clients have been asking about the future of Google as well, and there are a lot of curiosity questions about IBM (SoftLayer), although the IBM questions tend to be more outsourcing and broader-strategy in orientation. Of course, prospects consider other vendors as well, especially their existing incumbent vendors, but AWS and Microsoft are overwhelmingly the top contenders.
What’s interesting about this year’s Visionaries is that they all have new platforms — CenturyLink with the Tier 3 acquisition, CSC with the ServiceMesh acquisition coupled with the AWS partnership, Google with Google Compute Engine, IBM with the SoftLayer acquisition, and Verizon Terremark with the still-beta Verizon Cloud. (Arguably VMware falls into this bucket as well, despite being a Niche Player this year.) These providers are in the middle of reinventing themselves, most with the idea of battling it out for the #3 spot in the market.
This is not a market for the faint of heart. (I recently asked a large vendor if they intended to compete seriously in the IaaS space, and was told, “Only an idiot takes on Amazon, Microsoft, and Google simultaneously.”) For that matter, this is not a market for the shallow of pocket. You can’t spend your way to success here, but you need engineers, intellectual property, and, to be a real #3, substantial capital investment in infrastructure.
There’s also a clear convergence with the PaaS market that’s taking place here. AWS has long offered an array of services that are PaaS elements, as well as many things that sit on the spectrum between pure IaaS and pure PaaS. Microsoft and Google started as PaaS providers and then launched IaaS offerings. The distinctions will blur and increasingly become less relevant, as providers fight it out on features and capabilities.
Gartner continues to separate our evaluation of related managed and professional services from the core cloud IaaS platform, because we believe that clients are increasingly choosing a platform, and then choosing consultants and managed services providers (or alternatively, turning to a trusted integrator who helps them choose the right platforms for their needs). I’ll be writing on this more in the future, but keep an eye out for the upcoming regional Magic Quadrants for Cloud-Enabled Managed Hosting for a managed services-oriented view.
When I wrote a research note called Don’t Let OpenStack Hype Distort Your Selection of a Cloud Management Platform in 2012, in September 2012, I took quite a bit of flak in public for my statements about OpenStack’s maturity. At the time, I felt that the industry was about 18 to 24 months from the point where real commercial adoption of OpenStack would begin. It now looks like I made the right call — 20 months have passed since I wrote that note, and indeed, OpenStack seems to be on the cusp of that tipping point. OpenStack is truly becoming a business. Last year’s Portland summit was a developer summit. This year’s summit has the feel of a trade show, although of course it’s still a set of working meetings as well as a user conference.
There’s much work to be done still, but things are grinding onwards in an encouraging fashion. The will to solve the common problems of installs, upgrades, and networking seems to have permeated the community sufficiently that these basic elements of usability and stability are getting into the core. The involvement of larger vendors has created a collective determination to do what it takes to make enterprise adoption of OpenStack possible, in due time.
In March of this year, I wrote a new document called An Overview of OpenStack, 2014. It contains the updated Gartner positions on OpenStack — along with practical information for users, like use cases, vendors, and how to select a distro. (No vendor has done a free reprint of the note, so it’s behind the paywall, sorry.) I have no updates to that position after this summit; it has been largely what I expected it to be. However, I did want to comment on what I see as one of the key questions now facing the OpenStack Foundation and contributing vendors.
One of the positions taken in my recent note is a re-iteration of a 2012 position — we believe that OpenStack “will eventually mature into a solid open-source core at the heart of multiple commercial products and services.” One of the key questions that seems to be at hand now is how large that core should be — a fundamental controversy for OpenStack Foundation members, each of whom has a position based on where their company adds value.
At one pole of the spectrum are the vendors who want to maximize the capabilities in OpenStack that are fully open-source — I’ll call them the “more open” camp. (End-users, of course, also all want this.) These vendors typically differentiate in some way that is not the software itself. They do consulting, they are managed services providers, they are cloud IaaS providers, or they are selling some kind of product or service that uses OpenStack under the covers but delivers some other kind of value (NFV, SaaS, and so on). They want the maximum capabilities delivered in the software, and they’re willing to contribute their own work towards this end.
At the other pole of the spectrum are the vendors who intend to sell a cloud management platform (CMP) and need to be able to differentiate — I’ll call them the “more proprietary” camp. That means that there’s the question of “how does a distro differentiate”. It has already been previously argued that installation and upgrades should be left to commercial distributions. At long last it seems to be agreed that for the good of the community at least some of these capabilities need to be decent in the core. The next controversial one seems to be an HA control plane. But it also gets into the broader question of how deep the functionality of OpenStack as a whole should go. Vendors that sell OpenStack software really fall into two broad categories — those that intend to supportively wrap what is essentially vanilla OpenStack (like the Linux vendors), and those who are building a full-fledged CMP (or CMP suite) into which OpenStack may essentially disappear near-invisibly (except for maybe an exposed API), surrounded by a rich fudgy layer of proprietary software (like HP and IBM). Most of these vendors want just enough in the OpenStack open-source to make OpenStack overall successful.
There are nuances here, of course, and many vendors fall somewhere between these two poles, but I think that summarizes the two camps pretty well. Each camp has its own beliefs about what is best for their own companies and what is best for OpenStack. These are legitimate debates about what is “just enough” functionality in OpenStack (and how that “just enough” changes over time), even amongst vendors who occupy the “more proprietary” camp — and whether that “just enough” is sufficient to satisfy the “more open” camp. Indeed, the “more open” camp may find that they cannot get their contributions accepted because the “more proprietary” camp is gatekeeping.
It is critical to note that no vendor I’ve ever spoken to thinks that OpenStack interoperability means that you should be able to easily switch between distributions or OpenStack-based service providers. Rather, the desire is to ensure that there’s enough of an interoperability construct that there can be a viable OpenStack ecosystem — it’s about the ability of ecosystem vendors to interoperate with a variety of OpenStack-based vendors, far more than it is about the user’s ability to interoperate between OpenStack-based solutions. To reiterate another point from my previous research notes: Customers should expect to be no less locked into an OpenStack-based vendor/provider than they would into any other CMP or cloud IaaS provider.
As my colleague Daryl Plummer has put it: We’re at the end of the beginning phase of cloud computing.
As 2014 dawns, we’re moving into an era of truly mainstream adoption of cloud IaaS. While many organizations have already been using cloud IaaS for several years, gradually moving from development to production, with an ever-expanding range of use cases and applications, the shift to truly strategic adoption is just getting underway. Increasingly, organizations are asking what can’t go to the cloud, rather than what can.
Organizations that haven’t done at least a cloud IaaS pilot by now, however informal (“informal” includes that one crazy developer who decided to give his credit card to Amazon), are at the trailing edge of adoption. The larger the business, the more likely it is to be doing things in cloud IaaS; this is a trend that starts from enterprises and works its way down. (Technology companies of all sizes, of course, are comfortably ensconced in the cloud.)
Gartner’s clients with multiple years of cloud IaaS under their belts are now comfortably going towards more strategic adoption. What’s interesting, though, is that later adopters are also going towards strategic adoption — they’re skipping the years of early getting-their-feet-wet, and immediately jumping in with more significant projects, with more ambitious goals. That makes a great deal of sense, though — by this point, the market is more mature, and there are immediate and clear answers to practical issues like, “How do I connect my enterprise network?” (That one question, by the way, continues to benefit Amazon, which has a precise answer, versus the often-fuzzy or complex answers of other competitors who have less industrialized processes for doing so.)
I’ve said before that developers are the key to cloud IaaS adoption in most organizations. It’s also becoming clear that the most successful strategic efforts will be developer-led, usually with an enterprise architect as the lead for the organization-wide effort. It is the developers that have the strategic vision for the future of application development and operations, and that care about things like faster delivery (i.e., business agility), continuous integration, continuous deployment, application lifecycle management, and infrastructure as code. IT operations seems to almost inevitably be mired in thinking about solely their own domain, which tends to be focused on a data center view that effectively reduces to “how do we keep the lights on, at a lower cost?” This has a high probability of leading to solutions that might be right for IT operations, but wrong for the business.
At the moment, I’m writing research focused on best practices — the lessons learned from the trenches, from organizations who have adopted cloud IaaS over the last seven years of the market. I’m always interested in hearing your stories.
If you’re a service provider interested in participating in the research process for Gartner’s Magic Quadrant for Cloud IaaS (see the call for vendors), or the regional Magic Quadrants for Cloud-Enabled Managed Hosting (see that call for vendors), you will probably want to read some of my previous blog posts.
The Magic Quadrant Process Itself
AR contacts for a Magic Quadrant should read everything. An explanation of why it’s critical to read every word of every communication received during the MQ process.
The process of a Magic Quadrant. Understanding a little bit about how MQs get put together.
Vendors, Magic Quadrants, and client status. Appropriate use of communications channels during the MQ process.
The art of the customer reference. Tips on how to choose reference customers.
Gartner’s Understanding of the Market
Foundational Gartner research notes on cloud IaaS and managed hosting, 2014. Recommended reading to understand our thinking on the markets.
Having cloud-enabled technology != Having a cloud. Critical for understanding what we do and don’t consider cloud IaaS to be.
Infrastructure resilience, fast VM restart, and Google Compute Engine. An explanation of why infrastructure resilience still matters in the cloud, and what we mean by the term.
No World of Two Clouds. Why we do not believe that there will be a separation of the cloud IaaS offerings that target the enterprise, from those that target cloud-native organizations.
Cloud IaaS market share and the developer-centric world. How developers, rather than IT operations admins, drive spend in the cloud IaaS market.
With the refresh of the Magic Quadrant for Cloud IaaS, and the evolution of the regional Magic Quadrants for Managed Hosting into Magic Quadrants for Cloud-Enabled Managed Hosting, I am following my annual tradition of highlighting research that I and others have published that’s important in the context of these MQs. These notes lay out how we see the market, and consequently, the lens that we’re going to be evaluating the service providers through.
As always, I want to stress that service providers do not need to agree with our perspective in order to rate well. We admire those who march to their own particular beat, as long as it results in true differentiation and more importantly, customer wins and happy customers — a different perspective can allow a service provider to serve their particular segments of the market more effectively. However, such providers need to be able to clearly articulate that vision and to back it up with data that supports their world-view.
This updates a previous list of foundational research. Please note that those older notes still remain relevant, and you are encouraged to read them. You might also be interested in a previous research round-up (clients only).
If you are a service provider, these are the research notes that it might be helpful to be familiar with (sorry, links are behind client-only paywall):
Magic Quadrant for Cloud IaaS, 2013. Last year’s Magic Quadrant is full of deep-dive information about the market and the providers. Also check out the Critical Capabilities for Public Cloud IaaS, 2013 for a deeper dive into specific public cloud IaaS offerings (Critical Capabilities is almost solely focused on feature set for particular use cases, whereas a Magic Quadrant positions a vendor in a market as a whole).
Magic Quadrant for Managed Hosting, North America and Magic Quadrant for European Managed Hosting. Last year’s managed hosting Magic Quadrants are likely the last MQs we’ll publish for traditional managed hosting. They still make interesting reading even though these MQs are evolving this year.
Pricing and Buyer’s Guide for Web Hosting and Cloud Infrastructure, 2013. Our market definitions are described here.
Evaluation Criteria for Public Cloud IaaS Providers. Our Technical Professionals research provides extremely detailed criteria for large enterprises that are evaluating providers. While the customer requirements are somewhat different in other segments, like the mid-market, these criteria should give you an extremely strong idea of the kinds of things that we think are important to customers. The cloud IaaS MQ evaluation criteria are not identical (because it is broader than just large-enterprise), but they are very similar — we do coordinate our research.
Technology Overview for Cloud-Enabled System Infrastructure. If you’re wondering what cloud-enabled system infrastructure (CESI) is, this will explain it to you. Cloud-enabled managed hosting is the combination of a CESI with managed services, so it’s important to understand.
Don’t Be Fooled By Offerings Falsely Masquerading as Cloud IaaS. This note was written for our end-user clients, to help them sort out an increasingly “cloudwashed” service provider landscape. It’s very important for understanding what constitutes a cloud service and why the technical and business benefits of “cloud” matter.
Service Providers Must Understand the Real Needs of Prospective Customers of Cloud IaaS. Customers are often confused about what they want to buy when they claim to want “cloud”. This provides structured guidance for figuring this out, and it’s important for understanding service provider value propositions.
How Customers Purchase Cloud IaaS, 2012. A lifecycle exploration of how customers adopt and expand their use of cloud IaaS. Important for understanding our perspective on sales and marketing. (It’s dated 2012, but it’s actually a 2013 note, and still fully current.)
Market Trends: Managed Cloud Infrastructure, 2013. Our view of the evolution of data center outsourcing, managed hosting, and cloud IaaS, and broadly, the “managed cloud”. Critical for understanding the future of cloud-enabled managed hosting.
Managed Services Providers Must Adapt to the Needs of DevOps-Oriented Customers. As DevOps increases in popularity, managed services customers increasingly want their infrastructure to be managed with a DevOps philosophy. This represents a radical change for service providers. This note explores the customer requirements and market implications.
If you are not a Gartner client, please note that many of these topics have been covered in my blog in the past, if at a higher level (and generally in a mode where I am still working out my thinking, as opposed to a polished research position).
We began the call-for-vendors process for Gartner’s 2014 Magic Quadrant for Cloud Infrastructure as a Service, as well as the regional Magic Quadrants for Cloud-Enabled Managed Hosting, in December. (See Doug Toombs’s call for vendors for the latter.)
The pre-qualification survey, which is intended to gather quantitative metrics and information about each provider’s service, is going out imminently. We sent out contact confirmations on December 26th to all service providers who are currently on our list to receive the survey. If you haven’t received a contact confirmation and you want to receive the survey, please contact Michele Severance (Michele dot Severance at Gartner dot com), who is providing administrative support for this Magic Quadrant. You must be authorized to speak for your company. Please note we cannot work with PR firms for the Magic Quadrant; if you are a PR agency and you think that your client should be participating, you should get in touch with your client and have your client contact Michele.
This year, we are doing an integrated survey process for multiple Magic Quadrants. The cloud IaaS MQ is, in many ways, foundational. Cloud-enabled managed hosting is the delivery of managed services on top of cloud IaaS and, more broadly, cloud-enabled system infrastructure. Consequently, this year’s survey asks about your platforms, your managed services levels, and how those things combine into service offerings. Because the survey is longer, we’re starting the survey process earlier than usual.
The survey is an important part of our data collection efforts on the markets, not just for the Magic Quadrants. We use the survey data to recommend providers throughout the year, particularly since we try to find providers that can exactly fit a client’s needs — including small niche providers. Far from everything fits into the one-size-fits-all mold of the largest providers.
The Cloud IaaS MQ continues to be updated on a 9-month cycle, reflecting the continued fast pace of the market. It will have similar scope to last year, with a very strong emphasis on self-service capabilities.
Please note that receiving a survey does not in any way indicate that we believe that your company is likely to qualify; we simply allow surveys to go to all interested parties (assuming that they’re not obviously wrong fits, like software companies without an IaaS offering).
The status for this Magic Quadrant will be periodically updated on its status page.
Google Compute Engine (GCE) — Google’s cloud IaaS offering — is now in general availability, an announcement accompanied by a 10% price drop, new persistent disk (which should now pretty much always be used instead of scratch disk), and expanded OS support (though no Microsoft Windows yet). The announcement also highlights two things I wrote about GCE recently, in posts about its infrastructure resilience features — live migration and fast VM restart.
Amazon Web Services (AWS) remains the king of this space and is unlikely to be dethroned anytime soon, although Microsoft Windows Azure is clearly an up-and-coming competitor due to Microsoft’s deep established relationships with business customers. GCE is more likely to target the cloud-natives that are going to AWS right now — companies doing things that the cloud is uniquely well-suited to serve. But I think the barriers to Google moving into mainstream businesses are more of a matter of go-to-market execution, along with trust, track record, and an enterprise-friendly way of doing business — Google’s competitive issues are unlikely to be technology.
In fact, I think that Google is likely to push the market forward in terms of innovation in a way that Azure will not; AWS and Google will hopefully goad each other into one-upsmanship, creating a virtuous cycle of introducing things that customers discover they love, thus creating user demand that pushes the market forward. Google has a tremendous wealth of technological capabilities in-house that it likely can externalize over time. Most organizations can’t do things the way that Google does them, but Google can certainly start making the attempt to make it easier for other organizations to adopt the Google approach to the world, by exposing their tools in an easily-consumable way.
GCE still lags AWS tremendously in terms of breadth and depth of feature set, of course, but it also has aspects that are immediately more attractive for some workloads. However, it’s now at the point where it’s a viable alternative to AWS for organizations who are looking to do cloud-native applications, whether they’re start-ups or long-established companies. I think the GA of GCE is a demarcation of market eras — we’re now moving into a second phase of this market, and things only get more interesting from here onwards.
If you’re an investment banker or a vendor, and you’ve asked me in the last year, “Who should we buy?”, I’ve often pointed at enStratius (bought by Dell), ServiceMesh (bought by CSC last week), and Tier 3.
So now I’m three for three, because CenturyLink just bought Tier 3, continuing its acquisition activity. CenturyLink is a US-based carrier (pushed to prominence when they acquired Qwest in 2011). They got into the hosting business (in a meaningful way) when they bought Savvis in 2011; Savvis has long been a global leader in colocation and managed hosting. (It’s something of a pity that CenturyLink is in the midst of killing the Savvis brand, which has recently gotten a lot of press because of their partnership with VMware for vCHS, and is far better known outside the US than the CenturyLink brand, especially in the cloud and hosting space.)
Savvis has an existing cloud IaaS business and a very large number of offerings that have the “cloud” label, generally under the Symphony brand — I like to say of Savvis that they never seem to have a use case that they don’t think needs another product, rather than having a unified but flexible platform for everything.
The most significant of Savvis’s cloud offerings are Symphony VPDC (recently rebranded to Cloud Data Center), SavvisDirect, and their vCHS franchise. VPDC is a vCloud-Powered public cloud offering (although Savvis has done a more user-friendly portal than vCloud Director provides); Savvis often combines it with managed services in lightweight data center outsourcing deals. (Savvis also has private cloud offerings.) SavvisDirect is an offering developed together with CA, and is intended to be a pay-as-you-go, credit-card-based offering, targeted at small businesses (apparently intended to be competitive with AWS, but whose structure seems to illustrate a failure to grasp the appeal of cloud as opposed to just mass-market VPS).
Savvis is the first franchise partner for vCHS; back at the time of VMworld (September) they were offering indications that over the long term they thought that vCHS would win and that Savvis only needed to build its own IaaS platform until vCHS could fully meet customer requirements. (But continuing to have their own platform is certainly necessary to hedge their bets.)
Now CenturyLink’s acquisition of Tier 3 seems to indicate that they’re going to more than hedge their bets. Tier 3 is an innovative small IaaS provider (with fingers in the PaaS world through a Cloud Foundry-based PaaS, and they added .NET support to Cloud Foundry as “Iron Foundry”). Their offering is vCloud-Powered public cloud IaaS, but they entirely hide vCloud Director under their own tooling (and it doesn’t seem vCloud-ish from either the front-end or the implementation of the back-end), and they have a pile of interesting additional capabilities built into their platform. They’ve made a hypervisor-neutral push, as well. They’ve got a nice blend between capabilities that appeal to the traditional enterprise, and forward-looking capabilities that appeal to a DevOps orientation. Tier 3 has some blue-chip enterprise names as customers, and it has historically scored well on Gartner evaluations, and they’re strongly liked by our enterprise clients who have evaluated them, but people have always worried about their size. (Tier 3 has made it easy to white-label the offering, which has given them more success from its partners, like Peer 1.) The acquisition by CenturyLink neatly solves that size problem.
Indeed, CenturyLink seems to have placed a strong vote of confidence in their IaaS offering, because Tier 3 is being immediately rebranded, and immediately offered as the CenturyLink Cloud. (Current outstanding quotes for Symphony VPDC will likely be requoted, and new VPDC orders are unlikely to be taken.) CenturyLink will offer existing VPDC customers a free migration to the Tier 3 cloud (since it’s vCD-to-vCD, presumably this isn’t difficult, and it represents an upgrade in capabilities for customers). CenturyLink is also immediately discontinuing selling the SavvisDirect offering (although the existing platform will continue to run for the time being); customers will be directed to purchase the Tier 3 cloud instead. (Or, I should say, the CenturyLink Cloud, since the Tier 3 brand is being killed.) CenturyLink is also doing a broad international expansion of data center locations for this cloud.
CenturyLink has been surprisingly forward-thinking to date about the way the cloud converges infrastructure capabilities (including networking) and applications, and how application development and operations changes as a result. (They bought AppFog back in June to get a PaaS offering, too.) Their vision of how these things fit together is, I think, much more interesting than either AT&T or Verizon’s (or for that matter, any other major global carrier). I expect the Tier 3 acquisition to help accelerate their development of capabilities.
Savvis’s managed and professional services combined with the Tier 3 platform should provide them some immediate advantages in the cloud-enabled managed hosting and data center outsourcing markets. It’s more competition for the likes of CSC and IBM in this space, as well as providers like Verizon Terremark and Rackspace. I think the broad scope of the CenturyLink portfolio will mesh nicely not just with existing Tier 3 capabilities, but also capabilities that Tier 3 hasn’t had the resources to be able to develop previously.
Even though I believe that the hyperscale providers are likely to have the dominant market share in cloud IaaS, there’s still a decent market opportunity for everyone else, especially when the service is combined with managed and professional services. But I believe that managed and professional services need to change with the advent of the cloud: they need to become cloud-native and in many cases, DevOps-oriented. (Gartner clients only: see my research note, “Managed Service Providers Must Adapt to the Needs of DevOps-Oriented Customers”.) Tier 3 should be a good push for CenturyLink along this path, particularly since CenturyLink will make Tier 3’s Seattle offices the center of their cloud business, and they’re retaining Jared Wray (Tier 3’s founder) as their cloud CTO.
If you read Gartner research, you’ve probably noticed that we’ve started referring to something called “fast VM restart”. We consider it to be a critical infrastructure resiliency feature for many business application workloads.
Many applications are small. Really, really small. They take a fraction of a CPU core, and less than 1 GB of RAM. And they’ll never get any bigger. (They drive the big wins in server consolidation.) Most applications in mainstream businesses are like that. I often refer to these as “paperwork apps” — somebody fills out a form, that form is routed and processed, and eventually someone runs a report. Businesses have a zillion of these and continue to write more. When an organization says they have hundreds, or thousands, of apps, most of them are paperwork apps. They can be built by not especially bright or skilled programmers, and for resilience, they rely on the underlying infrastructure platform.
A couple things can happen to these kinds of paperwork apps in the future:
- They can be left on-premise to run as-is within an enterprise virtualization environment (that maybe eventually becomes private cloud-ish), relying on its infrastructure resilience.
- They can be migrated into a cloud IaaS environment, relying on it for infrastructure resilience.
- They can be migrated onto a PaaS, either on-premise or from a service provider, relying on it for resilience.
- They can be moved to business process management (BPM) platforms, either via on-premise deployment of a BPM suite, or a BPM PaaS, thereby making resilience the problem of the BPM software.
Note the thing that’s not on that list: Re-architecting the application for application-level resilience. That requires that your developers be skilled enough to do it, and that you be able to run in a distributed fashion, which, given the low level of resources these apps consume, isn’t economical.
Of the various scenarios above, the lift-and-shift onto cloud IaaS is a highly likely one for many applications. But businesses want to be comfortable that the availability will be comparable to their existing on-premise infrastructure.
So what does infrastructure resilience mean? In short, it means minimal downtime due to either planned maintenance or a failure of the underlying hardware. Live migration is the most common modern technique used to mitigate downtime for planned maintenance that impacts the physical host. Fast VM restart is the most common technique used to mitigate downtime due to hardware failure.
Fast VM restart is built into nearly all modern hypervisors. It’s not magical — a shocking number of VMware customers believe that VM HA means they’ll instantly get a workload onto a happy healthy host from a failed host (i.e., they confuse live migration with VM HA). Fast VM restart is basically a technique to rapidly detect that a physical host has failed, and restart the VMs that were on that host, on some other host. It doesn’t necessarily need to be implemented at the virtualization level — you can just have monitoring that runs at a very short polling interval and that orchestrates a move-and-restart of VMs when it sees a host failure, for instance. (Of course, you need a storage and network architecture that makes this viable, too.)
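To make the detect-and-restart idea concrete, here is a minimal sketch of the orchestration step only. It is not how any particular hypervisor or provider implements it. The `Host` class, the `healthy` flag, and the least-loaded placement rule are all hypothetical, and the sketch assumes VM disks live on shared storage so that a VM can be booted on any host.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a physical host and the VMs placed on it."""
    name: str
    healthy: bool = True
    vms: list = field(default_factory=list)


def restart_vms_from_failed_hosts(hosts):
    """Reassign VMs from failed hosts to the least-loaded healthy host.

    Returns a list of (vm, old_host, new_host) tuples, one per restart.
    A real system would boot each VM from shared storage at this point;
    here we only update the placement records.
    """
    restarts = []
    healthy = [h for h in hosts if h.healthy]
    for failed in (h for h in hosts if not h.healthy):
        while failed.vms:
            if not healthy:
                raise RuntimeError("no healthy host available for restart")
            vm = failed.vms.pop(0)
            # Placement rule (an assumption): pick the host with the fewest VMs.
            target = min(healthy, key=lambda h: len(h.vms))
            target.vms.append(vm)
            restarts.append((vm, failed.name, target.name))
    return restarts
```

A monitor would call `restart_vms_from_failed_hosts` on a short polling interval after refreshing each host's `healthy` flag. The VMs come back as a cold boot on the new host, which is exactly why this is a restart rather than a live migration.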
Clearly, not all applications are happy when they get what is basically an unexpected reboot, but this is the level of infrastructure resiliency that works just fine for non-distributed applications. When customers babble about the reliability of their on-premise VMware-based infrastructure, this is pretty much what they mean. They think it has value. They’re willing to pay more for it. There’s no real reason why it shouldn’t be implemented by every cloud IaaS provider that intends to take general business applications, not just the VMware-based providers.
By the way: Lost in the news about live migration in Google Compute Engine has been an interesting new subtlety. I missed noticing this in my first read-through of the announcement, since it was phrased purely in the context of maintenance, and only a re-read while finishing up this blog post led me to wonder about general-case restart. And I haven’t seen this mentioned elsewhere:
The new GCE update also adds fast VM restart, which Google calls Automatic Restart. To quote the new documentation, “You can set up Google Compute Engine to automatically restart an instance if it is taken offline by a system event, such as a hardware failure or scheduled maintenance event, using the automaticRestart setting.” Answering a query from me, Google said that the restart time is dependent upon the type of failure, but that in most cases, it should be under three minutes.
So a gauntlet has been (subtly) thrown. This is one of the features that enterprises most want in an “enterprise-grade” cloud IaaS offering. Now Google has it. Will AWS, Microsoft, and others follow suit?
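For readers who want to see roughly what the automaticRestart setting looks like in practice, here is a hedged sketch of the scheduling portion of an instance definition. Only `automaticRestart` comes from the documentation quoted above. The `onHostMaintenance` field and its `MIGRATE` / `TERMINATE` values reflect my understanding of the GCE API rather than anything in the announcement, and the helper function itself is hypothetical, so check the current documentation before relying on it.

```python
def scheduling_block(automatic_restart=True, on_host_maintenance="MIGRATE"):
    """Build the `scheduling` portion of an instance definition.

    automatic_restart: restart the instance after a system event such as
        a hardware failure (the setting quoted in the post).
    on_host_maintenance: assumed field; "MIGRATE" asks for live migration,
        "TERMINATE" lets the instance be stopped for maintenance.
    """
    allowed = ("MIGRATE", "TERMINATE")
    if on_host_maintenance not in allowed:
        raise ValueError(f"on_host_maintenance must be one of {allowed}")
    return {
        "automaticRestart": bool(automatic_restart),
        "onHostMaintenance": on_host_maintenance,
    }
```

The two settings map onto the two resilience features discussed here: `automaticRestart` covers hardware failure (fast VM restart), while the maintenance setting covers planned host maintenance (live migration).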
Google Compute Engine (GCE) has been a potential cloud-emperor contender in the shadows, and although GCE is still in beta, it’s been widely speculated that Google will likely be the third vendor in the trifecta of big cloud IaaS market-share leaders, along with Amazon Web Services (AWS) and Microsoft Windows Azure.
Few would doubt Google’s technology prowess, if it decides to commit itself to a business. A critical question has remained, though: Will Google be able to deliver technology capabilities that can be used by mere mortals in the enterprise, and market, sell, contract for, and deliver service in a way that such businesses can use? (Its ability to serve ephemeral large-scale compute workloads, and perhaps meet the needs of start-ups, is not in doubt.)
One of the most heartburn-inducing aspects of GCE has been its scheduled maintenance. To quote Google: “For scheduled zone maintenance windows, Google takes an entire zone offline for roughly two weeks to perform various, disruptive maintenance tasks.” Basically, Google has said, “Your data center will be going away for up to two weeks. Deal with it. You should be running in multiple zones anyway.”
Even most cloud-native start-ups aren’t capable of easily executing this way. Remember that most applications are architected to have their data locally, in the same zone as the compute. Without using Google’s PaaS capabilities (like Datastore), this means that the customer needs to move and/or replicate storage into another zone, which also increases their costs. Many applications aren’t large enough to warrant the complexity of a multi-zone implementation, either — not only business applications, but also smaller start-ups, mobile back-end implementations, and so forth.
So inherently, a hard-line stance on taking zones offline for maintenance limited GCE’s market opportunity. Despite previously positioning this as a hard-line stance, Google has clearly changed its mind, introducing “transparent maintenance”. This is accomplished with a combination of live migration technology and some innovations related to their implementation of physical data center maintenance. It’s an interesting indication of Google listening to prospects and customers and flexing to do something that has not been the Google Way.
Not only will Google’s addition of live migration help with data center maintenance, but more importantly, it will mitigate downtime related to host maintenance. Although AWS, for instance, tries to minimize host maintenance in order to avoid instance downtime or reboots, host maintenance is necessary. It’s highly useful to have a technology that allows you to perform host maintenance without downtime for the instances, because this encourages you not to delay host maintenance (since you want to update the underlying host OS, hypervisor, etc.).
VMware-based providers almost always do live migration for host maintenance, since it’s one of the core compelling features of VMware. But AWS, and many competitors that model themselves after AWS, don’t. I hope that Google’s decision to add live migration into GCE pushes the rest of the market — and specifically AWS, which today generally sets the bar for customer expectations — into doing the same, because it’s a highly useful infrastructure resilience feature, and it’s important to customers.
More broadly, though, AWS hasn’t really had innovation competitors to date. Microsoft Azure is a real competitor, but other than in PaaS, they’ve largely been playing catch-up. Thanks to its extensive portfolio of internal technologies, Google has the potential ability to inject truly new capabilities into the market. Similar to what customers have seen with AWS — when AWS has been successful at introducing capabilities that many customers weren’t really even aware that they wanted — I expect Google is going to launch truly innovative capabilities that will turn into customer demands. It’s not that AWS is going to simply mount a competitive response — it will become a situation where customers ask for these capabilities, pushing AWS to respond. That should be excellent for the market.
It’s worth noting that the value of Google is not just GCE; it is Google Cloud Platform as a whole, including the PaaS elements. This is similarly true with Microsoft Azure. And although AWS seems to be broadly bucketed as IaaS, in reality their capabilities overlap into the PaaS space. These vendors understand that the goal is the ability to develop and deliver business capabilities more quickly, not to provide cheap infrastructure.
Capabilities equate to lock-in, by the way, but historically, businesses have embraced lock-in whenever it results in more value delivered.