Infrastructure resilience, fast VM restart, and Google Compute Engine

If you read Gartner research, you’ve probably noticed that we’ve started referring to something called “fast VM restart”. We consider it to be a critical infrastructure resiliency feature for many business application workloads.

Many applications are small. Really, really small. They take a fraction of a CPU core, and less than 1 GB of RAM. And they’ll never get any bigger. (They drive the big wins in server consolidation.) Most applications in mainstream businesses are like that. I often refer to these as “paperwork apps” — somebody fills out a form, that form is routed and processed, and eventually someone runs a report. Businesses have a zillion of these and continue to write more. When an organization says they have hundreds, or thousands, of apps, most of them are paperwork apps. They can be built by not especially bright or skilled programmers, and for resilience, they rely on the underlying infrastructure platform.

A couple things can happen to these kinds of paperwork apps in the future:

  1. They can be left on-premise to run as-is within an enterprise virtualization environment (that maybe eventually becomes private cloud-ish), relying on its infrastructure resilience.
  2. They can be migrated into a cloud IaaS environment, relying on it for infrastructure resilience.
  3. They can be migrated onto a PaaS, either on-premise or from a service provider, relying on it for resilience.
  4. They can be moved to business process management (BPM) platforms, either via on-premise deployment of a BPM suite, or a BPM PaaS, thereby making resilience the problem of the BPM software.

Note the thing that’s not on that list: Re-architecting the application for application-level resilience. That requires that your developers be skilled enough to do it, and that the application run in a distributed fashion, which, given the low level of resources these apps consume, isn’t economical.

Of the various scenarios above, the lift-and-shift onto cloud IaaS is a hugely likely one for many applications. But businesses want to be comfortable that the availability will be comparable to their existing on-premise infrastructure.

So what does infrastructure resilience mean? In short, it means minimal downtime due to either planned maintenance or a failure of the underlying hardware. Live migration is the most common modern technique used to mitigate downtime for planned maintenance that impacts the physical host. Fast VM restart is the most common technique used to mitigate downtime due to hardware failure.

Fast VM restart is built into nearly all modern hypervisors. It’s not magical — a shocking number of VMware customers believe that VM HA means they’ll instantly get a workload onto a happy healthy host from a failed host (i.e., they confuse live migration with VM HA). Fast VM restart is basically a technique to rapidly detect that a physical host has failed, and restart the VMs that were on that host, on some other host. It doesn’t necessarily need to be implemented at the virtualization level — you can just have monitoring that runs at a very short polling interval and that orchestrates a move-and-restart of VMs when it sees a host failure, for instance. (Of course, you need a storage and network architecture that makes this viable, too.)
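
To make that concrete, here is a deliberately minimal sketch of such a monitoring loop, in Python. The host names, thresholds, and helper functions are placeholders of my own invention, not any particular vendor’s implementation:

```python
import time
from typing import Dict, List

POLL_INTERVAL = 2      # seconds; the short polling interval is what makes the restart "fast"
FAILURE_THRESHOLD = 3  # consecutive missed heartbeats before a host is declared dead

# Hypothetical inventory of which VMs run on which physical host.
inventory: Dict[str, List[str]] = {"host-a": ["vm-1", "vm-2"], "host-b": ["vm-3"]}
missed = {host: 0 for host in inventory}

def heartbeat_ok(host: str) -> bool:
    # Placeholder: in practice, poll the host's management agent or hypervisor API.
    return host != "host-a"

def restart_vm(vm: str, target: str) -> None:
    # Placeholder: in practice, cold-boot the VM on the target host. Shared storage
    # and a suitable network architecture are what make this move possible.
    print(f"restarting {vm} on {target}")

while True:
    for host, vms in list(inventory.items()):
        if heartbeat_ok(host):
            missed[host] = 0
            continue
        missed[host] += 1
        if missed[host] >= FAILURE_THRESHOLD and vms:
            healthy = [h for h in inventory if heartbeat_ok(h)]
            if healthy:
                for vm in vms:
                    # Naive placement; the guest OS simply sees an unexpected reboot.
                    restart_vm(vm, healthy[0])
                inventory[host] = []
    time.sleep(POLL_INTERVAL)
```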

Clearly, not all applications are happy when they get what is basically an unexpected reboot, but this is the level of infrastructure resiliency that works just fine for non-distributed applications. When customers babble about the reliability of their on-premise VMware-based infrastructure, this is pretty much what they mean. They think it has value. They’re willing to pay more for it. There’s no real reason why it shouldn’t be implemented by every cloud IaaS provider that intends to take general business applications, not just the VMware-based providers.

By the way: Lost in the news about live migration in Google Compute Engine has been an interesting new subtlety. I missed noticing this in my first read-through of the announcement, since it was phrased purely in the context of maintenance, and only a re-read while finishing up this blog post led me to wonder about general-case restart. And I haven’t seen this mentioned elsewhere:

The new GCE update also adds fast VM restart, which Google calls Automatic Restart. To quote the new documentation, “You can set up Google Compute Engine to automatically restart an instance if it is taken offline by a system event, such as a hardware failure or scheduled maintenance event, using the automaticRestart setting.” Answering a query from me, Google said that the restart time is dependent upon the type of failure, but that in most cases, it should be under three minutes.
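
For illustration, enabling this looks roughly like the following, using the Python client for the Compute Engine v1 API’s instances().setScheduling method; the project, zone, and instance names are placeholders, and you should check Google’s current documentation for the exact fields:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Ask GCE to automatically restart this instance after a host failure, and to
# live-migrate it (rather than terminate it) during scheduled maintenance.
compute.instances().setScheduling(
    project="my-project",        # placeholder
    zone="us-central1-a",        # placeholder
    instance="my-instance",      # placeholder
    body={"automaticRestart": True, "onHostMaintenance": "MIGRATE"},
).execute()
```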

So a gauntlet has been (subtly) thrown. This is one of the features that enterprises most want in an “enterprise-grade” cloud IaaS offering. Now Google has it. Will AWS, Microsoft, and others follow suit?

Google Compute Engine and live migration

Google Compute Engine (GCE) has been a potential cloud-emperor contender in the shadows, and although GCE is still in beta, it’s been widely speculated that Google will likely be the third vendor in the trifecta of big cloud IaaS market-share leaders, along with Amazon Web Services (AWS) and Microsoft Windows Azure.

Few would doubt Google’s technology prowess, if it decides to commit itself to a business. A critical question has remained, though: Will Google be able to deliver technology capabilities that can be used by mere mortals in the enterprise, and market, sell, contract for, and deliver service in a way that such businesses can use? (Its ability to serve ephemeral large-scale compute workloads, and perhaps meet the needs of start-ups, is not in doubt.)

One of the most heartburn-inducing aspects of GCE has been its scheduled maintenance. To quote Google: “For scheduled zone maintenance windows, Google takes an entire zone offline for roughly two weeks to perform various, disruptive maintenance tasks.” Basically, Google has said, “Your data center will be going away for up to two weeks. Deal with it. You should be running in multiple zones anyway.”

Even most cloud-native start-ups aren’t capable of easily executing this way. Remember that most applications are architected to have their data locally, in the same zone as the compute. Without using Google’s PaaS capabilities (like Datastore), this means that the customer needs to move and/or replicate storage into another zone, which also increases their costs. Many applications aren’t large enough to warrant the complexity of a multi-zone implementation, either — not only business applications, but also smaller start-ups, mobile back-end implementations, and so forth.

So inherently, a hard-line stance on taking zones offline for maintenance limited GCE’s market opportunity. Despite positioning this as a hard-line stance previously, Google has clearly changed its mind, introducing “transparent maintenance”. This is accomplished with a combination of live migration technology, and some innovations related to their implementation of physical data center maintenance. It’s an interesting indication of Google listening to prospects and customers and flexing to do something that has not been the Google Way.

Not only will Google’s addition of live migration help with data center maintenance, but, more importantly, it will mitigate downtime related to host maintenance. Although AWS, for instance, tries to minimize host maintenance in order to avoid instance downtime or reboots, host maintenance is necessary — and it’s highly useful to have a technology that allows you to do host maintenance without downtime for the instances, because this encourages you not to delay host maintenance (since you want to update the underlying host OS, hypervisor, etc.).

VMware-based providers almost always do live migration for host maintenance, since it’s one of the core compelling features of VMware. But AWS, and many competitors that model themselves after AWS, don’t. I hope that Google’s decision to add live migration into GCE pushes the rest of the market — and specifically AWS, which today generally sets the bar for customer expectations — into doing the same, because it’s a highly useful infrastructure resilience feature, and it’s important to customers.

More broadly, though, AWS hasn’t really had innovation competitors to date. Microsoft Azure is a real competitor, but other than in PaaS, they’ve largely been playing catch-up. Thanks to its extensive portfolio of internal technologies, Google has the potential ability to inject truly new capabilities into the market. Similar to what customers have seen with AWS — when AWS has been successful at introducing capabilities that many customers weren’t really even aware that they wanted — I expect Google is going to launch truly innovative capabilities that will turn into customer demands. It’s not that AWS is going to simply mount a competitive response — it will become a situation where customers ask for these capabilities, pushing AWS to respond. That should be excellent for the market.

It’s worth noting that the value of Google is not just GCE — it is Google Cloud Platform as a whole, including the PaaS elements. This is similarly true with Microsoft Azure. And although AWS seems to be broadly bucketed as IaaS, in reality their capabilities overlap into the PaaS space. These vendors understand that the goal is the ability to develop and deliver business capabilities more quickly — not to provide cheap infrastructure.

Capabilities equate to lock-in, by the way, but historically, businesses have embraced lock-in whenever it results in more value delivered.

IBM SoftLayer, versus Amazon or versus Rackspace?

Near the beginning of July, IBM closed its acquisition of SoftLayer (which I discussed in a previous blog post). A little over three months have passed since then, and IBM has announced the addition of more than 1,500 customers and the elimination of SmartCloud Enterprise (SCE, IBM’s cloud IaaS offering), and has gone on the offensive against Amazon in an ad campaign (analyzed in my colleague Doug Toombs’s blog post). So what does this all mean for IBM’s prospects in cloud infrastructure?

IBM is unquestionably a strong brand with deep customer relationships — it exerts a magnetism for its customers that competitors like HP and Dell don’t come anywhere near to matching. Even with all of the weaknesses of the SCE offering, here at Gartner, we still saw customers choose the service simply because it was from IBM — even when the customers would openly acknowledge that they found the platform deficient and it didn’t really meet their needs.

In the months since the SoftLayer acquisition closed, we’ve seen this “we use IBM for everything by preference” trend continue. It certainly helps immensely that SoftLayer is a more compelling solution than SCE, but customers continue to acknowledge that they don’t necessarily feel they’re buying the best solution or the best technology, but they are getting something that is good enough from a vendor that they trust. Moreover, they are getting it now; IBM has displayed astonishing agility and a level of aggression that I’ve never seen before. It’s impressive how quickly IBM has jump-started the pipeline this early into the acquisition, and IBM’s strengths in sales and marketing are giving SoftLayer inroads into a mid-market and enterprise customer base that it wasn’t able to target previously.

SoftLayer has always competed to some degree against AWS (philosophically, both companies have an intense focus on automation, and SoftLayer’s bare-metal architecture is optimal for certain types of use cases), and IBM SoftLayer will as well. In the IBM SoftLayer deals we’ve seen in the last couple of months, though, their competition isn’t really Amazon Web Services (AWS). AWS is often under consideration, but the real competitor is much more likely to be Rackspace — dedicated servers (possibly with a hybrid cloud model) and managed services.

IBM’s strategy is actually a distinctively different one from the other providers in the cloud infrastructure market. SoftLayer’s business is overwhelmingly dedicated hosting — mostly small-business customers with one or two bare-metal servers (a cost-sensitive, high-churn business), though they had some customers with large numbers of bare-metal servers (gaming, consumer-facing websites, and so forth). It also offers cloud IaaS, called CloudLayer, with by-the-hour VMs and small bare-metal servers, but this is a relatively small business (AWS has individual customers that are bigger than the entirety of CloudLayer). SoftLayer’s intellectual property is focused on being really, really good at quickly provisioning hardware in a fully automated way.

IBM has decided to do something highly unusual — to capitalize on SoftLayer’s bare-metal strengths, and to strongly downplay virtualization and the role of the cloud management platform (CMP). If you want a CMP — OpenStack, CloudStack, vCloud Director, etc. — on SoftLayer, there’s an easy way to install the software on bare metal. But if you want it updated, maintained, etc., you’ll either have to do it yourself, or you need to contract with IBM for managed services. If you do that, you’re not buying cloud IaaS; you’re renting hardware and CMP software, and building your own private cloud.

While IBM intends to expand the configuration options available in CloudLayer (and thus the number of hardware options available by the hour rather than by the month), their focus is upon the lower-level infrastructure constructs. This also means that they intend to remain neutral in the CMP wars. IBM’s outsourcing practice has historically been pretty happy to support whatever software you use, and the same largely applies here — they’re offering managed services for the common CMPs, in whatever way you choose to configure them.

In other words, while IBM intends to continue its effort to incorporate OpenStack as a provisioning manager in its “Smarter Infrastructure” products (the division formerly known as Tivoli), they are not launching an OpenStack-based cloud IaaS, replacing the existing CloudLayer cloud IaaS platform, or the like.

IBM also intends to use SoftLayer as the underlying hardware platform for the application infrastructure components that will be in its Cloud Foundry-based framework for PaaS. It will depend on these components to compete against the higher-level constructs in the market (like Amazon’s RDS database-as-a-service).

IBM SoftLayer has a strong value proposition for certain use cases, but today its distinctive value proposition is quite different from AWS’s and very similar to Rackspace’s (although I think Rackspace is going to embrace DevOps-centric managed services, while IBM seems more traditional in its approach). IBM SoftLayer is still an infrastructure-centric story, though. I don’t know that they’re going to compete with the vision and speed of execution currently being displayed by AWS, Microsoft, and Google, but ultimately, those providers may not be IBM SoftLayer’s primary competitors.

Verizon Cloud is technically innovative, but is it enough?

Verizon Terremark has announced the launch of its new Verizon Cloud service built using its own technology stack.

Verizon already owns a cloud IaaS offering — in fact, it owns several. Terremark was an early AWS competitor with the Terremark Enterprise Cloud, a VMware-based offering that got strong enterprise traction during the early years of this market (and remains the second-most-common cloud provider amongst Gartner’s clients, with many companies using both AWS and Terremark), as well as a vCloud Express offering. Verizon entered the game later with Verizon Compute as a Service (now called Enterprise Cloud Managed Edition), also VMware-based. Since Verizon’s acquisition of Terremark, the company has continued to operate all the existing platforms, and intends to continue to do so for some time to come.

However, Verizon has had the ambition to be a bigger player in cloud; like many other carriers, it believes that network services are a commodity and a carrier needs to have stickier, value-added, higher-up-the-stack services in order to succeed in the future. But Verizon also understood that it would have to build technology, not depend on other people’s technology, if it wanted to be a truly competitive global-class cloud player versus Amazon (and Microsoft, Google, etc.).

With that in mind, in 2011, Verizon went and made a manquisition — acquiring CloudSwitch not so much for its product (essentially hypervisor-within-a-hypervisor that allows workloads to be ported across cloud infrastructures using different technologies), as for its team. It gave them a directive to go build a cloud infrastructure platform with a global-class architecture that could run enterprise-class workloads, at global-class scale and at fully competitive price points.

Back in 2011, I conceived what I called the on-demand infrastructure fabric (see my blog post No World of Two Clouds, or, for Gartner clients, the research note, Market Trends: Public and Private Cloud Infrastructure Converge into On-Demand Infrastructure Fabrics) — essentially, a global-class infrastructure fabric with self-service selectable levels of availability, performance, and isolation. Verizon is the first company to have really built what I envisioned (though their project predates my note, and my vision was developed independently of any knowledge of what they were doing).

The Verizon Cloud architecture is actually very interesting, and, as far as I know, unique amongst cloud IaaS providers. It is almost purely a software-defined data center. Components are designed at a very low level — a custom hypervisor, SDN augmented with the use of NPUs, virtualized distributed storage. Verizon has generally tried to avoid using components for which they do not have source code. There are very few hardware components — just x86 servers, Arista switches, and commodity flash storage (the platform is all-SSD). The network is flat, and high bandwidth is an expectation (Verizon is a carrier, after all). Oh, and there’s object-based storage, too (which I won’t discuss here).

The Verizon Cloud has a geographically distributed control plane designed for continuous availability, and it, along with the components, is supposed to be updatable without downtime (i.e., maintenance should not impact anything). It’s intended to provide fine-grained performance controls for the compute, network, and storage resource elements. It is also built to allow the user to select fault domains, allowing strong control of resource placement (such as “these two VMs cannot sit on the same compute hardware”); within a fault domain, workloads can be rebalanced in case of hardware failure, thus offering the kind of high availability that’s often touted in VMware-based clouds (including Terremark’s previous offerings). It is also intended to allow dynamic isolation of compute, storage, and networking components, allowing the creation of private clouds within a shared pool of hardware capacity.
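
The fault-domain idea is easiest to see as a placement constraint. The sketch below is purely conceptual (it is not Verizon’s API): VMs in the same anti-affinity group are placed on distinct hosts, and placement fails rather than violate the constraint:

```python
from typing import Dict, List, Optional, Tuple

def place_vm(vm: str, group: str, hosts: List[str],
             placements: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Pick a host for `vm` such that no other VM in the same anti-affinity
    group already occupies it ("these two VMs cannot share hardware")."""
    used = {host for (g, host) in placements.values() if g == group}
    for host in hosts:
        if host not in used:
            placements[vm] = (group, host)
            return host
    return None  # refuse placement rather than violate the constraint

placements: Dict[str, Tuple[str, str]] = {}
hosts = ["host-a", "host-b"]
print(place_vm("web-1", "web-tier", hosts, placements))  # host-a
print(place_vm("web-2", "web-tier", hosts, placements))  # host-b
print(place_vm("web-3", "web-tier", hosts, placements))  # None: no compliant host left
```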

The Verizon Cloud is intended to be as neutral as possible — the theory is that all VM hypervisors can run natively on Verizon’s hypervisor, many APIs can be supported (including its own API, the existing Terremark API, and the AWS, CloudStack, and OpenStack APIs), and there’ll be support for the various VM image formats. Initially, the supported hypervisor is a modified Xen. In other words, Verizon wants to take your workloads, wherever you’re running them now, and in whatever form you can export them.

It’s an enormously ambitious undertaking. It is, assuming it all works as promised, a technical triumph — it’s the kind of engineering you expect out of an organization like AWS or Google, or a software company like Microsoft or VMware, not a staid, slow-moving carrier (the mere fact that Verizon managed to launch this is a minor miracle unto itself). It is actually, in a way, what OpenStack might have aspired to be; the delta between this and the OpenStack architecture is, to me, full of sad might-have-beens of what OpenStack had the potential to be, but is not and is unlikely to become. (Then again, service providers have the advantage of engineering to a precisely-controlled environment. OpenStack, and for that matter, VMware, need to run on whatever junk the customer decides to use, instantly making the problem more complex.)

Unfortunately, the question at this stage is: Will anybody care?

Yes, I think this is an important development in the market, and the fact that Verizon is already a credible cloud player in the enterprise, with an entrenched base in the Terremark Enterprise Cloud, will help it. But in a world where developers control most IaaS purchasing, the bare-bones nature of the new Verizon offering means that it falls short of fulfilling the developer desire for greater productivity. In order to find a broader audience, Verizon will need to commit to developing all the richness of value-added capabilities that the market leaders will have — which likely means going after the PaaS market with the same degree of ambition, innovation, and investment, but certainly means committing to rapidly introducing complementary capabilities and bringing a rich ecosystem in the form of a software marketplace and other partnerships. Verizon needs to take advantage of its shiny new IaaS building blocks to rapidly introduce additional capabilities — much like Microsoft is now rapidly introducing new capabilities into Azure.

With that, assuming that this platform performs as designed, and Verizon can continue to treat Terremark’s cloud folks like they belong to a fast-moving start-up and not an ossified pipe provider, Verizon may have a shot at being one of the leaders in this market. Without that, the Verizon Cloud is likely to be relegated to a niche, just like every other provider whose capabilities stop at the level of offering infrastructure resources.

No world of two clouds

Massimo Re Ferre’ recently posted some thoughts as a follow-up to his talk at VMworld, about vCHS vs. AWS. That led to a Twitter exchange that made me think that I should highlight a viewpoint of mine:

I do not believe in a “world of two clouds”, where there are cloud IaaS offerings that are targeted at enterprise workloads, and there are cloud IaaS offerings that are targeted at cloud-native workloads — broadly, different clouds for applications designed with the assumption of infrastructure resilience, versus applications designed with the assumption that resilience must reside at the application layer.

Instead, I believe that the market leaders will offer a range of infrastructure resources. Some of those infrastructure resources will be more resilient, and will be more expensive. And customers will pay for the level of performance they receive. There’s no need to build two clouds; in fact, customers actively do not want two different clouds, since nobody really wants to shift between different clouds as you go through an application’s lifecycle, or for different tiers of an app, some of which might need greater infrastructure resilience and guaranteed performance.

I do not believe that application design patterns change to be fully cloud-native over time. First, enterprises have hundreds if not thousands of existing legacy applications that they will need to host. Second, enterprises continue to write non-cloud-native apps, because the typical app is small — it’s some kind of business process app (I call these “paperwork” apps, usually online forms with some workflow and reporting), and it runs on a tiny VM and has few users. It’s neither cost-effective to spend the developer time to make these apps resilient, nor cost-effective to distribute them. Putting them on decently resilient infrastructure is less expensive. Some of these apps should more logically be written on a business process management suite or PaaS (BPMS or bpmPaaS), or on a more general PaaS; that underlying BPMS/PaaS should hopefully provide resilience functionally, but that won’t deal with the existing legacy apps, so there’ll continue to be a need for resilient infrastructure.

When people talk about infrastructure resilience, they’re generally referring to compute resilience in particular — essentially, trying to protect the application from the impact of potential server hardware failure. VMware pioneered two technologies in this space — they call them “HA” (fast detection of physical host failure and automatic restart of the VMs that were running on that host, on some other host) and “vMotion” (live migration of VMs from one physical host to another). However, all the other major hypervisors have now incorporated these features. There’s absolutely no reason why a cloud IaaS provider like AWS, which doesn’t currently support these capabilities, can’t add them, and charge a premium for these VMs.

When people talk about performance consistency, they’re generally referring to storage and network performance. (Most cloud IaaS providers do not oversubscribe either CPU or RAM resources.) Predictable storage performance is a very difficult engineering problem. Companies like SolidFire are offering all-SSD storage to help accomplish this (since it reduces the variability of seek times), and we’re seeing gradual uptake of this technology into cloud IaaS providers. AWS has done “provisioned iops” (PIOPS), allowing customers to buy into a more predictable range of storage performance. There’s no reason why providers wouldn’t offer this kind of predictability for both storage and network — especially when they can charge extra for it.
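
As a concrete example of paying for that predictability, here is a rough sketch of requesting an EBS volume with provisioned IOPS via boto3; the region, size, and IOPS figures are arbitrary, and the provisioned rate is billed as its own line item:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a volume with a guaranteed sustained IOPS rate, rather than taking
# whatever performance the shared storage pool happens to deliver.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,           # GiB
    VolumeType="io1",   # provisioned-IOPS SSD volume type
    Iops=3000,          # the performance level being purchased
)
print(volume["VolumeId"])
```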

Now, there are tons of service providers out there building to that world of two clouds — often rooted in the belief that IT operations will want one thing, and developers another, and they should build something totally different for both. This is almost certainly a losing strategy. Winning providers will satisfy both needs within a single cloud, offering architectural flexibility that allows developers to decide whether or not they want to build for application resiliency or infrastructure resiliency.

For more on this: I’ve covered this in detail in my research note, Market Trends: Public and Private Cloud Infrastructure Converge into On-Demand Infrastructure Fabrics (Gartner clients only).

Where are the challengers to AWS?

This is part 2 of my response to Bernard Golden’s recent CIO.com blog post in response to my announcement of Gartner’s 2013 Magic Quadrant for Cloud IaaS. (Part 1 was posted yesterday.)

Bernard: “What skill or insight has allowed AWS to create an offering so superior to others in the market?”

AWS takes a comprehensive view of “what does the customer need”, looks at what customers (whether current customers or future target customers) are struggling with, and tries to address those things. AWS not only takes customer feedback seriously, but it also iterates at shocking speed. And it has been willing to invest massively in engineering. AWS’s engineering organization and the structure of the services themselves allows multiple, parallel teams to work on different aspects of AWS with minimal dependencies on the other teams. AWS had a head start, and with every passing year their engineering lead has grown larger. (Even though they have a significant burden of technical debt from having been first, they’ve also solved problems that competitors haven’t had to yet, due to their sheer scale.)

Many competitors haven’t had the willingness to invest the resources to compete, especially if they think of this business as one that’s primarily about getting a VM fast and that’s all. They’ve failed to understand that this is a software business, where feature velocity matters. You can sometimes manage to put together brilliant, hyper-productive small teams, but that usually gets you something that’s wonderful within the scope of what a small team can build, yet still missing the additional capabilities that better-resourced competitors can deliver (especially if a competitor can muster both resources and hyper-productivity). There are some awesome smaller companies in this space, though.

Bernard: “Plainly stated, why hasn’t a credible competitor emerged to challenge AWS?”

I think there’s a critical shift happening in the market right now. Three very dangerous competitors are just now entering the market — Microsoft, Google, and VMware. I think the real war for market share is just beginning.

For instance, consider the following off-the-cuff thoughts on those vendors. These are by no means anything more than quick thoughts, not a complete or balanced analysis. I have a forthcoming research note called “Rise of the Cloud IaaS Mega-Vendors” that focuses on this shift in the competitive landscape, and which will profile these four vendors in particular, so stay tuned for more. So, that said:

Microsoft has brand, deep customer relationships, deep technology entrenchment, and a useful story about how all of those pieces are going to fit together, along with a huge army of engineers, and a ton of money and the willingness to spend wherever it gains them a competitive advantage; its weakness is Microsoft’s broader issues as well as the Microsoft-centricity of its story (which is also its strength, of course). Microsoft is likely to expand the market, attracting new customers and use cases to IaaS — including blended PaaS models.

Google has brand, an outstanding engineering team, and unrivaled expertise at operating at scale; its weakness is Google’s usual challenges with traditional businesses (whatever you can say about AWS’s historical struggle with the enterprise, you can say about Google many times over, and it will probably take them at least as long as AWS did to work through that). Google’s share gain will mostly come at the expense of AWS’s base of HPC customers and young start-ups, but it will worm its way into the enterprise via interactive agencies that use its cloud platform; it should have a strong blended PaaS model.

VMware has brand, a strong relationship with IT operations folks, technology it can build on, and a hybrid cloud story to tell; whether or not its enterprise-class technology can scale to global-class clouds remains to be seen, though, along with whether or not it can get its traditional customer base to drive sufficient volume of cloud IaaS. It might expand the market, but it’s likely that much of its share gain will come at the expense of VMware-based “enterprise-class” service providers.

Obviously, it will take these providers some time to build share, and there are other market players who will be involved, including the other providers that are in the market today (and for all of you wondering “what about OpenStack”, I would classify that under the fates of the individual providers who use it). However, if I were to place my bets, it would be on those four at the top of market share, five years from now. They know that this is a software business. They know that innovative capabilities are vitally necessary. And they know that this has turned into a market fixated on developer productivity and business benefits. At least for now, that view is dominating the actual spending in this market.

You can certainly argue that another market outcome should have happened, that users should have chosen differently, or even that users are making poor decisions now that they’ll regret later. That’s an interesting intellectual debate, but at this point, Sisyphus’s rock is rolling rapidly downhill, so anyone who wants to push it back up is going to have an awfully difficult time not getting crushed.

Cloud IaaS market share and the developer-centric world

Bernard Golden recently wrote a CIO.com blog post in response to my announcement of Gartner’s 2013 Magic Quadrant for Cloud IaaS. He raised a number of good questions that I thought it would be useful to address. This is part 1 of my response. (See part 2 for more.)

(Broadly, as a matter of Gartner policy, analysts do not debate Magic Quadrant results in public, and so I will note here that I’m talking about the market, and not the MQ itself.)

Bernard: “Why is there such a distance between AWS’s offering and everyone else’s?”

In the Magic Quadrant, we rate not only the offering itself in its current state, but also a whole host of other criteria — the roadmap, the vendor’s track record, marketing, sales, etc. (You can go check out the MQ document itself for those details.) You should read the AWS dot positioning as not just indicating a good offering, but also that AWS has generally built itself into a market juggernaut. (Of course, AWS is still far from perfect, and depending on your needs, other providers might be a better fit.)

But Bernard’s question can be rephrased as, “Why does AWS have so much greater market share than everyone else?”

Two years ago, I wrote two blog posts that are particularly relevant here; those posts were followed up with two research notes (links are Gartner clients only).

I have been beating the “please don’t have contempt for developers” drum for a while now. (I phrase it as “contempt” because it was often very clear that developers were seen as lesser, not real buyers doing real things — merely ignoring developers would have been one thing, but contempt is another.) But it’s taken until this past year before most of the “enterprise class” vendors acknowledged the legitimacy of the power that developers now hold.

Many service providers held tight to the view espoused by their traditional IT operations clientele: AWS was too dangerous, it didn’t have sufficient infrastructure availability, it didn’t perform sufficiently well or with sufficient consistency, it didn’t have enough security, it didn’t have enough manageability, it didn’t have enough governance, it wasn’t based on VMware — and it didn’t look very much like an enterprise’s data center architecture. The viewpoint was that IT operations would continue to control purchases, implementations would be relatively small-scale and would be built on traditional enterprise technologies, and that AWS would never get to the point that they’d satisfy traditional IT operations folks.

What they didn’t count on was the fact that developers, and the business management that they ultimately serve, were going to forge on ahead without them. Or that AWS would steadily improve its service and the way it did business, in order to meet the needs of the traditional enterprise. (My colleagues in GTP — the Gartner division that was Burton Group — do a yearly evaluation of AWS’s suitability for the enterprise, and each year, AWS gets steadily, materially better. Clients: see the latest.)

Today, AWS’s sheer market share speaks for itself. And it is definitely not just single developers with a VM or two, start-ups, or non-mission-critical stuff. Through the incredible amount of inquiry we take at Gartner, we know how cloud IaaS buyers think, source, succeed, and sometimes suffer. And every day at Gartner, we talk to multiple AWS customers (or prospects considering their options, though many have already bought something on the click-through agreement). Most are traditional enterprises of the G2000 variety (including some of the largest companies in the world), but over the last year, AWS has finally cracked the mid-market by working with systems integrator partners. The projected spend levels are clearly increasing dramatically, the use cases are extremely broad, the workloads increasingly have sensitive data and regulatory compliance concerns, and customers are increasingly thinking of AWS as a strategic vendor.

(Now, as my colleagues who cover the traditional data center like to point out, the spend levels are still trivial compared to what these customers are spending on the rest of their data center IT, but I think what’s critical here is the shift in thinking about where they’ll put their money in the future, and their desire to pick a strategic vendor despite how relatively early-stage the market is.)

But put another way — it is not just that AWS advanced its offering, but it convinced the market that this is what they wanted to buy (or at least that it was a better option than the other offerings), despite the sometimes strange offering constructs. They essentially created demand in a new type of buyer — and they effectively defined the category. And because they’re almost always first to market with a feature — or the first to make the market broadly aware of that capability — they force nearly all of their competitors into playing catch-up and me-too.

That doesn’t mean that the IT operations buyer isn’t important, or that there aren’t an array of needs that AWS does not address well. But the vast majority of the dollars spent on cloud IaaS are much more heavily influenced by developer desires than by IT operations concerns — and that means that market share currently favors the providers who appeal to development organizations. That’s an ongoing secular trend — business leaders are currently heavily growth-focused, and therefore demanding lots of applications delivered as quickly as possible, and are willing to spend money and take greater risks in order to obtain greater agility.

This also doesn’t mean that the non-developer-centric service providers aren’t important. Most of them have woken up to the new sourcing pattern, and are trying to respond. But many of them are also older, established organizations, and they can only move so quickly. They also have the comfort of their existing revenue streams, which allow them the luxury of not needing to move so quickly. Many have been able to treat cloud IaaS as an extension of their managed services business. But they’re now facing the threat of systems integrators like Cognizant and Capgemini entering this space, combining application development and application management with managed services on a strategic cloud IaaS provider’s platform — at the moment, normally AWS. Nothing is safe from the broader market shift towards cloud computing.

As always, every individual customer’s situation is different from another’s, and the right thing to do (or the safe, mainstream thing to do) evolves through the years. Gartner is appropriately cautionary when it discusses such things with clients. This is a good time to mention that Magic Quadrant placement is NEVER a good reason to include or exclude a vendor from a short list. You need to choose the vendor that’s right for your use case, and that might be a Niche Player, or even a vendor that’s not on the MQ at all — and even though AWS has the highest overall placement, they might be completely unsuited to your use case.

Five more reasons to work at Gartner with me

A couple of years ago, I wrote a blog post called “Five reasons you should work at Gartner with me”. Well, we’re recruiting again for an analyst to replace Aneel Lakhani, who is sadly leaving us to go to a start-up. While this analyst role isn’t part of my team, I expect that this is someone that I’ll work closely with, so I have a vested interest in seeing a great person get the job.

Check out the formal job posting. This analyst will cover cloud management products and services, including cloud management platforms (like OpenStack).

All five of the reasons that I previously cited for working at Gartner remain true:

  1. It is an unbeatably interesting job for people who thrive on input.
  2. You get to help people in bite-sized chunks.
  3. You get to work with great colleagues.
  4. Your work is self-directed.
  5. We don’t do any pay-to-play.

(See my previous post for the details.)

However, I want to make a particular appeal to women. I know that becoming an industry analyst is an unusual career path that many people have never thought about, and I expect that a lot of women who might find that the job suits them have no idea what working at Gartner is like. While we have a lot of women in the analyst ranks, the dearth of women in technology in general means that we see fewer female candidates for analyst roles.

So, here are five more good reasons why you, a woman, might want a job as a Gartner analyst.

1. We have a lot of women in very senior, very visible analyst roles, along with a lot of women in management. We are far more gender-balanced than you normally see in a technology company. That means that you are just a person, rather than being treated like you’re somehow a representative of women in general and adrift in a sea of men. Your colleagues are never going to dismiss your opinions as somehow lesser because you represent a “woman’s point of view”. Nor are people going to expect a woman to be note-taking or performing admin tasks. And because there are plenty of women, company social activities aren’t male-centric. There are women at all levels of the analyst organization, including at the top levels. That also means there’s an abundance of female mentors, if that matters to you.

2. The traits that might make you termed “too aggressive” are valued in analysts. Traits that are usually considered positive in men — assertive, authoritative, highly confident, direct, with strong opinions — can be perceived as too aggressive in women, which potentially creates problems for those types of women in the workplace. But this is precisely what we’re looking for in analysts (coupled with empathy, being a good communicator, and so on). Clients talk to analysts because they expect us to hold opinions and defend them well.

3. You are shielded from most misogyny in the tech world. You may get the rare social media interaction where someone will throw out a random misogynistic comment, but our analysts aren’t normally subject to bad behavior. You will still get the occasional client who believes you must not be technical because you’re a woman, or doesn’t want a woman telling him what to do, but really, that’s their problem, not yours. Our own internal culture is highly professional; there are lots of strong personalities, but people are normally mature and even-keeled. Our conferences are extremely professionally run, and that means we also hold attendees and sponsors to standards that don’t allow them to engage in women-marginalizing shenanigans.

4. You will use both technical and non-technical skills, and have a real impact. While technical knowledge is critical, and hands-on technical experience is extremely useful, it’s simply one aspect of the skillset; communication and other “soft” skills, and an understanding of business strategy and sales and marketing, are also important. Also, the things you do have real impact for our clients, and potentially can shape the industry; if you like your work to have meaning, you’ll certainly find that here.

5. This is a flexible-hours, work-from-anywhere job. This has the potential to be a family-friendly lifestyle. However, I would caution that “work from anywhere” can include a lot of travel, “flexible hours” means that you can end up working all the time (especially because we have clients around the globe and your flexibility needs to include early-morning and late-evening availability), and covering a hot topic is often a very intense job. You have to be good at setting boundaries for how much you work.

(By the way, for this role, the two analysts who cover IT operations management tools most closely, and whose team you would work on, are both women — Donna Scott and Ronni Colville — and both VP Distinguished Analysts, at the very top of our analyst ranks.)

Please feel free to get in contact privately if you’re interested (email preferable, LinkedIn okay as well), regardless of your gender!

The 2013 Cloud IaaS Magic Quadrant

Gartner’s Magic Quadrant for Cloud Infrastructure as a Service, 2013, has just been released (see the client-only interactive version, or the free reprint). Gartner clients can also consult the related charts, which summarize the offerings, features, and data center locations.

We’re now updating this Magic Quadrant on a nine-month basis, and quite a bit has changed since the 2012 update (see the client-only 2012, or the free 2012 reprint).

In particular, market momentum has strongly favored Amazon Web Services. Many organizations have now had projects on AWS for several years, even if they hadn’t considered themselves to have “done anything serious” on AWS. Thus, as those organizations get serious about cloud computing, AWS is their incumbent provider — there are relatively few truly greenfield opportunities in cloud IaaS now. Many Gartner clients now actually have multiple incumbent providers (the most common combination is AWS and Terremark), but nearly all such customers tell us that the balance of new projects are going to AWS, not the other providers.

Little by little, AWS has systematically addressed the barriers to “mainstream”, enterprise adoption. While it’s still far from everything that it could be, and it has some specific and significant weaknesses, that steady improvement over the last couple of years has brought it to the “good enough” point. While we saw much stronger momentum for AWS than other providers in 2012, 2013 has really been a tipping point. We still hear plenty of interest in competitors, but AWS is overwhelmingly the dominant vendor.

At the same time, many vendors have developed relatively solid core offerings. That means that the number of differentiators in the market has decreased, as many features become common “table stakes” features that everyone has. It means that most offerings from major vendors are now fairly decent, but only a few really stand out for their capabilities.

That leads to an unusual Magic Quadrant, in which the relative strength of AWS in both Vision and Execution essentially forces the whole quadrant graphic to rescale. (To build an MQ, analysts score providers relative to each other, on all of the formal evaluation criteria, and the MQ tool automatically plots the graphic; there is no manual adjustment of placements.) That leaves you with centralized compression of all of the other vendors, with AWS hanging out in the upper right-hand corner.

Note that a Magic Quadrant is an evaluation of a vendor in the market; the actual offering itself is only a portion of the overall score. I’ll be publishing a Critical Capabilities research note in the near future that evaluates one specific public cloud IaaS offering from each of these vendors, against its suitability for a set of specific use cases. My colleagues Kyle Hilgendorf and Chris Gaun have also been publishing extremely detailed technical evaluations of individual offerings — AWS, Rackspace, and Azure, so far.

A Magic Quadrant is a tremendous amount of work — for the vendors as well as for the analyst team (and our extended community of peers within Gartner, who review and comment on our findings). Thanks to everyone involved. I know this year’s placements came as disappointments to many vendors, despite the tremendous hard work that they put into their offerings and business in this past year, but I think the new MQ iteration reflects the cold reality of a market that is highly competitive and is becoming even more so.

Instart Logic launches a new kind of acceleration service

There have been three core techniques for accelerating content and application delivery over the Internet — caching (“classic” CDN), network optimization (think protocol tricks, like F5 Web Application Accelerator on the hardware side, or Akamai DSA on the service side), and front-end optimization (FEO, think content re-write, like Aptimize/Riverbed or Strangeloop/Radware on the software side, or Blaze.io/Akamai or Acceloweb/Limelight on the service side).

Now, with the launch of Instart Logic, there’s a fourth technique, that I don’t yet have a name for. In spirit, it’s probably most similar to a SoftWOC, but in this case, the client endpoint is the browser, and the symmetric remote endpoint is the CDN server. The techniques are also different from typical SoftWOC techniques, as far as I know.

From the perspective of an Instart Logic customer, they’re getting a dynamic acceleration service that, from a deployment perspective, is much like a CDN. For most customers, it would entirely replace using a traditional CDN (rather than being additive) — i.e., they would buy this instead of buying Akamai DSA or a similar service. Note that this is a performance play, not a price play — Instart Logic expects that they’ll be in the ballpark of typical dynamic acceleration pricing, and that performance carries a market premium.

The techniques used in the service are intended to dramatically improve load times, especially on congested networks; this is particularly useful in mobile, but it is not mobile-specific. As with FEO, the goal is to allow the end-user to quickly see and interact with the content while the remainder of the page is still downloading.

On the client side, there’s what they call a “NanoVisor” — an HTML5-based thin virtualization layer that runs in the browser. If Instart Logic is full-proxying the customer’s site, the NanoVisor code can simply be injected; otherwise the customer can insert the code into their site. It requires no other changes to the customer’s site. The NanoVisor provides intelligence about the end-user and serves as the client endpoint for the optimization.

On the server side, the “AppSequencer” analyzes page content, and it fragments and orders objects that are then streamed to the NanoVisor. It does large-scale analysis of usage patterns, and it predictively sends things based on the responses that it’s seen before. There’s compression and network optimization techniques, as well as implicit caching.
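
The “predictively sends things” part is essentially learned prefetching. As a toy illustration only (not Instart Logic’s actual implementation), the sketch below counts which objects past visitors fetched after a given page and pushes the most common ones first:

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Toy model: per page, count the objects that past sessions went on to fetch.
history: Dict[str, Counter] = defaultdict(Counter)

def record_session(page: str, fetched_objects: List[str]) -> None:
    history[page].update(fetched_objects)

def objects_to_push(page: str, top_n: int = 3) -> List[str]:
    # Send the objects most likely to be needed before the browser asks for them.
    return [obj for obj, _ in history[page].most_common(top_n)]

record_session("/home", ["app.js", "hero.jpg", "fonts.woff"])
record_session("/home", ["app.js", "hero.jpg", "promo.png"])
print(objects_to_push("/home"))  # ['app.js', 'hero.jpg', ...]
```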

Like other recent innovators in the CDN space, Instart Logic is predominantly a software company. While they do have servers of their own, they are also using a variety of cloud IaaS providers for capacity. They’re also using Dyn for DNS.

Instart Logic has raised a significant amount of money, almost purely from top-tier VCs — $26 million to date. I think their technology is very promising, which probably means they’ll get a bit of time to prove themselves out and then they’ll get bought by one of the CDNs looking to get an edge on the competition, or maybe even an ADC or WOC vendor.

Instart Logic’s demos are impressive, and they’ve got paying customers at this point, although obviously they’re newly-launched. While it always takes time to build trust in this industry, at this point they’re worth checking out, and I’ve been referring Gartner clients to them ever since I was briefed by them while they were still in stealth mode, a few months back. They’re potentially an excellent fit for customers who are looking for something beyond what DSA-style network optimization offerings can do, but either do not want to do FEO, have reached the limits of what FEO can offer them, or simply want to explore alternatives.
