We began the call-for-vendors process for Gartner’s 2014 Magic Quadrant for Cloud Infrastructure as a Service, as well as the regional Magic Quadrants for Cloud-Enabled Managed Hosting, in December. (See Doug Toombs’s call for vendors for the latter.)
The pre-qualification survey, which is intended to gather quantitative metrics and information about each provider’s service, is going out imminently. We sent out contact confirmations on December 26th to all service providers who are currently on our list to receive the survey. If you haven’t received a contact confirmation and you want to receive the survey, please contact Michele Severance (Michele dot Severance at Gartner dot com), who is providing administrative support for this Magic Quadrant. You must be authorized to speak for your company. Please note we cannot work with PR firms for the Magic Quadrant; if you are a PR agency and you think that your client should be participating, you should get in touch with your client and have your client contact Michele.
This year, we are doing an integrated survey process for multiple Magic Quadrants. The cloud IaaS MQ is, in many ways, foundational. Cloud-enabled managed hosting is the delivery of managed services on top of cloud IaaS and, more broadly, cloud-enabled system infrastructure. Consequently, this year’s survey asks about your platforms, your managed services levels, and how those things combine into service offerings. Because the survey is longer, we’re starting the survey process earlier than usual.
The survey is an important part of our data collection efforts on the markets, not just for the Magic Quadrants. We use the survey data to recommend providers throughout the year, particularly since we try to find providers that can exactly fit a client’s needs — including small niche providers. Far from everything fits into the one-size-fits-all mold of the largest providers.
The Cloud IaaS MQ continues to be updated on a 9-month cycle, reflecting the continued fast pace of the market. It will have similar scope to last year, with a very strong emphasis on self-service capabilities.
Please note that receiving a survey does not in any way indicate that we believe that your company is likely to qualify; we simply allow surveys to go to all interested parties (assuming that theyÃ¢re not obviously wrong fits, like software companies without an IaaS offering).
The status for this Magic Quadrant will be periodically updated on its status page.
I’ve been spending the last week revising the combined service-provider survey for our Magic Quadrant for Cloud IaaS, and the new regional Magic Quadrants for Cloud-Enabled Managed Hosting.
With every year of revision, the way I ask questions becomes lengthier and more specific, along with the boldfaced “THESE THINGS DO NOT COUNT” caveats. These caveats almost inevitably come from things vendors have tried to claim count as X, with varying degrees of creativity and determination.
I consider my behavior part of a category I’ll call “shooting squirrels from the roof”. It comes from a story that a friend once told me about a rental agreement on a house. This rental agreement had all the usual stipulations about what tenants aren’t allowed to do, but then it had a list of increasingly specific and weird directives about what the tenant was not allowed to do, culminating in, “Tenant shall not shoot squirrels from the roof.” You just know that each of these clauses came from some previous bad experience that the landlord had with some tenant, which caused them to add these “thou shalt not” behaviors in great specificity to each subsequent lease.
So, I use the phrase “shooting squirrels from the roof” to denote situations in which someone, having been burned by previous bad experiences, tries to be very specific (often in a contract) to avoid being burned in that way again.
When I look at customer contracts for managed hosting and indeed, for services in general, I sometimes see they’ve got “shooting squirrels from the roof” contract clauses, specifying a list of often-bizarre, horrible things that the provider is not allowed to do. Those customers aren’t crazy (well, at least not entirely); they’ve just been burned before. No doubt if you’re in the services business (whether IT or not), you’ve probably had this experience, too.
Google Compute Engine (GCE) — Google’s cloud IaaS offering — is now in general availability, an announcement accompanied by a 10% price drop, new persistent disk (which should now pretty much always be used instead of scratch disk), and expanded OS support (though no Microsoft Windows yet). The announcement also highlights two things I wrote about GCE recently, in posts about its infrastructure resilience features — live migration and fast VM restart.
Amazon Web Services (AWS) remains the king of this space and is unlikely to be dethroned anytime soon, although Microsoft Windows Azure is clearly an up-and-coming competitor due to Microsoft’s deep established relationships with business customers. GCE is more likely to target the cloud-natives that are going to AWS right now — companies doing things that the cloud is uniquely well-suited to serve. But I think the barriers to Google moving into mainstream businesses are more of a matter of go-to-market execution, along with trust, track record, and an enterprise-friendly way of doing business — Google’s competitive issues are unlikely to be technology.
In fact, I think that Google is likely to push the market forward in terms of innovation in a way that Azure will not; AWS and Google will hopefully goad each other into one-upsmanship, creating a virtuous cycle of introducing things that customers discover they love, thus creating user demand that pushes the market forward. Google has a tremendous wealth of technological capabilities in-house that it likely can externalize over time. Most organizations can’t do things the way that Google does them, but Google can certainly start making the attempt to make it easier for other organizations to adopt the Google approach to the world, by exposing their tools in an easily-consumable way.
GCE still lags AWS tremendously in terms of breadth and depth of feature set, of course, but it also has aspects that are immediately more attractive for some workloads. However, it’s now at the point where it’s a viable alternative to AWS for organizations who are looking to do cloud-native applications, whether they’re start-ups or long-established companies. I think the GA of GCE is a demarcation of market eras — we’re now moving into a second phase of this market, and things only get more interesting from here onwards.
On a totally personal note (a rarity for my blog, for better or for worse):
I am the soloist for the Glazunov violin concerto with the Montgomery Philharmonic, on Sunday, December 15th. (7 pm, Gaithersburg Presbyterian Church, in suburban Washington DC. Free concert, no tickets required, kids welcome.)
It’s an enormously rare opportunity to be able to play a concerto with orchestra, and I’m immensely pleased to have been asked to do so. It’s also an incredibly large investment of time to do the preparation for a performance, and thus, it would be lovely to have a larger audience. So, if you’re in the DC area, I invite you to come — it’s an all-Russian program (the other works are Rimsky-Korsakov’s “Russian Easter” Overture, and Tchaikovsky’s Nutcracker suite).
For me, this is a kind of mark of reclaiming my life outside of work. I used to play semi-professionally when I lived in the Bay Area, and a brief hiatus when I moved cross-country turned into a decade-long break, the last few years of which I have pretty much done nothing but work. During 2013, I’ve tried to find a more reasonable balance between working and other things; if you’re a client of mine, you know that I’ve been much stricter about the way I schedule travel, and pushing more things to my colleagues rather than allowing myself to be as heavily over committed.
My personality doesn’t really ever allow me to just veg out, and so the other things that I do, I tend to do pretty intensively, whether it’s the violin, building Lego sets, or cycling through various games (at the moment, I’m trying to dominate my PvP bracket in Marvel Puzzle Quest). So if you’d like to see me do something in a totally different context, please do come and hear the performance!
If you’re an investment banker or a vendor, and you’ve asked me in the last year, “Who should we buy?”, I’ve often pointed at enStratius (bought by Dell), ServiceMesh (bought by CSC last week), and Tier 3.
So now I’m three for three, because CenturyLink just bought Tier 3, continuing its acquisition activity. CenturyLink is a US-based carrier (pushed to prominence when they acquired Qwest in 2011). They got into the hosting business (in a meaningful way) when they bought Savvis in 2011; Savvis has long been a global leader in colocation and managed hosting. (It’s something of a pity that CenturyLink is in the midst of killing the Savvis brand, which has recently gotten a lot of press because of their partnership with VMware for vCHS, and is far better known outside the US than the CenturyLink brand, especially in the cloud and hosting space.)
Savvis has an existing cloud IaaS business and a very large number of offerings that have the “cloud” label, generally under the Symphony brand — I like to say of Savvis that they never seem to have a use case that they don’t think needs another product, rather than having a unified but flexible platform for everything.
The most significant of Savvis’s cloud offerings are Symphony VPDC (recently rebranded to Cloud Data Center), SavvisDirect, and their vCHS franchise. VPDC is a vCloud-Powered public cloud offering (although Savvis has done a more user-friendly portal than vCloud Director provides); Savvis often combines it with managed services in lightweight data center outsourcing deals. (Savvis also has private cloud offerings.) SavvisDirect is an offering developed together with CA, and is intended to be a pay-as-you-go, credit-card-based offering, targeted at small businesses (apparently intended to be competitive with AWS, but whose structure seems to illustrate a failure to grasp the appeal of cloud as opposed to just mass-market VPS).
Savvis is the first franchise partner for vCHS; back at the time of VMworld (September) they were offering indications that over the long term that they thought that vCHS would win and that Savvis only needed to build its own IaaS platform until vCHS could fully meet customer requirements. (But continuing to have their own platform is certainly necessary to hedge their bets.)
Now CenturyLink’s acquisition of Tier 3 seems to indicate that they’re going to more than hedge their bets. Tier 3 is an innovative small IaaS provider (with fingers in the PaaS world through a Cloud Foundry-based PaaS, and they added .NET support to Cloud Foundry as “Iron Foundry”). Their offering is vCloud-Powered public cloud IaaS, but they entirely hide vCloud Director under their own tooling (and it doesn’t seem vCloud-ish from either the front-end or the implementation of the back-end), and they have a pile of interesting additional capabilities built into their platform. They’ve made a hypervisor-neutral push, as well They’ve got a nice blend between capabilities that appeal to the traditional enterprise, and forward-looking capabilities that appeal to a DevOps orientation. Tier 3 has some blue-chip enterprise names as customers, and it has historically scored well on Gartner evaluations, and they’re strongly liked by our enterprise clients who have evaluated them — but people have always worried about their size. (Tier 3 has made it easy to white-label the offering, which has given them more success from its partners, like Peer 1.) The acquisition by CenturyLink neatly solves that size problem.
Indeed, CenturyLink seems to have placed a strong vote of confidence in their IaaS offering, because Tier 3 is being immediately rebranded, and immediately offered as the CenturyLink Cloud. (Current outstanding quotes for Symphony VPDC will likely be requoted, and new VPDC orders are unlikely to be taken.) CenturyLink will offer existing VPDC customers a free migration to the Tier 3 cloud (since it’s vCD-to-vCD, presumably this isn’t difficult, and it represents an upgrade in capabilities for customers). CenturyLink is also immediately discontinuing selling the SavvisDirect offering (although the existing platform will continue to run for the time being); customers will be directed to purchase the Tier 3 cloud instead. (Or, I should say, the CenturyLink Cloud, since the Tier 3 brand is being killed.) CenturyLink is also doing a broad international expansion of data center locations for this cloud.
CenturyLink has been surprisingly forward-thinking to date about the way the cloud converges infrastructure capabilities (including networking) and applications, and how application development and operations changes as a result. (They bought AppFog back in June to get a PaaS offering, too.) Their vision of how these things fit together is, I think, much more interesting than either AT&T or Verizon’s (or for that matter, any other major global carrier). I expect the Tier 3 acquisition to help accelerate their development of capabilities.
Savvis’s managed and professional services combined with the Tier 3 platform should provide them some immediate advantages in the cloud-enabled managed hosting and data center outsourcing markets. It’s more competition for the likes of CSC and IBM in this space, as well as providers like Verizon Terremark and Rackspace. I think the broad scope of the CenturyLink portfolio will mesh nicely not just with existing Tier 3 capabilities, but also capabilities that Tier 3 hasn’t had the resources to be able to develop previously.
Even though I believe that the hyperscale providers are likely to have the dominant market share in cloud IaaS, there’s still a decent market opportunity for everyone else, especially when the service is combined with managed and professional services. But I believe that managed and professional services need to change with the advent of the cloud — they need to become cloud-native and in many cases, DevOps-oriented. (Gartner clients only: see my research note, “Managed Service Providers Must Adapt to the Needs of DevOps-Oriented Customers“.) Tier 3 should be a good push for CenturyLink along this path, particularly since CenturyLink will make Tier 3’s Seattle offices the center of their cloud business, and they’re retaining Jared Wray (Tier 3’s founder) as their cloud CTO.
If you read Gartner research, you’ve probably noticed that we’ve started referring to something called “fast VM restart”. We consider it to be a critical infrastructure resiliency feature for many business application workloads.
Many applications are small. Really, really small. They take a fraction of a CPU core, and less than 1 GB of RAM. And they’ll never get any bigger. (They drive the big wins in server consolidation.) Most applications in mainstream businesses are like that. I often refer to these as “paperwork apps” — somebody fills out a form, that form is routed and processed, and eventually someone runs a report. Businesses have a zillion of these and continue to write more. When an organization says they have hundreds, or thousands, of apps, most of them are paperwork apps. They can be built by not especially bright or skilled programmers, and for resilience, they rely on the underlying infrastructure platform.
A couple things can happen to these kinds of paperwork apps in the future:
- They can be left on-premise to run as-is within an enterprise virtualization environment (that maybe eventually becomes private cloud-ish), relying on its infrastructure resilience.
- They can be migrated into a cloud IaaS environment, relying on it for infrastructure resilience.
- They can be migrated onto a PaaS, either on-premise or from a service provider, relying on it for resilience.
- They can be moved to business process management (BPM) platforms, either via on-premise deployment of a BPM suite, or a BPM PaaS, thereby making resilience the problem of the BPM software.
Note the thing that’s not on that list: Re-architecting the application for application-level resilience. That requires that your developers be skilled enough to do it, and for you to be able to run in a distributed fashion, which, due to the low level of resources consumed, isn’t economical.
Of the various scenarios above, the lift-and-shift onto cloud IaaS is a hugely likely one for many applications. But businesses want to be comfortable that the availability will be comparable to their existing on-premise infrastructure.
So what does infrastructure resilience mean? In short, it means minimal downtime due to either planned maintenance or a failure of the underlying hardware. Live migration is the most common modern technique used to mitigate downtime for planned maintenance that impacts the physical host. Fast VM restart is the most common technique used to mitigate downtime due to hardware failure.
Fast VM restart is built into nearly all modern hypervisors. It’s not magical — a shocking number of VMware customers believe that VM HA means they’ll instantly get a workload onto a happy healthy host from a failed host (i.e., they confuse live migration with VM HA). Fast VM restart is basically a technique to rapidly detect that a physical host has failed, and restart the VMs that were on that host, on some other host. It doesn’t necessarily need to be implemented at the virtualization level — you can just have monitoring that runs at a very short polling interval and that orchestrates a move-and-restart of VMs when it sees a host failure, for instance. (Of course, you need a storage and network architecture that makes this viable, too.)
Clearly, not all applications are happy when they get what is basically an unexpected reboot, but this is the level of infrastructure resiliency that works just fine for non-distributed applications. When customers babble about the reliability of their on-premise VMware-based infrastructure, this is pretty much what they mean. They think it has value. They’re willing to pay more for it. There’s no real reason why it shouldn’t be implemented by every cloud IaaS providers that intends to take general business applications, not just the VMware-based providers.
By the way: Lost in the news about live migration in Google Compute Engine has been an interesting new subtlety. I missed noticing this in my first read-through of the announcement, since it was phrased purely in the context of maintenance, and only a re-read while finishing up this blog post led me to wonder about general-case restart. And I haven’t seen this mentioned elsewhere:
The new GCE update also adds fast VM restart, which Google calls Automatic Restart. To quote the new documentation, “You can set up Google Compute Engine to automatically restart an instance if it is taken offline by a system event, such as a hardware failure or scheduled maintenance event, using the automaticRestart setting.” Answering a query from me, Google said that the restart time is dependent upon the type of failure, but that in most cases, it should be under three minutes.
So a gauntlet has been (subtly) thrown. This is one of the features that enterprises most want in an “enterprise-grade” cloud IaaS offering. Now Google has it. Will AWS, Microsoft, and others follow suit?
Google Compute Engine (GCE) has been a potential cloud-emperor contender in the shadows, and although GCE is still in beta, it’s been widely speculated that Google will likely be the third vendor in the trifecta of big cloud IaaS market-share leaders, along with Amazon Web Services (AWS) and Microsoft Windows Azure.
Few would doubt Google’s technology prowess, if it decides to commit itself to a business, though. A critical question has remained, though: Will Google be able to deliver technology capabilities that can be used by mere mortals in the enterprise, and market, sell, contract for, and deliver service in a way that such businesses can use? (Its ability to serve ephemeral large-scale compute workloads, and perhaps meet the needs of start-ups, is not in doubt.)
One of the most heartburn-inducing aspects of GCE has been its scheduled maintenance, To quote Google: “For scheduled zone maintenance windows, Google takes an entire zone offline for roughly two weeks to perform various, disruptive maintenance tasks.” Basically, Google has said, “Your data center will be going away for up to two weeks. Deal with it. You should be running in multiple zones anyway.”
Even most cloud-native start-ups aren’t capable of easily executing this way. Remember that most applications are architected to have their data locally, in the same zone as the compute. Without using Google’s PaaS capabilities (like Datastore), this means that the customer needs to move and/or replicate storage into another zone, which also increases their costs. Many applications aren’t large enough to warrant the complexity of a multi-zone implementation, either — not only business applications, but also smaller start-ups, mobile back-end implementations, and so forth.
So inherently, a hard-line stance on taking zones offline for maintenance, limited GCE’s market opportunity. Despite positioning this as a hard-line stance previously, Google has clearly changed its mind, introducing “transparent maintenance”. This is accomplished with a combination of live migration technology, and some innovations related to their implementation of physical data center maintenance. It’s an interesting indication of Google listening to prospects and customers and flexing to do something that has not been the Google Way.
Not only will Google’s addition of migration help data center maintenance, but more importantly, it will mitigate downtime related to host maintenance. Although AWS, for instance, tries to minimize host maintenance in order to avoid instance downtime or reboots, host maintenance is necessary — and it’s highly useful to have a technology that allows you to host maintenance without downtime for the instances, because this encourages you not to delay host maintenance (since you want to update the underlying host OS, hypervisor, etc.).
VMware-based providers almost always do live migration for host maintenance, since it’s one of the core compelling features of VMware. But AWS, and many competitors that model themselves after AWS, don’t. I hope that Google’s decision to add live migration into GCE pushes the rest of the market — and specifically AWS, which today generally sets the bar for customer expectations — into doing the same, because it’s a highly useful infrastructure resilience feature, and it’s important to customers.
More broadly, though, AWS hasn’t really had innovation competitors to date. Microsoft Azure is a real competitor, but other than in PaaS, they’ve largely been playing catch-up. Thanks to its extensive portfolio of internal technologies, Google has the potential ability to inject truly new capabilities into the market. Similar to what customers have seen with AWS — when AWS has been successful at introducing capabilities that many customers weren’t really even aware that they wanted — I expect Google is going to launch truly innovative capabilities that will turn into customer demands. It’s not that AWS is going to simply mount a competitive response — it will become a situation where customers ask for these capabilities, pushing AWS to respond. That should be excellent for the market.
It’s worth noting that the value of Google is not just GCE — it is Google Cloud Platform as a whole, including the PaaS elements. This is similarly true with Microsoft Azure. And although AWS seems to broadly bucketed as IaaS, in reality their capabilities overlap into the PaaS space. These vendors understand that the goal is the ability to develop and deliver business capaiblities more quickly — not to provide cheap infrastructure.
Capabilities equate lock-in, by the way, but historically, businesses have embraced lock-in whenever it results in more value delivered.
Near the beginning of July, IBM closed its acquisition of SoftLayer (which I discussed in a previous blog post). A little over three months have passed since then, and IBM has announced the addition of more than 1,500 customers, the elimination of SmartCloud Enterprise (SCE, IBM’s cloud IaaS offering), and went on the offensive against Amazon in an ad campaign (analyzed in my colleague Doug Toombs’s blog post). So what does this all mean for IBM’s prospects in cloud infrastructure?
IBM is unquestionably a strong brand with deep customer relationships — it exerts a magnetism for its customers that competitors like HP and Dell don’t come anywhere near to matching. Even with all of the weaknesses of the SCE offering, here at Gartner, we still saw customers choose the service simply because it was from IBM — even when the customers would openly acknowledge that they found the platform deficient and it didn’t really meet their needs.
In the months since the SoftLayer acquisition has closed, we’ve seen this “we use IBM for everything by preference” trend continue. It certainly helps immensely that SoftLayer is a more compelling solution than SCE, but customers continue to acknowledge that they don’t necessarily feel they’re buying the best solution or the best technology, but they are getting something that is good enough from a vendor that they trust. Moreover, they are getting it now; IBM has displayed astonishing agility and a level of aggression that I’ve never seen before. It’s impressive how quickly IBM has jump-started the pipeline this early into the acquisition, and IBM’s strengths in sales and marketing are giving SoftLayer inroads into a mid-market and enterprise customer base that it wasn’t able to target previously.
SoftLayer has always competed to some degree against AWS (philosophically, both companies have an intense focus on automation, and SoftLayer’s bare-metal architecture is optimal for certain types of use cases), and IBM SoftLayer will as well. In the IBM SoftLayer deals we’ve seen in the last couple of months, though, their competition isn’t really Amazon Web Services (AWS). AWS is often under consideration, but the real competitor is much more likely to be Rackspace — dedicated servers (possibly with a hybrid cloud model) and managed services.
IBM’s strategy is actually a distinctively different one from the other providers in the cloud infrastructure market. SoftLayer’s business is overwhelmingly dedicated hosting — mostly small-business customers with one or two bare-metal servers (a cost-sensitive, high-churn business), though they had some customers with large numbers of bare-metal servers (gaming, consumer-facing websites, and so forth). It also offers cloud IaaS, called CloudLayer, with by-and-hour VMs and small bare-metal servers, but this is a relatively small business (AWS has individual customers that are bigger than the entirety of CloudLayer). SoftLayer’s intellectual property is focused on being really, really good at quickly provisioning hardware in a fully automated way.
IBM has decided to do something highly unusual — to capitalize on SoftLayer’s bare-metal strengths, and to strongly downplay virtualization and the role of the cloud management platform (CMP). If you want a CMP — OpenStack, CloudStack, vCloud Director, etc. — on SoftLayer, there’s an easy way to install the software on bare metal. But if you want it updated, maintained, etc., you’ll either have to do it yourself, or you need to contract with IBM for managed services. If you do that, you’re not buying cloud IaaS; you’re renting hardware and CMP software, and building your own private cloud.
While IBM intends to expand the configuration options available in CloudLayer (and thus the number of hardware options available by the hour rather than by the month), their focus is upon the lower-level infrastructure constructs. This also means that they intend to remain neutral in the CMP wars. IBM’s outsourcing practice has historically been pretty happy to support whatever software you use, and the same largerly applies here — they’re offering managed services for the common CMPs, in whatever way you choose to configure them.
In other words, while IBM intends to continue its effort to incorporate OpenStack as a provisioning manager in its “Smarter Infrastructure” products (the division formerly known as Tivoli), they are not launching an OpenStack-based cloud IaaS, replacing the existing CloudLayer cloud IaaS platform, or the like.
IBM also intends to use SoftLayer as the underlying hardware platform for the application infrastructure components that will be in its Cloud Foundry-based framework for PaaS. It will depend on these components to compete against the higher-level constructs in the market (like Amazon’s RDS database-as-a-service).
IBM SoftLayer has a strong value proposition for certain use cases, but today their distinctive value proposition is a different one than AWS’s, but a very similar one to Rackspace’s (although I think Rackspace is going to embrace DevOps-centric managed services, while IBM seems more traditional in its approach). But IBM SoftLayer is still an infrastructure-centric story. I don’t know that they’re going to compete with the vision and speed of execution currently being displayed by AWS, Microsoft, and Google, but ultimately, those providers may not be IBM SoftLayer’s primary competitors.
Verizon already owns a cloud IaaS offering — in fact, it owns several. Terremark was an early AWS competitor with the Terremark Enterprise Cloud, a VMware-based offering that got strong enterprise traction during the early years of this market (and remains the second-most-common cloud provider amongst Gartner’s clients, with many companies using both AWS and Terremark), as well as a vCloud Express offering. Verizon entered the game later with Verizon Compute as a Service (now called Enterprise Cloud Managed Edition), also VMware-based. Since Verizon’s acquisition of Terremark, the company has continued to operate all the existing platforms, and intends to continue to do so for some time to come.
However, Verizon has had the ambition to be a bigger player in cloud; like many other carriers, it believes that network services are a commodity and a carrier needs to have stickier, value-added, higher-up-the-stack services in order to succeed in the future. However, Verizon also understood that it would have to build technology, not depend on other people’s technology, if it wanted to be a truly competitive global-class cloud player versus Amazon (and Microsoft, Google, etc.).
With that in mind, in 2011, Verizon went and made a manquisition — acquiring CloudSwitch not so much for its product (essentially hypervisor-within-a-hypervisor that allows workloads to be ported across cloud infrastructures using different technologies), as for its team. It gave them a directive to go build a cloud infrastructure platform with a global-class architecture that could run enterprise-class workloads, at global-class scale and at fully competitive price points.
Back in 2011, I conceived what I called the on-demand infrastructure fabric (see my blog post No World of Two Clouds, or, for Gartner clients, the research note, Market Trends: Public and Private Cloud Infrastructure Converge into On-Demand Infrastructure Fabrics) — essentially, a global-class infrastructure fabric with self-service selectable levels of availability, performance, and isolation. Verizon is the first company to have really built what I envisioned (though their project predates my note, and my vision was developed independently of any knowledge of what they were doing).
The Verizon Cloud architecture is actually very interesting, and, as far as I know, unique amongst cloud IaaS providers. It is almost purely a software-defined data center. Components are designed at a very low level — a custom hypervisor, SDN augmented with the use of NPUs, virtualized distributed storage. Verizon has generally tried to avoid using components for which they do not have source code. There are very few hardware components — there’s x86 servers, Arista switches, and commodity Flash storage (the platform is all-SSD). The network is flat, and high bandwidth is an expectation (Verizon is a carrier, after all). Oh, and there’s object-based storage, too (which I won’t discuss here).
The Verizon Cloud has a geographically distributed control plane designed for continuous availability, and it, along with the components, are supposed to be updatable without downtime (i.e., maintenance should not impact anything). It’s intended to provide fine-grained performance controls for the compute, network, and storage resource elements. It is also built to allow the user to select fault domains, allowing strong control of resource placement (such as “these two VMs cannot sit on the same compute hardware”); within a fault domain, workloads can be rebalanced in case of hardware failure, thus offering the kind of high availability that’s often touted in VMware-based clouds (including Terremark’s previous offerings). It is also intended to allow dynamic isolation of compute, storage, and networking components, allowing the creation of private clouds within a shared pool of hardware capacity.
The Verizon Cloud is intended to be as neutral as possible — the theory is that all VM hypervisors can run natively on Verizon’s hypervisor, many APIs can be supported (including its own API, the existing Terremark API, and the AWS, CloudStack, and OpenStack APIs), and there’ll be support for the various VM image formats. Initially, the supported hypervisor is a modified Xen. In other words, Verizon wants to take your workloads, wherever you’re running them now, and in whatever form you can export them.
It’s an enormously ambitious undertaking. It is, assuming it all works as promised, a technical triumph — it’s the kind of engineering you expect out of an organization like AWS or Google, or a software company like Microsoft or VMware, not a staid, slow-moving carrier (the mere fact that Verizon managed to launch this is a minor miracle unto itself). It is actually, in a way, what OpenStack might have aspired to be; the delta between this and the OpenStack architecture is, to me, full of sad might-have-beens of what OpenStack had the potential to be, but is not and is unlikely to become. (Then again, service providers have the advantage of engineering to a precisely-controlled environment. OpenStack, and for that matter, VMware, need to run on whatever junk the customer decides to use, instantly making the problem more complex.)
Unfortunately, the question at this stage is: Will anybody care?
Yes, I think this is an important development in the market, and the fact that Verizon is already a credible cloud player in the enterprise, with an entrenched base in the Terremark Enterprise Cloud, will help it. But in a world where developers control most IaaS purchasing, the bare-bones nature of the new Verizon offering means that it falls short of fulfilling the developer desire for greater productivity. In order to find a broader audience, Verizon will need to commit to developing all the richness of value-added capabilities that the market leaders will need — which likely means going after the PaaS market with the same degree of ambition, innovation, and investment, but certainly means committing to rapidly introducing complementing capabilities and bringing a rich ecosystem in the form of a software marketplace and other partnerships. Verizon needs to take advantage of its shiny new IaaS building blocks to rapidly introduce additional capabilities — much like Microsoft is now rapidly introducing new capabilities into Azure.
With that, assuming that this platform performs as designed, and Verizon can continue to treat Terremark’s cloud folks like they belong to a fast-moving start-up and not an ossified pipe provider, Verizon may have a shot at being one of the leaders in this market. Without that, the Verizon Cloud is likely to be relegated to a niche, just like every other provider whose capabilities stop at the level of offering infrastructure resources.
Massimo Re Ferre’ recently posted some thoughts as a follow-up to his talk at VMworld, about vCHS vs. AWS. That led to a Twitter exchange that made me think that I should highlight a viewpoint of mine:
I do not believe in a “world of two clouds”, where there are cloud IaaS offerings that are targeted at enterprise workloads, and there are cloud IaaS offerings that are targeted at cloud-native workloads — broadly, different clouds for applications designed with the assumption of infrastructure resilience, versus applications designed with the assumption that resilience must reside at the application layer.
Instead, I believe that the market leaders will offer a range of infrastructure resources. Some of those infrastructure resources will be more resilient, and will be more expensive. And customers will pay for the level of performance they receive. There’s no need to build two clouds; in fact, customers actively do not want two different clouds, since nobody really wants to shift between different clouds as you go through an application’s lifecycle, or for different tiers of an app, some of which might need greater infrastructure resilience and guaranteed performance.
I do not believe that application design patterns change to be fully cloud-native over time. First, enterprises have hundreds if not thousands of existing legacy applications that they will need to host. Second, enterprises continue to write non-cloud-native apps, because the typical app is small — it’s some kind of business process app (I call these “paperwork” apps, usually online forms with some workflow and reporting), and it runs on a tiny VM, has few users. It’s neither cost-effective to spend the developer time to make these apps resilient, nor cost-effective to distribute them. Putting them on decently resilient infrastructure is less expensive. Some of these apps should more logically be written on a business process management suite or PaaS (BPMS or bpmPaaS), or on a more general PaaS; that underlying BMPS/PaaS should hopefully functionally provide resilience, but that won’t deal with the existing legacy apps, so there’ll continue to be a need for resilient infrastructure.
When people talk about infrastructure resilience, they’re generally referring to compute resilience in particular — essentially, trying to protect the application from the impact of potential server hardware failure. VMware pioneered two technologies in this space — they call them “HA” (fast detection of physical host failure and automatic restart of the VMs that were running on that host, on some other host) and “vMotion” (live migration of VMs from one physical host to another). However, all the other major hypervisors have now incorporated these features. There’s absolutely no reason why a cloud IaaS provider like AWS, which doesn’t currently support these capabilities, can’t add them, and charge a premium for these VMs.
When people talk about performance consistency, they’re generally referring to storage and network performance. (Most cloud IaaS providers do not oversubscribe either CPU or RAM resources.) Predictable storage performance is a very difficult engineering problem. Companies like SolidFire are offering all-SSD storage to help accomplish this (since it reduces the variability of seek times), and we’re seeing gradual uptake of this technology into cloud IaaS providers. AWS has done “provisioned iops” (PIOPS), allowing customers to buy into a more predictable range of storage performance. There’s no reason why providers wouldn’t offer this kind of predictability for both storage and network — especially when they can charge extra for it.
Now, there are tons of service providers out there building to that world of two clouds — often rooted in the belief that IT operations will want one thing, and developers another, and they should build something totally different for both. This is almost certainly a losing strategy. Winning providers will satisfy both needs within a single cloud, offering architectural flexibility that allows developers to decide whether or not they want to build for application resiliency or infrastructure resiliency.
For more on this: I’ve covered this in detail in my research note, Market Trends: Public and Private Cloud Infrastructure Converge into On-Demand Infrastructure Fabrics (Gartner clients only).