Near the beginning of July, IBM closed its acquisition of SoftLayer (which I discussed in a previous blog post). A little over three months have passed since then, and IBM has announced the addition of more than 1,500 customers and the elimination of SmartCloud Enterprise (SCE, IBM’s cloud IaaS offering), and has gone on the offensive against Amazon in an ad campaign (analyzed in my colleague Doug Toombs’s blog post). So what does this all mean for IBM’s prospects in cloud infrastructure?
IBM is unquestionably a strong brand with deep customer relationships — it exerts a magnetism for its customers that competitors like HP and Dell don’t come anywhere near to matching. Even with all of the weaknesses of the SCE offering, here at Gartner, we still saw customers choose the service simply because it was from IBM — even when the customers would openly acknowledge that they found the platform deficient and it didn’t really meet their needs.
In the months since the SoftLayer acquisition closed, we’ve seen this “we use IBM for everything by preference” trend continue. It certainly helps immensely that SoftLayer is a more compelling solution than SCE, but customers continue to acknowledge that although they don’t necessarily feel they’re buying the best solution or the best technology, they are getting something that is good enough from a vendor that they trust. Moreover, they are getting it now; IBM has displayed astonishing agility and a level of aggression that I’ve never seen before. It’s impressive how quickly IBM has jump-started the pipeline this early into the acquisition, and IBM’s strengths in sales and marketing are giving SoftLayer inroads into a mid-market and enterprise customer base that it wasn’t able to target previously.
SoftLayer has always competed to some degree against AWS (philosophically, both companies have an intense focus on automation, and SoftLayer’s bare-metal architecture is optimal for certain types of use cases), and IBM SoftLayer will as well. In the IBM SoftLayer deals we’ve seen in the last couple of months, though, their competition isn’t really Amazon Web Services (AWS). AWS is often under consideration, but the real competitor is much more likely to be Rackspace — dedicated servers (possibly with a hybrid cloud model) and managed services.
IBM’s strategy is actually distinctively different from that of the other providers in the cloud infrastructure market. SoftLayer’s business is overwhelmingly dedicated hosting: mostly small-business customers with one or two bare-metal servers (a cost-sensitive, high-churn business), though it also has some customers with large numbers of bare-metal servers (gaming, consumer-facing websites, and so forth). It also offers cloud IaaS, called CloudLayer, with by-the-hour VMs and small bare-metal servers, but this is a relatively small business (AWS has individual customers that are bigger than the entirety of CloudLayer). SoftLayer’s intellectual property is focused on being really, really good at quickly provisioning hardware in a fully automated way.
IBM has decided to do something highly unusual — to capitalize on SoftLayer’s bare-metal strengths, and to strongly downplay virtualization and the role of the cloud management platform (CMP). If you want a CMP — OpenStack, CloudStack, vCloud Director, etc. — on SoftLayer, there’s an easy way to install the software on bare metal. But if you want it updated, maintained, etc., you’ll either have to do it yourself, or you need to contract with IBM for managed services. If you do that, you’re not buying cloud IaaS; you’re renting hardware and CMP software, and building your own private cloud.
While IBM intends to expand the configuration options available in CloudLayer (and thus the number of hardware options available by the hour rather than by the month), their focus is upon the lower-level infrastructure constructs. This also means that they intend to remain neutral in the CMP wars. IBM’s outsourcing practice has historically been pretty happy to support whatever software you use, and the same largely applies here; they’re offering managed services for the common CMPs, in whatever way you choose to configure them.
In other words, while IBM intends to continue its effort to incorporate OpenStack as a provisioning manager in its “Smarter Infrastructure” products (the division formerly known as Tivoli), they are not launching an OpenStack-based cloud IaaS, replacing the existing CloudLayer cloud IaaS platform, or the like.
IBM also intends to use SoftLayer as the underlying hardware platform for the application infrastructure components that will be in its Cloud Foundry-based framework for PaaS. It will depend on these components to compete against the higher-level constructs in the market (like Amazon’s RDS database-as-a-service).
IBM SoftLayer has a strong value proposition for certain use cases, but today that value proposition is different from AWS’s and very similar to Rackspace’s (although I think Rackspace is going to embrace DevOps-centric managed services, while IBM seems more traditional in its approach). IBM SoftLayer is still an infrastructure-centric story. I don’t know that they’re going to compete with the vision and speed of execution currently being displayed by AWS, Microsoft, and Google, but ultimately, those providers may not be IBM SoftLayer’s primary competitors.
Massimo Re Ferre’ recently posted some thoughts as a follow-up to his talk at VMworld, about vCHS vs. AWS. That led to a Twitter exchange that made me think that I should highlight a viewpoint of mine:
I do not believe in a “world of two clouds”, where there are cloud IaaS offerings that are targeted at enterprise workloads, and there are cloud IaaS offerings that are targeted at cloud-native workloads — broadly, different clouds for applications designed with the assumption of infrastructure resilience, versus applications designed with the assumption that resilience must reside at the application layer.
Instead, I believe that the market leaders will offer a range of infrastructure resources. Some of those infrastructure resources will be more resilient, and will be more expensive. And customers will pay for the level of performance they receive. There’s no need to build two clouds; in fact, customers actively do not want two different clouds, since nobody really wants to shift between different clouds as you go through an application’s lifecycle, or for different tiers of an app, some of which might need greater infrastructure resilience and guaranteed performance.
I do not believe that application design patterns change to be fully cloud-native over time. First, enterprises have hundreds if not thousands of existing legacy applications that they will need to host. Second, enterprises continue to write non-cloud-native apps, because the typical app is small. It’s some kind of business process app (I call these “paperwork” apps, usually online forms with some workflow and reporting), it runs on a tiny VM, and it has few users. It’s neither cost-effective to spend the developer time to make these apps resilient, nor cost-effective to distribute them. Putting them on decently resilient infrastructure is less expensive. Some of these apps should more logically be written on a business process management suite or PaaS (BPMS or bpmPaaS), or on a more general PaaS; that underlying BPMS/PaaS should hopefully provide resilience, but that won’t deal with the existing legacy apps, so there’ll continue to be a need for resilient infrastructure.
When people talk about infrastructure resilience, they’re generally referring to compute resilience in particular — essentially, trying to protect the application from the impact of potential server hardware failure. VMware pioneered two technologies in this space — they call them “HA” (fast detection of physical host failure and automatic restart of the VMs that were running on that host, on some other host) and “vMotion” (live migration of VMs from one physical host to another). However, all the other major hypervisors have now incorporated these features. There’s absolutely no reason why a cloud IaaS provider like AWS, which doesn’t currently support these capabilities, can’t add them, and charge a premium for these VMs.
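To make the “HA” idea concrete, here is a minimal Python sketch of the restart step: once a physical host has been marked as failed, its VMs are restarted on surviving hosts that have spare capacity. Everything here (the `Host` class, the slot-based capacity model, the most-free-slots placement rule) is a hypothetical illustration, not how VMware or any other hypervisor actually implements it, and it skips failure detection entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                 # hypothetical: number of VM slots on this host
    alive: bool = True
    vms: list = field(default_factory=list)


def restart_failed_vms(hosts):
    """Restart VMs from dead hosts on surviving hosts with free slots.

    Returns a dict of VM name -> new host name. VMs that cannot be
    placed stay listed on the failed host (that is, they remain down).
    """
    placements = {}
    survivors = [h for h in hosts if h.alive]
    for failed in (h for h in hosts if not h.alive):
        still_down = []
        for vm in failed.vms:
            # Assumed placement rule: the survivor with the most free slots.
            target = max(
                survivors,
                key=lambda h: h.capacity - len(h.vms),
                default=None,
            )
            if target is None or len(target.vms) >= target.capacity:
                still_down.append(vm)
                continue
            target.vms.append(vm)
            placements[vm] = target.name
        failed.vms = still_down
    return placements


if __name__ == "__main__":
    cluster = [
        Host("a", capacity=2, vms=["web"]),
        Host("b", capacity=2, alive=False, vms=["db"]),
        Host("c", capacity=3),
    ]
    print(restart_failed_vms(cluster))
```

The point of the sketch is that the application never takes part in the recovery: the VM simply comes back on another host, which is what lets a non-cloud-native app lean on the infrastructure for resilience.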
When people talk about performance consistency, they’re generally referring to storage and network performance. (Most cloud IaaS providers do not oversubscribe either CPU or RAM resources.) Predictable storage performance is a very difficult engineering problem. Companies like SolidFire are offering all-SSD storage to help accomplish this (since it reduces the variability of seek times), and we’re seeing gradual uptake of this technology into cloud IaaS providers. AWS offers Provisioned IOPS (PIOPS), allowing customers to buy into a more predictable range of storage performance. There’s no reason why providers wouldn’t offer this kind of predictability for both storage and network, especially when they can charge extra for it.
Now, there are tons of service providers out there building to that world of two clouds — often rooted in the belief that IT operations will want one thing, and developers another, and they should build something totally different for both. This is almost certainly a losing strategy. Winning providers will satisfy both needs within a single cloud, offering architectural flexibility that allows developers to decide whether or not they want to build for application resiliency or infrastructure resiliency.
For more on this: I’ve covered this in detail in my research note, Market Trends: Public and Private Cloud Infrastructure Converge into On-Demand Infrastructure Fabrics (Gartner clients only).
Bernard: “What skill or insight has allowed AWS to create an offering so superior to others in the market?”
AWS takes a comprehensive view of “what does the customer need”, looks at what customers (whether current customers or future target customers) are struggling with, and tries to address those things. AWS not only takes customer feedback seriously, but it also iterates at shocking speed. And it has been willing to invest massively in engineering. AWS’s engineering organization and the structure of the services themselves allow multiple, parallel teams to work on different aspects of AWS with minimal dependencies on the other teams. AWS had a head start, and with every passing year their engineering lead has grown larger. (Even though they have a significant burden of technical debt from having been first, they’ve also solved problems that competitors haven’t had to yet, due to their sheer scale.)
Many competitors haven’t had the willingness to invest the resources to compete, especially if they think of this business as one that’s primarily about getting a VM fast and that’s all. They’ve failed to understand that this is a software business, where feature velocity matters. You can sometimes manage to put together brilliant, hyper-productive small teams, but this is usually going to get you something that’s wonderful in the scope of what they’ve been able to build, but simply missing the additional capabilities that better-resourced competitors can manage (especially if a competitor can muster both resources and hyper-productivity). There are some awesome smaller companies in this space, though.
Bernard: “Plainly stated, why hasn’t a credible competitor emerged to challenge AWS?”
I think there’s a critical shift happening in the market right now. Three very dangerous competitors are just now entering the market — Microsoft, Google, and VMware. I think the real war for market share is just beginning.
For instance, consider the following off-the-cuff thoughts on those vendors. These are nothing more than quick thoughts, not a complete or balanced analysis. I have a forthcoming research note called “Rise of the Cloud IaaS Mega-Vendors” that focuses on this shift in the competitive landscape, and which will profile four vendors in particular (AWS plus the three entrants above), so stay tuned for more. So, that said:
Microsoft has brand, deep customer relationships, deep technology entrenchment, and a useful story about how all of those pieces are going to fit together, along with a huge army of engineers, and a ton of money and the willingness to spend wherever it gains them a competitive advantage; its weakness is Microsoft’s broader issues as well as the Microsoft-centricity of its story (which is also its strength, of course). Microsoft is likely to expand the market, attracting new customers and use cases to IaaS — including blended PaaS models.
Google has brand, an outstanding engineering team, and unrivaled expertise at operating at scale; its weakness is Google’s usual challenges with traditional businesses (whatever you can say about AWS’s historical struggle with the enterprise, you can say about Google many times over, and it will probably take them at least as long as AWS did to work through that). Google’s share gain will mostly come at the expense of AWS’s base of HPC customers and young start-ups, but it will worm its way into the enterprise via interactive agencies that use its cloud platform; it should have a strong blended PaaS model.
VMware has brand, a strong relationship with IT operations folks, technology it can build on, and a hybrid cloud story to tell; whether or not its enterprise-class technology can scale to global-class clouds remains to be seen, though, along with whether or not it can get its traditional customer base to drive sufficient volume of cloud IaaS. It might expand the market, but it’s likely that much of its share gain will come at the expense of VMware-based “enterprise-class” service providers.
Obviously, it will take these providers some time to build share, and there are other market players who will be involved, including the other providers that are in the market today (and for all of you wondering “what about OpenStack”, I would classify that under the fates of the individual providers who use it). However, if I were to place my bets, it would be on those four at the top of market share, five years from now. They know that this is a software business. They know that innovative capabilities are vitally necessary. And they know that this has turned into a market fixated on developer productivity and business benefits. At least for now, that view is dominating the actual spending in this market.
You can certainly argue that another market outcome should have happened, that users should have chosen differently, or even that users are making poor decisions now that they’ll regret later. That’s an interesting intellectual debate, but at this point, Sisyphus’s rock is rolling rapidly downhill, so anyone who wants to push it back up is going to have an awfully difficult time not getting crushed.
Bernard Golden recently wrote a CIO.com blog post in response to my announcement of Gartner’s 2013 Magic Quadrant for Cloud IaaS. He raised a number of good questions that I thought it would be useful to address. This is part 1 of my response. (See part 2 for more.)
(Broadly, as a matter of Gartner policy, analysts do not debate Magic Quadrant results in public, and so I will note here that I’m talking about the market, and not the MQ itself.)
Bernard: “Why is there such a distance between AWS’s offering and everyone else’s?”
In the Magic Quadrant, we rate not only the offering itself in its current state, but also a whole host of other criteria — the roadmap, the vendor’s track record, marketing, sales, etc. (You can go check out the MQ document itself for those details.) You should read the AWS dot positioning as not just indicating a good offering, but also that AWS has generally built itself into a market juggernaut. (Of course, AWS is still far from perfect, and depending on your needs, other providers might be a better fit.)
But Bernard’s question can be rephrased as, “Why does AWS have so much greater market share than everyone else?”
Two years ago, I wrote two blog posts that are particularly relevant here:
- Common Service Provider Myths About Cloud Infrastructure
- In Cloud IaaS, Developers are the Face of Business Buyers
These posts were followed up with two research notes (links are Gartner clients only):
- New Entrants to the Cloud IaaS Market Face Tough Competitive Challenges
- How Buyers Purchase Cloud IaaS
I have been beating the “please don’t have contempt for developers” drum for a while now. (I phrase it as “contempt” because it was often very clear that developers were seen as lesser, not real buyers doing real things — merely ignoring developers would have been one thing, but contempt is another.) But it’s taken until this past year before most of the “enterprise class” vendors acknowledged the legitimacy of the power that developers now hold.
Many service providers held tight to the view espoused by their traditional IT operations clientele: AWS was too dangerous, it didn’t have sufficient infrastructure availability, it didn’t perform sufficiently well or with sufficient consistency, it didn’t have enough security, it didn’t have enough manageability, it didn’t have enough governance, it wasn’t based on VMware — and it didn’t look very much like an enterprise’s data center architecture. The viewpoint was that IT operations would continue to control purchases, implementations would be relatively small-scale and would be built on traditional enterprise technologies, and that AWS would never get to the point that they’d satisfy traditional IT operations folks.
What they didn’t count on was the fact that developers, and the business management that they ultimately serve, were going to forge on ahead without them. Or that AWS would steadily improve its service and the way it did business, in order to meet the needs of the traditional enterprise. (My colleagues in GTP — the Gartner division that was Burton Group — do a yearly evaluation of AWS’s suitability for the enterprise, and each year, AWS gets steadily, materially better. Clients: see the latest.)
Today, AWS’s sheer market share speaks for itself. And it is definitely not just single developers with a VM or two, start-ups, or non-mission-critical stuff. Through the incredible amount of inquiry we take at Gartner, we know how cloud IaaS buyers think, source, succeed, and sometimes suffer. And every day at Gartner, we talk to multiple AWS customers (or prospects considering their options, though many have already bought something on the click-through agreement). Most are traditional enterprises of the G2000 variety (including some of the largest companies in the world), but over the last year, AWS has finally cracked the mid-market by working with systems integrator partners. The projected spend levels are clearly increasing dramatically, the use cases are extremely broad, the workloads increasingly have sensitive data and regulatory compliance concerns, and customers are increasingly thinking of AWS as a strategic vendor.
(Now, as my colleagues who cover the traditional data center like to point out, the spend levels are still trivial compared to what these customers are spending on the rest of their data center IT, but I think what’s critical here is the shift in thinking about where they’ll put their money in the future, and their desire to pick a strategic vendor despite how relatively early-stage the market is.)
But put another way — it is not just that AWS advanced its offering, but it convinced the market that this is what they wanted to buy (or at least that it was a better option than the other offerings), despite the sometimes strange offering constructs. They essentially created demand in a new type of buyer — and they effectively defined the category. And because they’re almost always first to market with a feature — or the first to make the market broadly aware of that capability — they force nearly all of their competitors into playing catch-up and me-too.
That doesn’t mean that the IT operations buyer isn’t important, or that there isn’t an array of needs that AWS does not address well. But the vast majority of the dollars spent on cloud IaaS are much more heavily influenced by developer desires than by IT operations concerns, and that means that market share currently favors the providers who appeal to development organizations. That’s an ongoing secular trend: business leaders are currently heavily growth-focused, and therefore demanding lots of applications delivered as quickly as possible, and are willing to spend money and take greater risks in order to obtain greater agility.
This also doesn’t mean that the non-developer-centric service providers aren’t important. Most of them have woken up to the new sourcing pattern, and are trying to respond. But many of them are also older, established organizations, and they can only move so quickly. They also have the comfort of their existing revenue streams, which allow them the luxury of not needing to move so quickly. Many have been able to treat cloud IaaS as an extension of their managed services business. But they’re now facing the threat of systems integrators like Cognizant and Capgemini entering this space, combining application development and application management with managed services on a strategic cloud IaaS provider’s platform — at the moment, normally AWS. Nothing is safe from the broader market shift towards cloud computing.
As always, every individual customer’s situation is different from another’s, and the right thing to do (or the safe, mainstream thing to do) evolves through the years. Gartner is appropriately cautionary when it discusses such things with clients. This is a good time to mention that Magic Quadrant placement is NEVER a good reason to include or exclude a vendor from a short list. You need to choose the vendor that’s right for your use case, and that might be a Niche Player, or even a vendor that’s not on the MQ at all — and even though AWS has the highest overall placement, they might be completely unsuited to your use case.
Of late, I’ve been talking to Amazon customers who are saying, you know, AWS gives us a ton of benefits, it makes a lot of things easy and fast that used to be hard, but in the end, we could do this ourselves, and probably do it at comparable cost or a cost that isn’t too much higher. These are customers that are at some reasonable scale — a take-back would involve dozens if not hundreds of physical server deployments — but aren’t so huge that the investment would be leveraged over, say, tens of thousands of servers.
Most people don’t choose cloud IaaS for lowered costs, unless they have very bursty or unpredictable workloads. Instead, they choose it for increased business agility, which to most people means “getting applications, and thus new business capabilities, more quickly”.
But there’s another key reason to not do it yourself: The war for talent.
The really bright, forward-thinking people in your organization — the people who you would ordinarily rely upon to deploy new technologies like cloud — are valuable. The fact that they’re usually well-paid is almost inconsequential compared to the fact that these are often the people who can drive differentiated, innovative business value for your organization, and they’re rare. Even if you have open headcount, finding those “A” players can be really, really tough, especially if you want a combination of cutting-edge technical skills with the personal traits — drive, follow-through, self-starting, thinking out of the box, communication skills, and so on — that make for top-tier engineers.
Just because you can do it yourself doesn’t mean that you should, even if your engineers think they’re just as smart as Amazon’s engineers (which they might well be) and are champing at the bit to prove it. If you can outsource a capability that doesn’t generate competitive advantage for you, then you can free your best people to work on the things that do generate competitive advantage. You can work on the next cool thing… and challenge your engineers to prove their brilliance by dreaming up something that hasn’t been done before, solving the challenges that deliver business value to your organization. Assuming, of course, that your culture provides an environment receptive to such innovation.
For months, there has been an abundance of rumors that Amazon was intending to enter the dynamic site acceleration market; it was the logical next step for its CloudFront CDN. Today, Amazon released a set of features oriented towards dynamic content, described in blog posts from Amazon’s Jeff Barr and Werner Vogels.
When CloudFront introduced custom origins (as opposed to the original CloudFront, which required you to use S3 as the origin), and dropped minimum TTLs down to zero, it effectively edged into the “whole site delivery” feature set that’s become mainstream for the major CDNs.
With this latest release, whole site delivery is much more of a reality. You can have multiple origins so you can mix static and dynamic content (which are often served from different hostnames; for example, you might have images.mycompany.com serving your static content, but www.mycompany.com serving your dynamic content), and you’ve got pattern-matching rules that let you define what the cache behavior should be for content whose URL matches a particular pattern.
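As a rough illustration of how pattern-matched cache behaviors and multiple origins fit together, here is a small Python sketch. It is not CloudFront’s configuration format or API; the `Behavior` class, the rule list, the TTL values, and the first-match-wins ordering are all assumptions made for the example, and only the two hostnames come from the text above.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Behavior:
    pattern: str      # shell-style path pattern, e.g. "/images/*"
    origin: str       # hostname of the origin to fetch from
    ttl_seconds: int  # 0 means "go back to the origin every time"


# Hypothetical rule set: first match wins, and the last rule is the default.
BEHAVIORS = [
    Behavior("/images/*", "images.mycompany.com", 86400),
    Behavior("*.css", "images.mycompany.com", 3600),
    Behavior("*", "www.mycompany.com", 0),
]


def behavior_for(path, behaviors=BEHAVIORS):
    """Return the first behavior whose pattern matches the request path."""
    for behavior in behaviors:
        if fnmatchcase(path, behavior.pattern):
            return behavior
    raise LookupError(f"no behavior matches {path!r}")


if __name__ == "__main__":
    for path in ("/images/logo.png", "/static/site.css", "/account/orders"):
        b = behavior_for(path)
        print(f"{path} -> {b.origin} (TTL {b.ttl_seconds}s)")
```

Static paths land on the long-TTL static origin, while everything else falls through to the dynamic origin with a TTL of zero, which is the static/dynamic split described above.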
The “whole site delivery” feature set is important, because it hugely simplifies CDN configuration. Rather than having to go through your site and change its URL references to the CDN (long-time CDN watchers may remember that Akamai in the early days would have customers “Akamaize” their site using a tool that did these URL rewrites), the CDN is smart — it just goes to the origin and pulls things, and it can do so dynamically (so, for instance, you don’t have to explicitly publish to the CDN when you add a new page, image, etc. to your website). It gets you closer to simply being able to repoint the URL of your website to the CDN and having magic happen.
The dynamic site acceleration features that are being introduced (the actual network optimization features) are much more limited. They basically amount to TCP connection multiplexing, TCP connection persistence/pooling, and TCP window size optimization, much like Cotendo in its very first version. At this current stage, it’s not going to be seriously competing against Akamai’s DSA offering (or CDNetworks’s similar DWA offering), but it might have appeal against EdgeCast’s ADN offering.
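For readers wondering why TCP window size matters for dynamic content, the general relationship is that a single connection can have at most one window of data in flight per round trip. The Python sketch below works through that arithmetic; it is generic TCP background rather than a description of what CloudFront actually does, and the window and round-trip figures are illustrative assumptions.

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on one TCP connection: one window per round trip."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


def window_needed_bytes(target_mbps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    bytes_per_second = target_mbps * 1_000_000 / 8
    return bytes_per_second * rtt_ms / 1000


if __name__ == "__main__":
    # Assumed example: a 65,535-byte window over a 100 ms round trip.
    print(f"{max_throughput_mbps(65535, 100):.2f} Mbps ceiling")
    # Assumed example: sustaining 100 Mbps over the same path.
    print(f"{window_needed_bytes(100, 100):,.0f} bytes of window needed")
```

On a long-haul path, a small window caps throughput no matter how much bandwidth is available, which is why window tuning shows up in even the most basic acceleration feature sets.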
However, I would expect that like everything else that Amazon releases, there will be frequent updates that introduce new features. The acceleration techniques are well known at this point, and Amazon would presumably logically add bidirectional (symmetric POP-to-POP) acceleration as the next big feature, in addition to implementing the common other optimizations (dynamic congestion control, TCP “FastRamp”, etc.).
What’s important here: CloudFront dynamic acceleration costs the same as static delivery. For US delivery, that starts at about $0.12/GB and goes down to below $0.02/GB for high volumes. That’s easily somewhere between one-half and one-tenth of the going rate for dynamic delivery. The delta is even greater if you look at a dynamic product like Akamai WAA (or its next generation, Terra Alta), where enterprise applications that might do all of a TB of delivery a month typically cost $6000 per app per month — whereas a TB of CloudFront delivery is $120. Akamai is pushing the envelope forward in feature development, and arguably those price points are so divergent that you’re talking about different markets, but low price points also expand a market to where lots of people can decide to do things, because it’s a totally different level of decision — to an enterprise, at that kind of price point, it might as well be free.
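The price comparison above is simple enough to check with a few lines of Python. The figures are the ones quoted in the paragraph; the decimal terabyte (1,000 GB) and the flat starting rate are simplifying assumptions, since real volume tiers would lower the CloudFront figure further.

```python
CLOUDFRONT_START_PER_GB = 0.12    # "about $0.12/GB" starting rate, US delivery
AKAMAI_WAA_PER_APP_MONTH = 6000   # "typically cost $6000 per app per month"
GB_PER_TB = 1000                  # assumption: decimal terabyte


def cloudfront_monthly_cost(gb, rate_per_gb=CLOUDFRONT_START_PER_GB):
    """Flat-rate estimate; ignores volume discounts and request fees."""
    return gb * rate_per_gb


if __name__ == "__main__":
    tb_cost = cloudfront_monthly_cost(GB_PER_TB)
    ratio = AKAMAI_WAA_PER_APP_MONTH / tb_cost
    print(f"1 TB on CloudFront: ${tb_cost:,.2f}")
    print(f"Akamai WAA example is roughly {ratio:.0f}x that price")
```

At a terabyte a month, the gap between the two price points is well over an order of magnitude, which is what makes the purchase a different kind of decision for an enterprise.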
Give CloudFront another year of development, and there’s a high probability that it can become a seriously disruptive force in the dynamic acceleration market. The price points change the game, making it much more likely that companies, especially SaaS providers (many of whom use EC2, and AWS in general), who have been previously reluctant to adopt dynamic acceleration due to the cost, will simply get it as an easy add-on.
There is, by the way, a tremendous market opportunity out there for a company that delivers value-added services on top of CloudFront — which is to say, the professional services to help customers integrate with it, the ongoing expert technical support on a day to day basis, and a great user portal that provides industry-competitive reporting and analytics. CloudFront has reached the point where enterprises, large mainstream media companies, and other users of Akamai, Limelight, and Level 3 who feel they need ongoing support of complex implementations and a great toolset that helps them intelligently operate those CDN implementations, are genuinely interested in taking a serious look at CloudFront as an alternative, but there’s no company that I know of that provides the services and software that would bridge the gap between CloudFront and a traditional CDN implementation.
For those who have been wondering where I personally stand in the brouhaha over Amazon, Citrix, Eucalyptus, CloudStack, OpenStack, Rackspace, HP, and so on, along with the broader competitive market that includes VMware, Microsoft, and the Four Horsemen of management tools… I should state up-front that I hold the optimistic viewpoint that I want everyone to be as successful as possible: service providers, commercial vendors, open-source projects, and the customers and users that depend upon them.
I feel that the more competent the competition in a market, the more that everyone in the ecosystem is motivated to do better, and the more customers benefit as a result. Customers benefit from better technology, lower costs, more responsive sales, and differentiated approaches to the market. Clearly, competition can hurt companies, but especially with emerging technology markets, competition often results in making the pie bigger for everyone, by expanding the range of customers that can be served — although yes, sometimes weaker competitors will be culled from the herd.
I believe that companies are best served by being the best they can be — you can target a competitor by responding on a tactical basis, and sometimes you want to, but for your optimal long-term success, you should strive to be great yourself. Obsessing over what your competitors are doing can easily distract companies from doing the right thing on a long-term strategic basis.
Dan Woods over on Forbes has written a blog post about questions around Amazon’s API strategy, and Jim Plamondon (Rackspace Developer Relations) has posted a comment on my blog about Amazon ecosystem zombiefication.
I’ve been thinking about the implications of Amazon API compatibility, and the degree to which it is or isn’t to Amazon’s advantage to encourage other people to build Amazon-compatible clouds.
I think it comes down to the following: If Amazon believes that they can innovate faster, drive lower costs, and deliver better service than all of their competitors that are using the same APIs (or, for that matter, enterprises who are using those same APIs), then it is to their advantage to encourage as many ways to “on-ramp” onto those APIs as possible, with the expectation that they will switch onto the superior Amazon platform over time.
But I would also argue that all this nattering about the basic semantics of provisioning bare resource elements is largely a waste of time for most people. None of the APIs for provisioning compute and storage (whether EC2/S3/EBS or their counterparts in other clouds) are complicated things at their core. They’re almost always wrapped with an abstraction layer, third-party library, or management tool. However, APIs may matter to people who are building clouds because they implicitly express the underlying conceptual framework of the system, and the richness of the API semantics constrains what can be expressed and, therefore, what can be controlled via the API; the constraints of the Amazon APIs force everyone else to express richer concepts in some other way.
But the battle will increasingly not be fought at this very basic level of ‘how do I get raw resources’. I recognize that building a cloud infrastructure platform at scale and with a lot of flexibility is a very difficult problem (although a simple and rigid one is not an especially difficult problem, as you can see from the zillion CMPs out in the market). But it’s not where value is ultimately created for users.
Value for users is ultimately created at the layers above the core infrastructure. Everyone has to get core infrastructure right, but the real question is: How quickly can you build value-added services, and how well does the adaptability of your core infrastructure allow you to serve a broad range of use cases (or serve a narrow range of use cases in a fashion superior to everyone else) and to deliver new capabilities to your users?
There are two primary ecosystems developing in the world: VMware and Amazon. Other possibilities, like Microsoft and OpenStack, are completely secondary to those two. You can think of VMware as “cloud-out” and Amazon as “cloud-in” approaches.
In the VMware world, you move your data center (with its legacy applications) into the modern era with virtualization, and then you build a private cloud on top of that virtualized infrastructure; to get additional capacity, business agility, and so forth, you add external cloud IaaS, and hopefully do so with a VMware-virtualized provider (and, they hope, specifically a vCloud provider who has adopted the stack all the way up to vCloud Director).
In the Amazon world, you build and launch new applications directly onto cloud IaaS. Then, as you get to scale and a significant amount of steady-state capacity, you pull workloads back into your own data center, where you have Amazon-API-compatible infrastructure. Because you have a common API and set of tools across both, where to place your workloads is largely a matter of economics (assuming that you’re not using AWS capabilities beyond EC2, S3, and EBS). You can develop and test internally or externally, though if you intend to run production on AWS, you have to take its availability and performance characteristics into account when you do your application architecture. You might also adopt this strategy for disaster recovery.
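As a rough illustration of placement being “largely a matter of economics”, the comparison can be sketched in a few lines of Python. The function names and the cost model (a flat hourly cloud rate versus straight-line amortized hardware plus a monthly operating cost) are my own simplifying assumptions, and any figures you feed it are yours to supply; none come from real price lists.

```python
HOURS_PER_MONTH = 730  # average hours in a month, used for steady-state workloads


def monthly_cloud_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay-as-you-go: you pay only for the hours the instance actually runs."""
    return hourly_rate * hours_used


def monthly_owned_cost(capex: float, amortization_months: int,
                       monthly_opex: float) -> float:
    """Owned capacity: amortized hardware plus power, space, and staff."""
    return capex / amortization_months + monthly_opex


def break_even_hours(hourly_rate: float, capex: float,
                     amortization_months: int, monthly_opex: float) -> float:
    """Monthly hours of use at which the two placements cost the same."""
    return monthly_owned_cost(capex, amortization_months, monthly_opex) / hourly_rate


def cheaper_placement(hourly_rate: float, hours_used: float, capex: float,
                      amortization_months: int, monthly_opex: float) -> str:
    """Return 'cloud' or 'private', whichever is cheaper for this workload."""
    cloud = monthly_cloud_cost(hourly_rate, hours_used)
    owned = monthly_owned_cost(capex, amortization_months, monthly_opex)
    return "cloud" if cloud < owned else "private"
```

With a hypothetical 0.25 hourly rate against a 3,600 server amortized over 36 months plus 50 a month to run, the break-even is 600 hours a month: a workload that runs all 730 hours belongs in your own data center, while one that runs 200 hours belongs on the cloud. This only holds under the caveat above, that you are not depending on AWS capabilities beyond EC2, S3, and EBS.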
While CloudStack has been an important CMP option for service providers — notably competing against the vCloud stack, OnApp, Hexagrid, and OpenStack — in the end, these providers are almost a decoration to the Amazon ecosystem. They’re mostly successful competing in places that Amazon doesn’t play — in countries where Amazon doesn’t have a data center, in the managed services / hosting space, in the hypervisor-neutral space (Amazon-style clouds built on top of VMware’s hypervisor, more specifically), and in a higher-performance, higher-availability market.
Where CloudStack has been more interesting is in its use as a “cloud-in” platform for organizations that are using AWS in a significant fashion and that want their own private cloud that’s compatible with it. Eucalyptus fills this niche as well, although Eucalyptus customers tend to be smaller, and Eucalyptus tends to compete in the general private-cloud-builder CMP space targeted at enterprises (against the vCloud stack, Abiquo, HP CloudSystem, BMC Cloud Lifecycle Manager, CA’s 3Tera AppLogic, and so on). CloudStack tends to be used by bigger organizations; while it’s in the general CMP competitive space, enterprises that evaluate it are more likely to be also evaluating, say, Nimbula and OpenStack.
CloudStack has firmly aligned itself with the Amazon ecosystem. But OpenStack is an interesting case of an organization caught in the middle. Its service provider supporters are fundamentally interested in competing against AWS (far more so than with the VMware-based cloud providers, at least in terms of whatever service they’re building on top of OpenStack). Many of its vendor contributors are afraid of a VMware-centric world (especially as VMware moves from virtualizing compute to also virtualizing storage and networks), but just as importantly they’re afraid of a world in which AWS becomes the primary way that businesses buy infrastructure. It is to their advantage to have at least one additional successful widely-adopted CMP in the market, and at least one service provider successfully competing strongly against AWS. Yet AWS has established itself as a de facto standard for cloud APIs and for the way that a service “should” be designed. (This is why OpenStack has an aptly named “Nova Feature Parity Team” playing catch-up to AWS, after all, and why debates about the API continue in the OpenStack community.)
But make no mistake about it. This is not about scrappy free open-source upstarts trying to upset an established vendor ecosystem. This is a war between vendors. As Simon Wardley put it, beware of geeks bearing gifts. CloudStack is Citrix’s effort to take on VMware and enlist the rest of the vendor community in doing so. OpenStack is an effort on the part of multiple vendors — notably Rackspace and HP — to pool their engineering efforts in order to take on Amazon. There’s no altruism here, and it’s not coincidental that the committers to the projects have an explicit and direct commercial interest — they are people working full-time for vendors, contributing as employees of those vendors, and by and large not individuals contributing for fun.
So it really comes down to this: Who can innovate more quickly, and choose the right ways to innovate that will drive customer adoption?
Ladies and gentlemen, place your bets.
Eucalyptus began life as a university project to build a CMP that would create Amazon-API-compatible cloud infrastructure, but eventually turned into a commercial effort. However, like all other CMPs offering Amazon compatibility, Eucalyptus has always lived under the shadow of the threat that Amazon might someday try to enforce intellectual property rights related to its API.
With this partnership, Eucalyptus has formally licensed the Amazon API. There’s been a lot of speculation on what this means. My understanding is the following:
This is a non-exclusive technology partnership. Eucalyptus now has a formal license to build products that are compatible with the AWS APIs; at the moment, that’s EC2, S3, and EBS, but Eucalyptus can adopt the other APIs as well if they choose to. Amazon may enter into similar licensing agreements with others, enter into different sorts of partnerships, and so forth; this is a non-restrictive deal. Furthermore, this partnership is not a signal that Amazon is changing its stance towards other products/services with Amazon-compatible APIs, where it has to date adopted a laissez-faire attitude.
This is an API licensing deal, not a technology licensing deal. Amazon will provide Eucalyptus with API specifications, including related engineering specifications not provided in the public user-level documentation. However, Amazon will not be giving any technology away to Eucalyptus — this is not engineering assistance with the actual implementation. Eucalyptus will still need to do all of its own product engineering.
There is no coupling of Amazon and Eucalyptus’s development cycles. While Amazon will try to inform Eucalyptus of planned API changes so that Eucalyptus is able to release its own updates in a timely manner, Eucalyptus is on its own — if it can keep up with Amazon, fine, if it can’t, too bad. Eucalyptus is not obliged to remain Amazon-compatible, nor is Amazon obliged to ensure that it’s feasible for Eucalyptus to remain compatible.
Some people think that this deal will give Eucalyptus some much-needed life, since it has met with limited commercial interest, and its developer community has yet to really recover from the rifts created by a past licensing change.
I personally don’t agree. With people increasingly writing to libraries, or using third-party tools (RightScale, enStratus, etc.), developers tend to care less about what’s under the hood as long as their favorite tool supports it. Yes, Amazon’s API has become a de facto standard, but I haven’t seen Eucalyptus be the Amazon-compatible CMP of choice; instead, I see serious adopters choose CloudStack (Citrix, from the Cloud.com acquisition), and the vendors who want to be part of an open-source cloud project put their support primarily behind OpenStack. I’m not convinced that this licensing deal, however interesting, is going to either significantly shift buyer desires towards Eucalyptus or improve its community support.
Randy Bias has blogged about Amazon mandating instance reboots for hundreds, perhaps thousands, of instances (Amazon’s term for VMs). Affected instances seem to be scheduled for reboots over the next couple of weeks. Speculation is that the reboots are to patch a recently-reported vulnerability in the Xen hypervisor, which is the virtualization technology that underlies Amazon’s EC2. The GigaOm story gives some links, and the CRN story discusses customer pain.
Maintenance reboots are not new on Amazon, and are detailed in Amazon’s documentation about scheduled maintenance. The required reboots this time are instance reboots, which are easily accomplished: just point-and-click to reboot on your own schedule rather than Amazon’s (although you cannot delay past the scheduled reboot). Importantly, instance reboots do not result in a change of IP address, nor do they erase the data in instance storage (which is normally non-persistent).
For some customers, of course, a reboot represents a headache, and it results in several minutes of downtime for that instance. Also, since this is peak retail season, it is already a sensitive, heavy-traffic time for many businesses, so the timing of this widespread maintenance is problematic for many customers.
However, cloud IaaS isn’t magical. If these customers were using dedicated hosting, they would still be subject to mandated reboots for security patches. Hosting providers generally offer some flexibility on scheduling such reboots, but not a lot (and sometimes none at all if there’s an exploit in the wild). If these customers were using a provider that uses live migration technology (like VMotion on a VMware-virtualized cloud), they might be spared reboots for system reasons, but they might still be subject to reboots for mandated operating system patches.
Given that what’s underlying EC2 are ordinary physical servers running virtualization without a live migration technology in use, customers should reasonably expect that they will be subject to reboots — server-level (what Amazon calls a system reboot), as well as instance-level — and also anticipate that they may sometimes need to reboot for their own guest OS patches and the like (assuming that they don’t simply patch their AMIs and re-launch their instances, arguably a more “cloudy” way to approach this problem).
What makes this rolling scheduled maintenance remarkable is its sheer scale. Hosting providers typically have a few hundred customers and a few thousand servers. Mass-market VPS hosters have lots of VPS containers, but there’s a roughly 1:1 VPS:customer ratio and a small-business-centricity that doesn’t lead to this kind of hullabaloo. Amazon’s largest competitor is estimated to be around the 100,000 VM mark. Only the largest cloud IaaS providers have more than 2,000 VMs. Consequently, this involves a virtually unprecedented number of customers and mission-critical systems.
Amazon has actually been very good about not taking down its cloud customers for extended maintenance windows. (I can think of one major Amazon competitor that took down one whole data center for an eight-hour maintenance evidently involving a total outage this past weekend, and which regularly has long-downtime maintenance windows in general.) A reboot is an inconvenience, but if you are running production infrastructure, you should darn well think about how to handle the occasional reboot, including reboots that affect a significant percentage of your infrastructure, because reboots are not likely to go away in IaaS anytime soon.
To hammer on the point again: Cloud IaaS is not magical. It still requires management, and it still has some of the foibles of both physical servers and non-cloud virtualization. Being able to push a button and get infrastructure is nice, but the responsibility to manage that infrastructure doesn’t go away — it’s just that many cloud customers manage to delay the day of reckoning when the attention they haven’t paid to management comes back to bite them.
If you run infrastructure, regardless of whether it’s in your own data center, in hosting, or in cloud IaaS, you should have a plan for “what happens if I need to mass-reboot my servers?” because it is something that will happen. And add “what if I have to do that immediately?” to the list, because that is also something that will happen, because mass exploits and worms certainly have not gone away.
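For what such a plan might look like at its simplest, here is a sketch in Python of a rolling-reboot planner. It is an illustration under my own assumptions, not anyone's actual tooling: the function name, the instance-to-role mapping, and the `max_fraction` knob are all hypothetical, and it only computes the order of reboots without calling any provider API.

```python
from collections import defaultdict
from math import floor


def plan_rolling_reboot(instances: dict[str, str],
                        max_fraction: float = 0.25) -> list[list[str]]:
    """Group instance IDs into reboot waves.

    `instances` maps instance ID -> role (e.g. "web", "db"). Within each
    role, at most `max_fraction` of the members land in the same wave, but
    always at least one so the plan makes progress. A role with a single
    instance therefore takes downtime during its wave.
    """
    by_role: dict[str, list[str]] = defaultdict(list)
    for instance_id, role in sorted(instances.items()):
        by_role[role].append(instance_id)

    waves: list[list[str]] = []
    for _role, members in sorted(by_role.items()):
        per_wave = max(1, floor(len(members) * max_fraction))
        for start in range(0, len(members), per_wave):
            wave_index = start // per_wave
            # Grow the list of waves until this role's chunk has a slot.
            while len(waves) <= wave_index:
                waves.append([])
            waves[wave_index].extend(members[start:start + per_wave])
    return waves
```

For four web servers and two database servers with `max_fraction=0.5`, this yields two waves, each taking down half of each tier. Setting `max_fraction=1.0` collapses everything into a single wave, which is the “what if I have to do that immediately?” case: the plan still exists, but the application has to tolerate every instance restarting at once.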
(Gartner clients only: Check out a note by my security colleagues, “Address Concentration Risk in Public Cloud Deployments and Shared-Service Delivery Models to Avoid Unacceptable Losses”.)