Author Archives: Lydia Leong
Clients have recently been asking a lot more questions about the comparative resilience of cloud providers.
Identity services are a particular point of concern (for instance, the Azure AD outage of October 1st and Google Cloud IAM outage of March 26th) since when identity is down, the customer can’t access the cloud provider’s control plane (and it may impact service use in general) — plus there’s generally no way for the customer to work around such issues.
The good news is, hyperscale cloud providers do a pretty good job of being robust. However, the risk of smaller, more hosting-like providers can be much higher — and there are notable differences between the hyperscalers, too.
Operations folks know: Everything breaks. Physical stuff fails, software is buggy, and people screw up (a lot). A provider can try its best to reduce the number of failures, limit the “blast radius” of a problem, limit the possibility of “cascading failures”, and find ways to mitigate the impact on users. But you can’t avoid failure entirely. Systems that are resilient recover quickly from failure.
If you chop off the head of a hydra, it grows back — quickly. We can think about five key factors — heads of the hydra — that influence the robustness, resilience, and observed (“real world”) availability of cloud services:
- Physical design: The design of physical things, such as the data center and the hardware used to deliver services.
- Logical (software) design: The design of non-physical things, especially software — all aspects of the service architecture that is not related to a physical element.
- Implementation quality: The robustness of the actual implementation, encompassing implementation skill, care and meticulousness, and the effectiveness of quality-assurance (QA) efforts.
- Deployment processes: The rollout of service changes is the single largest cause of operational failures in cloud services. The quality of these processes, the automation used in the processes, and the degree to which humans are given latitude to use good judgment (or poor judgment) thus have a material impact on availability.
- Operational processes: Other operational processes, such as monitoring, incident management — and, most importantly, problem management — impact the cloud provider’s ability to react quickly to problems, mitigate issues, and ensure that the root causes of incidents are addressed. Both proactive and reactive maintenance efforts can have an impact on availability.
A sixth factor, Transparency, isn’t directly related to keeping the hydra alive, but matters to customers as they plan for their own application architectures and risk management — contributing to customer resilience. Transparency includes making architectural information to customers, as well as delivering outage-related visibility and insight to customers. Customers need real-world info — like current and historical outage reports and the root-cause-analysis port-mortems that offer insight into what went wrong and why (and what the provider is doing about it).
When you think about cloud service resilience (or the resilience of your own systems), think about it in terms of those factors. Don’t think about it like you think about on-premises systems, where people often think primarily about hardware failures or a fire in the data center. Rather, you’re dealing with systems where software issues are almost always the root cause. Physical robustness still matters, but the other four factors are largely about software.
“It is a lovely place, my house,” said the Queen. “I am sure you would like it. There are whole rooms full of Turkish Delight, and what’s more, I have no children of my own. I want a nice boy whom I could bring up as a Prince and who would be King of Narnia when I am gone. While he was Prince he would wear a gold crown and eat Turkish Delight all day long; and you are much the cleverest and handsomest young man I’ve ever met. I think I would like to make you the Prince—some day, when you bring the others to visit me.” — The White Witch (C.S. Lewis; The Lion, The Witch, and the Wardrobe)
When most people read the Narnia novels as children, they have no idea what Turkish Delight is. Its obscurity in recent decades has allowed everyone to imagine it as an entirely wonderful substance, carrying all their hopes and dreams of the perfect candy.
So, too, do people pour all of their business hopes and dreams into a nebulously-defined future of “digital transformation”.
Because the cloud is such a key enabling technology for digital business, I have plenty of discussions with clients who have been promised grand “digital transformation” outcomes by cloud providers and cloud MSPs. But it certainly not a phenomenon limited to the cloud. Hardware vendors and ISVs, outsourcers, consultancies, etc. are all selling this dream. While I can think of vendors who are more guilty of this than others, it’s a cross-IT-industry phenomenon.
Beware all digital transformation promises. Especially the ones where the vendor promises to partner with you to change the future of your industry or reinvent/revolutionize/disrupt X (where X is what your industry does).
I’ve quietly watched a string of broken transformation promises over the last few years, gently privately warning clients in inquiry conversations that you generally can’t trust these sorts of vendor promises. These behaviors have become much more prominent recently, though. And a colleague recently told me about a conversation that seemed like just a bridge too far: a large tech vendor promising to partner with a small Midwestern industrial manufacturer (tech laggards not doing anything exciting) to create transformative products, as part of a sales negotiation. (This vendor had not previously exhibited such behavior, so it was doubly startling.)
Clients come to us with tales of vendors who, in the course of sales discussions, promises to partner with them — possibly even dangling the possibility of a joint venture — to launch a transformational digital business, revolutionize the future of their industry, or the like. (Note that this is distinct from companies that sell transformation consulting. They promise to help you figure out the future, not form a business partnership to create that future — i.e. McKinsey, Deloitte, etc.)
Usually, neither the customer nor the vendor have a concrete idea of what that looks like. Usually, the vendor refuses to put this partnership notion in writing as a formal contract. On the rare occasion that there is a contract, it is pretty vague, does not oblige the vendor to put forth any business ideas, and allows the vendor to refuse any business idea and investment. In other words, it has zero teeth. Because it’s so open-ended, the customer can fill the void with all their Turkish Delight dreams.
Moreover, the vendor may sometimes dangle samples of transformation-oriented services and consulting during the sales process. The customer gobbles down these sweet nuggets, and then stares mournfully at the empty box of transformation candy. For the promise of more, they’ll cheerfully betray their enterprise procurement standards, while the sourcing managers stand on the sidelines frantically waving contract-related warnings.
Listen to your sourcing managers when they warn you that the proposed “partnership” is a fiction. The White Witch probably doesn’t have your best interests at heart. Good digital transformation promises — ones that are likely to actually be kept — have concrete outcomes. They specify what the partnership will be developing, together with timelines, budgets, and the legal entity (such as a JV) that will be used to deliver the products/services. Or they specify the specific consulting services that will be provided — workshops, deliverables from those workshops, work-for-hire agreements with specific costs (and discounts, if applicable), and so forth.
Without concrete contractual outcomes, the vendor can vanish the candy into thin air with no repercussions. Sure, in a concrete transformation proposal, the end result will probably not be your Turkish Delight dreams. It might resemble a bowl of ordinary M&Ms. Or maybe a tasty grab-bag of Lindt truffles. (You’d have to get particularly lucky for it get much beyond the realm of grocery-store candy, though.) But you’re much more likely to actually get a good outcome.
Off-hand, I can think of one public example where a prominent “change the industry” vendor partnership with an enterprise, seems to have resulted in a credible product: Microsoft’s Connected Vehicle Platform. There, Microsoft signed a deal with a collection of automakers, each of whom had specific outcomes they wished to achieve — outcomes which could be realistically achieved in a reasonable amount of time, and representing industry advancement but not anything truly revolutionary. Microsoft built upon those individual projects to deliver a platform which would move the industry forward, which was announced with a clear mission and a timeframe for launch. Sure, it didn’t “change the future of cars”, but it brought tangible benefits to the customers.
Vendors often try to sell to who you hope to be, rather than who you are now. Your aspirations aren’t bad. Just make sure that your aspirations are well-defined and there’s a realistic roadmap to achieve them. Hope is not a strategy. The vendor may have little incentive not to promise everything you could dream of, in order to get you to sign a large purchase agreement.
Many of my client inquiries deal with the seemingly overwhelming complexity of maturing cloud adoption — especially with the current wave of pandemic-driven late adopters, who are frequently facing business directives to move fast but see only an immense tidal wave of insurmountably complex tasks.
A lot of my advice is focused on starting small — or at least tackling reasonably-scoped projects. The following is specifically applicable to IaaS / IaaS+PaaS:
Build a cloud center of excellence. You can start a CCOE with just a single person designated as a cloud architect. Standing up a CCOE is probably going to take you a year of incremental work, during which cloud adoption, especially pilot projects, can move along. You might have to go back and retroactively apply governance and good practices to some projects. That’s usually okay.
Start with one cloud. Don’t go multicloud from the start. Do one. Get good at it (or at least get a reasonable way into a successful implementation). Then add another. If there’s immediate business demand (with solid business-case justifications) for more than one, get an MSP to deal with the additional clouds.
Don’t build a complex governance and ops structure based on theory. Don’t delay adoption while you work out everything you think you’ll need to govern and manage it. If you’ve never used cloud before, the reality may be quite different than you have in your head. Run a sequence of increasingly complex pilot projects to gain practical experience while you do preparatory work in the background. Take the lessons learned and apply them to that work.
Don’t build massive RFPs to choose a provider. Almost all organizations are better off considering their strategic priorities and then matching a cloud provider to those priorities. (If priorities are bifurcated between running the legacy and building new digital capabilities, this might encourage two strategic providers, which is fine and commonplace.) Massive RFPs are a lot of work and are rarely optimal. (Government folks might have no choice, unfortunately.)
Don’t try to evaluate every service. Hyperscale cloud providers have dozens upon dozens of services. You won’t use all of them. Don’t bother to evaluate all of them. If you think you might use a service in the future, and you want to compare that service across providers… well, by the time you get around to implementing it, all of the providers will have radically updated that service, so any work you do now will be functionally useless. Look just at the services that you are certain you will use immediately and in the very near (no more than one year) future. Validate a subset of services for use, and add new validations as needed later on.
Focus on thoughtful workload placement. Decide who your approved and preferred providers are, and build a workload placement policy. Look for “good technical fit” and not necessarily ideal technical fit; integration affinities and similar factors are more important. The time to do a detailed comparison of an individual service’s technical capabilities is when deciding workload placement, not during the RFP phase.
Accept the limits of cloud portability. Cloud providers don’t and will probably never offer commoditized services. Even when infrastructure resources seem superficially similar, there are still meaningful differences, and the management capabilities wrapped around those resources are highly differentiated. You’re buying into ecosystems that have the long-term stickiness of middleware and management software. Don’t waste time on single-pane-of-glass no-lock-in fantasies, no matter how glossily pretty the vendor marketing material is. And no, containers aren’t magic in this regard.
Links are to Gartner research and are paywalled.
Pondering the care and feeding of your multicloud gelatinous cube. (Which engulfs everything in its path, and digests everything organic.)
Most organizations end up multicloud, rather than intending to be multicloud in a deliberate and structured way. So typical tales go like this: The org started doing digital business-related new applications on AWS and now AWS has become the center of gravity for all new cloud-native apps and cloud-related skills. Then the org decided to migrate “boring” LOB Windows-based COTS to the cloud for cost-savings, and lifted-and-shifted them onto Azure (thereby not actually saving money, but that’s a post for another day). Now the org has a data science team that thinks that GCP is unbearably sexy. And there’s a floating island out there of Oracle business applications where OCI is being contemplated. And don’t forget about the division in China, that hosts on Alibaba Cloud…
Multicloud is inevitable in almost all organizations. Cloud IaaS+PaaS spans such a wide swathe of IT functions that it’s impractical and unrealistic to assume that the organization will be single-vendor over the long term. Just like the enterprise tends to have at least three of everything (if not ten of everything), the enterprise is similarly not going to resist the temptation of being multicloud, even if it’s complex and challenging to manage, and significantly increases management costs. It is a rare organization that both has diverse business needs, and can exercise the discipline to use a single provider.
Despite recognizing the giant ooze that we see squelching our way, along with our unavoidable doom, there are things we can do to prepare, govern, and ensure that we retain some of our sanity.
For starters, we can actively choose our multicloud strategy and stance. We can classify providers into tiers, decide what providers are approved for use and under what circumstances, and decide what providers are preferred and/or strategic.
We can then determine the level of support that the organization is going to have for each tier — decide, for instance, that we’ll provide full governance and operations for our primary strategic provider, a lighter-weight approach that leans on an MSP to support our secondary strategic provider, and less support (or no support beyond basic risk management) for other providers.
After that, we can build an explicit workload placement policy that has an algorithm that guides application owners/architects in deciding where particular applications live, based on integration affinities, good technical fit, etc.
Note that cost-based provider selection and cost-based long-term workload placement are both terrible ideas. This is a constant fight between cloud architects and procurement managers. It is rooted in the erroneous idea that IaaS is a commodity, and that provider pricing advantages are long-term rather than short-lived. Using cost-based placement often leads to higher long-term TCO, not to mention a grand mess with data gravity and thus data management, and fragile application integrations.
See my new research note, “Comparing Cloud Workload Placement Strategies” (Gartner paywall) for a guide to multicloud IaaS / IaaS+PaaS strategies (including when you should pursue a single-cloud approach). In a few weeks, you’ll see the follow-up doc “Designing a Cloud Workload Placement Policy” publish, which provides a guide to writing such policies, with an analysis of different placement factors and their priorities.
Responsibility for cloud operations is often a political football in enterprises. Sometimes nobody wants it; it’s a toxic hot potato that’s apparently coated in developer cooties. Sometimes everybody wants it, and some executives think that control over it are going to ensure their next promotion / a handsome bonus / attractiveness for their next job. Frequently, developers and the infrastructure & operations (I&O) orgs clash over it. Sometimes, CIOs decide to just stuff it into a Cloud Center of Excellence team which started out doing architecture and governance, and then finds itself saddled with everything else, too.
Lots of arguments are made for it to live in particular places and to be executed in various ways. There’s inevitably a clash between the “boring” stuff that is basically lifted-and-shifted and rarely changes, and the fast-moving agile stuff. And different approaches to IaaS, PaaS, and SaaS. And and and…
Well, the fact of the matter is that multiple people are probably right. You don’t actually want to take a one-size-fits-all approach. You want to fit operational approaches to your business needs. And you maybe even want to have specialized teams for each major hyperscale provider, even if you adopt some common approaches across a multicloud environment. (Azure vs. non-Azure, i.e. Azure vs. AWS, is a common split, often correlated closely to Windows-based application environments vs Linux-based application environments.)
Ideally, you’re going to be highly automated, agile, cloud-native, and collaborative between developers and operators (i.e. DevOps). But maybe not for everything (i.e. not all apps are under active development).
Plus, once you’ve chosen your basic operations approach (or approaches), you have to figure out how you’re going to handle cloud configuration, release engineering, and security responsibilities. (And all the upskilling necessary to do that well!)
That’s where people tend to really get hung up. How much responsibility can I realistically push to my development teams? How much responsibility do they want? How do I phase in new operational approaches over time? How do I hook this into existing CI/CD, agile, and DevOps initiatives?
There’s no one right answer. However, there’s one answer that is almost always wrong, and that’s splitting cloud operations across the I&O functional silos — i.e., the server team deals with your EC2 VMs, your NetApp storage admin deals with your Azure Blobs, your F5 specialist configures your Google Load Balancers, your firewall team fights with your network team over who controls the VPC config (often settled, badly, by buying firewall virtual appliances), etc.
When that approach is taken, the admins almost always treat the cloud portals like they’re the latest pointy-clicky interface for a piece of hardware. This pretty much guarantees incompetence, lack of coordination, and gross inefficiency. It’s usually terrible at regardless of what scale you’re at. Unfortunately, it’s also the first thing that most people try (closely followed by massively overburdening some poor cloud architect with Absolutely Everything Cloud-Related.)
What works for most orgs: Some form of cloud platform operations, where cloud management is treated like a “product”. It’s almost an internal cloud MSP approach, where the cloud platform ops team delivers a CMP suite, cloud-enabled CI/CD pipeline integrations, templates and automation, other cloud engineering, and where necessary, consultative assistance to coders and to application management teams. That team is usually on call for incident response, but the first line for incidents is usually the NOC or the like, and the org’s usual incident management team.
But there are lots of options. Gartner clients: Want a methodical dissection of pros and cons; cloud engineering, operating, and administration tasks; job roles; coder responsibilities; security integration; and other issues? Read my new note, “Comparing Cloud Operations Approaches“, which looks at eleven core patterns along with guidance for choosing between them, andmaking a range of accompanying decisions.
A nontrivial chunk of my client conversations are centered on the topic of cloud IaaS/PaaS self-service, and how to deal with development teams (and other technical end-user teams, i.e. data scientists, researchers, hardware engineers, etc.) that use these services. These teams, and the individuals within those teams, often have different levels of competence with the clouds, operations, security, etc. but pretty much all of them want unfettered access.
Responsible governance requires appropriate guidelines (policies) and guardrails, and some managers and architects feel that there should be one universal policy, and everyone — from the highly competent digital business team, to the data scientists with a bit of ad-hoc infrastructure knowledge — should be treated identically for the sake of “fairness”. This tends to be a point of particular sensitivity if there are numerous application development teams with similar needs, but different levels of cloud competence. In these situations, applying a single approach is deadly — either for agility or your crisis-induced ulcer.
Creating a structured, tiered approach, with different levels of self-service and associated governance guidelines and guardrails, is the most flexible approach. Furthermore, teams that deploy primarily using a CI/CD pipeline have different needs from teams working manually in the cloud provider portal, which in turn are different from teams that would benefit from having an easy-vend template that gets provisioned out of a ServiceNow request.
The degree to which each team can reasonably create its own configurations is related to the team’s competence with cloud solution architecture, cloud engineering, and cloud security. Not every person on the team may have a high level of competence; in fact, that will generally not be the case. However, the very least, for full self-service there needs to be at least one person with strong competencies in each of those areas, who has oversight responsibilities, acts an expert (provides assistance/mentorship within the team), and does any necessary code review.
If you use CI/CD, you also want automation of such review in your pipeline, that includes your infrastructure-as-code (IaC) and cloud configs, not just the app code; i.e. a tool like Concourse Labs). Even if your whole pipeline isn’t automated, review of IaC during the dev stage, and not just when it triggers a cloud security posture management tool (like Palo Alto’s Prisma Cloud or Turbot), whether in dev, test, or production.
Who determines “competence”? To avoid nasty internal politics, it’s best to set this standard objectively. Certifications are a reasonable approach, but if your org isn’t the sort that tends to pay for internal certifications or the external certifications (AWS/Azure Solution Architect, DevOps Engineer, Security Engineer, etc.) seem like too high a bar, you can develop an internal training course and certification. It’s not a bad idea for all of your coders (whether app developers, data scientists, etc.) that use the cloud to get some formal training on creating good and secure cloud configurations, anyway.
(For Gartner clients: I’m happy to have a deeper discussion in inquiry. And yes, a formal research note on this is currently going through our editing process and will be published soon.)
(Confused by the title of this post? Read this brief anecdote.)
The myth of cloud repatriation refuses to die, and a good chunk of the problem is that users (and poll respondents) use “repatriation” is a wild array of ways, but non-cloud vendors want you to believe that “repatriation” means enterprises packing up all their stuff in the cloud and moving it back into their internal data centers — which occurs so infrequently that it’s like a sasquatch sighting.
A non-comprehensive list of the ways that clients use the term “repatriation” that have little to nothing to do with what non-cloud vendors (or “hybrid”) would like you to believe:
Outsourcing takeback. The origin of the term comes from orgs that are coming back from traditional IT outsourcing. However, we also hear cloud architects say they are “repatriating” when they gradually take back management of cloud workloads from a cloud MSP; the workloads stay in the cloud, though.
Migration pause. Some migrations to IaaS/IaaS+PaaS do not go well. This is often the result of choosing a low-quality MSP for migration assistance, or rethinking the wisdom of a lift-and-shift. Orgs will pause, switch MSPs and/or switch migration approaches (usually to lift-and-optimize), and then resume. Some workloads might be temporarily returned on-premise while this occurs.
SaaS portfolio rationalization. Sprawling adoption of SaaS, at the individual, team, department or business-unit level, can result in one or more SaaS applications being replaced with other, official, corporate SaaS (for instance, replacing individual use of Dropbox with an org-wide Google Drive implementation as part of G-Suite). Sometimes, the org might choose to build on-premises functionality instead (for instance, replacing ad-hoc SaaS analytics with an on-prem data warehouse and enterprise BI solution). This is overwhelmingly the most common form of “cloud repatriation”.
Development in the cloud, production on premises. While the dev/prod split of environments is much less common than it used to be, some organizations still develop in cloud IaaS and then run the app in an on-prem data center in production. Orgs like this will sometimes say they “repatriate” the apps for production.
The Oops. Sometimes organizations attempt to put an application in the cloud and it Just Doesn’t Go Well. Sometimes the workload isn’t a good match for cloud services in general. Sometimes the workload is just a bad match for the particular provider chosen. Sometimes they make a bad integrator choice, or their internal cloud skills are inadequate to the task. Whatever it is, people might hit the “abort” button and either rethink and retry in the cloud, or give up and put it on premises (either until they can put together a better plan, or for the long term).
Of course, there are the sasquatch sightings, too, like the Dropbox migration from AWS (also see the five-year followup), but those stories rarely represent enterprise-comparable use cases. If you’re one of the largest purchasers of storage on the planet, and you want custom hardware, absolutely, DIY makes sense. (And Dropbox continues to do some things on AWS.)
Customers also engage in broader strategic application portfolio rationalizations that sometimes result in groups of applications being shifted around, based on changing needs. While the broader movement is towards the cloud, applications do sometimes come back on-premises, often to align to data gravity considerations for application and data integration.
None of these things are in any way equivalent to the notion that there’s a broad or even common movement of workloads from the cloud back on-premises, though, especially for those customers who have migrated entire data centers or the vast majority of their IT estate to the cloud.
(Updated with research: In my note for Gartner clients, “Moving Beyond the Myth of Repatriation: How to Handle Cloud Projects Failures”, I provide detailed guidance on why cloud projects fail, how to reduce the risks of such projects, and how — or if — to rescue troubled cloud projects.)
Building cloud expertise is hard. Building multicloud expertise is even harder. By “multicloud” in this context, I mean “adopting, within your organization, multiple cloud providers that do something similar” (such as adopting both AWS and Azure).
Integrated IaaS+PaaS providers are complex and differentiated entities, in both technical and business aspects. Add in their respective ecosystems — and the way that “multicloud” vendors, managed service providers (MSPs) etc. often deliver subtly (or obviously) different capabilities on different cloud providers — and you can basically end up with a multicloud katamari that picks up whatever capabilities it randomly rolls over. You can’t treat them like commodities (a topic I cover extensively in my research note on Managing Vendor Lock-In in Cloud IaaS).
For this reason, cloud-successful organizations that build a Cloud Center of Excellence (CCOE), or even just try to wrap their arms around some degree of formalized cloud operations and governance, almost always start by implementing a single cloud provider but plan for a multicloud future.
Successfully multicloud organizations have cloud architects that deeply educate themselves on a single provider, and their cloud team initially builds tools and processes around a single provider — but the cloud architects and engineers also develop some basic understanding of at least one additional provider in order to be able to make more informed decisions. Some basic groundwork is laid for a multicloud future, often in the form of frameworks, but the actual initial implementation is single-cloud.
Governance and support for a second strategic cloud provider is added at a later date, and might not necessarily be at the same level of depth as the primary strategic provider. Scenario-specific (use-case-specific or tactical) providers are handled on a case-by-case basis; the level of governance and support for such a provider may be quite limited, or may not be supported through central IT at all.
Individual cloud engineers may continue to have single-cloud rather than multicloud skills, especially because being highly expert in multiple cloud providers tend to boost market-rate salaries to levels that many enterprises and mid-market businesses consider untenable. (Forget using training-cost payback as a way to retain people; good cloud engineers can easily get a signing bonus more than large enough to deal with that.)
In other words: while more than 80% of organizations are multicloud, very few of them consider their multiple providers to be co-equal.
What sort of org structures work well for helping to drive successful cloud adoption? Every day I talk to businesses and public-sector entities about this topic. Some have been successful. Others are struggling. And the late-adopters are just starting out and want to get it right from the start.
Back in 2014, I started giving conference talks about an emerging industry best practice — the “Cloud Center of Excellence” (CCOE) concept. I published a research note at the start of 2019 distilling a whole bunch of advice on how to build a CCOE, and I’ve spent a significant chunk of the last year and a half talking to customers about it. Now I’ve revised that research, turning it into a hefty two-part note on How to Build a Cloud Center of Excellence: part 1 (organizational design) and part 2 (Year 1 tasks).
Gartner’s approach to the CCOE is fundamentally one that is rooted in the discipline of enterprise architecture and the role of EA in driving business success through the adoption of innovative technologies. We advocate a CCOE based on three core pillars — governance (cost management, risk management, etc.), brokerage (solution architecture and vendor management), and community (driving organizational collaboration, knowledge-sharing, and cloud best practices surfaced organically).
Note that it is vital for the CCOE to be focused on governance rather than on control. Organizations who remain focused on control are less likely to deliver effective self-service, or fully unlock key cloud benefits such as agility, flexibility and access to innovation. Indeed, IT organizations that attempt to tighten their grip on cloud control often face rebellion from the business that actually decreases the power of the CIO and the IT organization.
Also importantly, we do not think that the single-vendor CCOE approaches (which are currently heavily advocated by the professional services organizations of the hyperscalers) are the right long-term solution for most customers. A CCOE should ideally be vendor-neutral and span IaaS, PaaS, and SaaS in a multicloud world, with a focus on finding the right solutions to business problems (which may be cloud or noncloud). And a CCOE is not an IaaS/PaaS operations organization — cloud engineering/operations is a separate set of organizational decisions (I’ll have a research note out on that soon, too).
Please dive into the research (Gartner paywall) if you are interested in reading all the details. I have discussed this topic with literally thousands of clients over the last half-dozen years. If you’re a Gartner for Technical Professionals client, I’d be happy to talk to you about your own unique situation.
Preface added 20 November 2020: This post received a lot more attention than I expected. I must reiterate that it is not in any way an endorsement. Indeed, sparkly pink unicorns are, by their nature, fanciful. Caution must be exercised, as sparkly pink glitter can conceal deficiencies in the equine body.
Digging into my archive of past predictions… In a research note on the convergence of public and private cloud, published almost exactly eight years ago in July 2012, I predicted that the cloud IaaS market would eventually deliver a service that delivered a full public cloud experience as if it were private cloud — at the customer’s choice of data center, in a fully single-tenant fashion.
Since that time, there have been many attempts to introduce public-cloud-consistent private cloud offerings. Gartner now has a term, “distributed cloud”, to refer to the on-premises and edge services delivered by public cloud providers. AWS Outposts deliver, as a service, a subset of AWS’s incredibly rich product porfolio. Azure Stack (now Azure Stack Hub) delivers, as software, a set of “Azure-consistent” capabilities (meaning you can transfer your scripts, tooling, conceptual models, etc., but it only supports a core set of mostly infrastructure capabilities). Various cloud MSPs, notably Avanade, will deliver Azure Stack as a managed service. And folks like IBM and Google want you to take their container platform software to facilitate a hybrid IT model.
But no one has previously delivered what I think is what customers really want:
- Location of the customer’s choice
- Single-tenant; no other customer shares the hardware/service; data guaranteed to stay within the environment
- Isolated control plane and private self-service interfaces (portal, API endpoints); no tethering or dependence on the public cloud control plane, or Internet exposure of the self-service interfaces
- Delivered as a service with the same pricing model as the public cloud services; not significantly more expensive than public cloud as long as minimum commitment is met
- All of the provider’s services (IaaS+PaaS), identical to the way that they are exposed in the provider’s public cloud regions
Why do customers want that? Because customers like everything the public cloud has to offer — all the things, IaaS and PaaS — but there are still plenty of customers who want it on-premises and dedicated to them. They might need it somewhere that public cloud regions generally don’t live and may never live (small countries, small cities, edge locations, etc.), they might have regulatory requirements they believe they can only meet through isolation, they may have security (even “national security”) requirements that demand isolation, or they may have concerns about the potential to be cut off from the rest of the world (as the result of sanctions, for instance). And because when customers describe what they want, they inevitably ask for sparkly pink unicorns, they also want all that to be as cheap as a multi-tenant solution.
And now it’s here, and given that it’s 2020… the sparkly pink unicorn comes from Oracle. Specifically, the world now has Oracle Dedicated Regions Cloud @ Customer. (Which I’m going to shorthand as OCI-DR, even though you can buy Oracle SaaS hosted on this infrastructure) OCI’s region model, unlike its competitors, has always been all-services-in-all-regions, so the OCI-DR model continues that consistency.
In an OCI-DR deal, the customer basically provides colo (either their own data center or a third party colo) to Oracle, and Oracle delivers the same SLAs as it does in OCI public cloud. The commit is very modest — it’s $6 million a year, for a 3-year minimum, per OCI-DR Availability Zone (a region can have multiple AZs, and you can also buy multiple regions). There are plenty of cloud customers that easily meet that threshold. (The typical deal size we see for AWS contracts at Gartner is in the $5 to $15 million/year range, on 3+ year commitments.) And the pricing model and actual price for OCI-DR services is identical to OCI’s public regions.
The one common pink sparkly desire that OCI doesn’t meet is the ability to use your own hardware, which can help customers address capex vs. opex desires, may have perceived cost advantages, and may address secure supply chain requirements. OCI-DR uses some Oracle custom hardware, and the hardware is bundled as part of the service.
I predict that this will raise OCI’s profile as an alternative to the big hyperscalers, among enterprise customers and even among digital-native customers. Prior to today’s announcement, I’d already talked to Gartner clients who had been seriously engaged in sales discussions on OCI-DR; Oracle has quietly been actively engaged in selling this for some time. Oracle has made significant strides (surprisingly so) in expanding OCI’s capabilities over this last year, so when they say “all services” that’s now a pretty significant portfolio — likely enough for more customers to give OCI a serious look and decide whether access to private regions is worth dealing with the drawbacks (OCI’s more limited ecosystem and third-party tool support probably first and foremost).
As always, I’m happy to talk to Gartner clients who are interested in a deeper discussion. We’ve recently finished our Solution Scorecards (an in-depth assessment of 270 IaaS+PaaS capabilities), including our new assessment of OCI. The scores are summarized in a publicly-reprinted document. The full scorecard has been published, and the publicly-available summary says, “OCI’s overall solution score is 62 out of 100, making it a scenario-specific option for technical professionals responsible for cloud production deployments.”