FinOps can be a big waste of money

Of late, my colleagues and I have been talking to a lot of clients who want to build a “FinOps team”, which they seem to hope will wave magic wands and reduce their cloud IaaS+PaaS bill. I’m struck by how many clients I talk to don’t have cloud cost problems that are reasonably solvable with FinOps.

Bluntly: For many organizations, there is no reasonable ROI on FinOps (and certainly no sensible business case for building a FinOps team).

This doesn’t mean that the organization shouldn’t manage their cloud finances. It just means that they don’t need to manage their cloud finances in a way that’s meaningfully different from the way that they’ve historically managed IT spending in their on-premises data center. I’ll use the term “FinOps” colloquially here to indicate an organization taking an approach and processes for cloud financial management that are different from their established on-premises IT financial management. 

There are lots of common reasons why your organization might not need FinOps. For example:

  • You don’t use self-service. Your developers, app management engineers, data scientists, and other technical end-users do not have direct self-service access to cloud services. Instead, all cloud design and provisioning is done by a central infrastructure and operations (I&O) team — or alternatively, all cloud requests go through a service catalog and are manually reviewed and approved. Therefore, nothing happens in the cloud that’s outside of central I&O’s knowledge or control — likely allowing you to manage budgets like you did on-premises.
  • You have little to no variability in production. Your applications are allocated a static amount of infrastructure, and/or their usage is almost entirely predictable (for example, they autoscale up during the last week of the month, and then autoscale down after the close of the month). Therefore, your cloud bill for each application is essentially the same every month. You should nevertheless configure budget alerts in case something weird happens that makes usage spike, but setting those up will likely be a one-time task when the application is first deployed, perhaps with a once-a-year review.
  • You’re not spending much money in the cloud. If you’re not spending much money, even a significant percentage reduction in spend (which you could potentially get, for instance, by eliminating all cloud dev/test VMs that aren’t used any longer and could simply be turned off) won’t be that many hard dollars of savings. Putting into place automation that automatically hibernates or deprovisions unused infrastructure may have a useful ROI, but playing manual whack-a-mole that involves a lot of people (whether in paperwork or actually mucking with infrastructure) almost certainly wastes more money in labor time than it saves in cloud costs.
  • You don’t have infrastructure-hungry applications. Enterprises often don’t have the voracious scale-out cloud-native applications that are common in digital-native companies, or they only have a small handful of those applications. You might be spending significant money in the cloud, but it’s spread across dozens, hundreds, or even thousands of small applications. Therefore, even if you could cut the necessary capacity for a given application in half, it wouldn’t generate much in the way of monthly cost savings — likely not enough to justify the time of the people doing the work. Lots of enterprises run boring everyday “paperwork” apps on a VM or two (or these days, a container or two). A single-VM app often runs at 40% utilization at max, because of powers-of-two cloud VM sizing, so dropping a “T-shirt” size results in half the capacity and maybe 90% utilization, which many enterprises feel is uncomfortably tight. (And lots of organizations are slightly oversized across the board because they took the “safe” estimate of capacity needs from their cloud migration tools.)
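To make the "not spending much money" point concrete, here is a back-of-the-envelope sketch in Python. Every figure in it (the monthly bills, the savings percentage, the hours, and the loaded hourly rate) is invented for illustration and is not client data.

```python
def net_monthly_savings(monthly_bill, savings_fraction, labor_hours, hourly_rate):
    """Cloud savings minus the cost of the labor spent achieving them."""
    gross_savings = monthly_bill * savings_fraction
    labor_cost = labor_hours * hourly_rate
    return gross_savings - labor_cost


# Hypothetical small spender: $8,000/month bill, 25% of it is removable waste,
# and the manual cleanup eats 30 person-hours a month at a $90/hour loaded cost.
small = net_monthly_savings(8_000, 0.25, 30, 90)

# The same manual effort applied to a hypothetical $400,000/month bill.
large = net_monthly_savings(400_000, 0.25, 30, 90)

print(f"small spender: {small:+,.0f} per month")
print(f"large spender: {large:+,.0f} per month")
```

With these made-up numbers, the small spender loses money on the exercise, while the identical effort pays for itself many times over on the large bill. The percentage saved is the same in both cases; only the absolute dollars differ.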

Buying FinOps tools and allocating people to FinOps activities can cost you more than it saves.

Most people launch FinOps practices by purchasing a cloud cost optimization tool of some sort (i.e. a “FinOps tool”). Complicated FinOps processes and/or having a lot of teams and applications you have to corral within your cloud cost governance framework probably result in the genuine need to purchase a third-party FinOps tool — but those tools probably don’t represent a positive ROI until you’re spending at least a million dollars a year. And then you have to remember that the percentage-of-cloud-spend pricing scheme of those tools can mean that you’re giving the FinOps-tool vendor a pile of money for service elements that they have no optimization capabilities for.

But in many cases, the cost of a tool will be dwarfed by the expense of the employees to do this work, especially in organizations that are making a misguided effort to hire a “FinOps team”. Not only does FinOps represent finance and sourcing overhead, but also cloud operations and engineering overhead — and, most of all, developer overhead (and overhead for any other technical team being asked to do cloud optimization work). If you go further and end up hiring a team that does performance engineering, those people are super rare and expensive.
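A rough way to sanity-check the business case is to put the tool fee and the people cost next to the plausible savings. The sketch below is only an illustration: the fee rate, minimum fee, optimizable share, savings rate, and staff cost are all assumptions I have invented, not any vendor's actual pricing.

```python
def finops_business_case(annual_spend, optimizable_share, savings_rate,
                         fee_rate, minimum_fee, staff_cost):
    """Net annual benefit of a tool priced as a percentage of total cloud spend."""
    # The fee is charged on total spend, including services the tool cannot optimize.
    tool_fee = max(minimum_fee, annual_spend * fee_rate)
    # Savings only come from the share of spend the tool can actually act on.
    savings = annual_spend * optimizable_share * savings_rate
    return {
        "savings": savings,
        "tool_fee": tool_fee,
        "staff_cost": staff_cost,
        "net": savings - tool_fee - staff_cost,
    }


# Hypothetical inputs: half of spend is optimizable, 20% of that is saved,
# the tool costs 3% of spend with a $25,000 floor, plus $75,000 of staff time.
for spend in (500_000, 1_000_000, 5_000_000):
    case = finops_business_case(spend, 0.5, 0.2, 0.03, 25_000, 75_000)
    print(f"annual spend {spend:>9,}: net {case['net']:+,.0f}")
```

Under these assumptions the net is negative at the lower spend levels and only turns clearly positive for the largest spender. Change the inputs to your own numbers before drawing any conclusion.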

In other words, being somewhat oversized in the cloud — or being somewhat inefficient in your application code — is a form of insidious creeping technical debt. But it’s the kind of technical debt that tends to linger, because when you look at the business case to actually go after that technical debt, there’s inadequate ROI to justify it. (Indeed, on-premises, people historically haven’t much cared. They throw hardware at the problem and run heavily oversized anyway. Nobody thinks about it because there was capital budget to buy the gear and once the gear was purchased, there wasn’t much reason to contemplate whether the money was efficiently used.)

Moreover, does your business actually want your highly-paid application development teams to chase performance issues in their code, or do they want them adding new features that will deliver new functionality to the business, saving you money elsewhere in your business processes and/or delivering something that will be compelling to customers, thus increasing your top-line revenue?

I certainly think it’s important for nearly all organizations to do some cloud financial management, which they will probably support with tooling. They’ve got to do the basics of cloud cost hygiene (preventing gross waste), budget alerts (to gain rapid awareness of accidents), spend allocation (showback/chargeback) and discount-related planning (what’s necessary for commits, reserved instances, savings plans etc.) — but even there the effort needs to be proportional to the potential cost savings.

But full-ceremony FinOps, so to speak, is usually something better left for big money-pit applications where cloud engineer or developer effort can have a significant impact on cost — for organizations with substantial self-service and no culture of cost discipline, or for the big spenders where even moving the needle a little bit on things like basic hygiene can have a pretty large absolute dollar effect relative to the investment.

GreenOps for sustainability must parallel FinOps for cost

Cloud customers are trying to make meaningful sustainability decisions. To really reduce carbon impact (or other types of environmental impact), they need the transparency to understand the impact of their architectural decisions. Just like they need to be able to estimate the cost of a solution, they need to be able to estimate its environmental impact. They need to be able to get an estimate of what the “environmental bill” will be based on the region (and maybe zone), services, and service options they choose. To the extent possible, they then need to see what impact they’re actually generating based on actual utilization.

In other words, they need “GreenOps” the way that they need “FinOps” (using FinOps as a generic term for cloud financial management in this context). And because sustainability is not just carbon impact, they’ll probably eventually need to see a multidimensional set of metrics (or a way to create a custom metric that weights different things that are important to them, like water impact vs carbon impact).
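A custom weighted metric of the kind described above could be as simple as the following sketch. The metric names, the normalized scores, and the weights are all hypothetical; no cloud provider exposes data in exactly this shape as far as I know.

```python
def weighted_impact(metrics, weights):
    """Combine normalized impact scores into one number (lower is better)."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * weight for name, weight in weights.items()) / total_weight


# Hypothetical scores, already normalized so that 0 is best and 1 is worst.
region_a = {"carbon": 0.2, "water": 0.8}
region_b = {"carbon": 0.6, "water": 0.3}

# Two organizations that care about different things.
carbon_first = {"carbon": 3, "water": 1}
water_first = {"carbon": 1, "water": 3}

for label, weights in (("carbon-first", carbon_first), ("water-first", water_first)):
    a = weighted_impact(region_a, weights)
    b = weighted_impact(region_b, weights)
    print(f"{label}: region A {a:.3f}, region B {b:.3f}")
```

With these invented inputs, the carbon-first weighting favors region A and the water-first weighting favors region B, which is the whole point of letting customers choose the weights.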

Cloud providers have relatively decent cost tools — cost calculators that allow you to choose solution elements and estimate your bill, cost reporting of various sorts, and so forth. Similarly, the third-party FinOps tooling ecosystem provides good visibility and recommendations to customers.

We don’t really need totally new dashboards and tools for sustainability. What we really need is an extension to the existing cloud cost optimization tools (and the cost transparency and billing APIs that enable those tools) to display environmental impacts as well, so we can manage them alongside our costs. Indeed, most customers will want to make trade-offs between their environmental footprint and costs. For instance, are they potentially willing to pay more to lower their greenhouse gas emissions?

Of course, there are many ways to measure sustainability and many different types of impacts, and not all of them are well suited to this kind of granular breakdown — but drawing a GreenOps parallel to FinOps would help customers extend the tools and processes that they already use (or are developing) for cost management to the emerging need for sustainability management.

The road to cloud purgatory

It’s said that the road to hell is paved with good intentions. Well, in my opinion, the road to purgatory is paved with empty principles.

It’s certainly common enough in cloud adoption. Day after day, clients show up with cloud strategies that say things like, “We will use the cloud to be more innovative” and “We will be vigilant about costs and use the lowest-cost solutions” and “We will maximize our availability and resilience” and “We will be safe and secure in the cloud” and “We’re not going to get locked into our vendors”.

Some of these things are platitudes. Obviously, no one ever shows up with, “We will be careless and irresponsible in the cloud” or “Our implementations will be the shoddiest we can get away with” or “We’ll cheerfully waste money”. Principles that don’t help you make decisions aren’t very useful.

Principles like this are only interesting in the context that they represent a ranked set of priorities. When it comes down to “higher availability” versus “higher cost”, which are you going to choose? When you have to choose between a portable solution and a solution that is more innovative, how are you going to make that decision? (And if you think you’ve discovered a miraculous solution for cloud portability, some vendor has suckered you. Badly.)

My cocktail-napkin cloud strategy (Gartner paywall) research note asks you to make just a handful of decisions:

  • Your stance on what to do with new business solutions (i.e. new apps)
  • Your stance on cloud migration
  • Your stack-ranked priority for business agility, short-term costs and long-term TCO
  • Your appetites for risk, transformation, and business independence from central IT
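To show why a stack-ranked list is more useful than a list of platitudes, here is a small sketch in which the ranking itself settles a trade-off. The class, the two options, and their scores are hypothetical illustrations and are not taken from the research note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudStrategy:
    # Highest priority first. The ordering is the decision.
    priorities: tuple

    def prefer(self, option_a, option_b):
        """Return the option that wins on the highest-ranked priority where they differ."""
        for priority in self.priorities:
            score_a = option_a["scores"].get(priority, 0)
            score_b = option_b["scores"].get(priority, 0)
            if score_a != score_b:
                return option_a if score_a > score_b else option_b
        return option_a  # identical on every ranked priority


# Invented scores (higher is better) purely to exercise the ranking.
native = {"name": "provider-native managed service",
          "scores": {"business agility": 3, "short-term costs": 2, "long-term TCO": 1}}
portable = {"name": "self-managed portable stack",
            "scores": {"business agility": 1, "short-term costs": 1, "long-term TCO": 2}}

agility_first = CloudStrategy(("business agility", "short-term costs", "long-term TCO"))
tco_first = CloudStrategy(("long-term TCO", "short-term costs", "business agility"))

print(agility_first.prefer(native, portable)["name"])
print(tco_first.prefer(native, portable)["name"])
```

The same two options produce different answers under the two rankings. A strategy that refuses to rank its priorities cannot produce an answer at all, and the argument simply gets relitigated.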

It’s not unusual for us to see 20-page, 50-page, even 100-page cloud strategies that contain no clear decisions about any of those elements, because they are the things that are controversial — so they’ve simply been left out. So the strategy contains worthless platitudes, thoughtful governance is impossible, and actual cloud adoption stalls out on endless arguments that constantly relitigate the same conflicts.

If you’re constantly arguing about cloud-related decisions, or your lovely declaration of “cloud first!” seems to not actually result in any successful cloud adoption, take a hard look at your principles and the organization alignment around those principles and priorities. And make sure your principles can actually be pragmatically implemented.

Cloud self-service doesn’t need to invite the orc apocalypse

I spend quite a bit of time talking to clients about developer self-service, largely in the context of public cloud governance and cloud operations. There are still lots of infrastructure and operations (I&O) executives who instinctively cringe at the notion of developer self-service, as if self-service would open formerly well-defended gates onto a pristine plain of well-controlled infrastructure, and allow a horde of unwashed orcs to overrun the concrete landscape in a veritable explosion of Lego structures, dot-matrix printouts, Snickers wrappers and lost whiteboard marker caps… never to be clean and orderly again.

It doesn’t have to be that way.

Self-service — and more broadly, developer control over infrastructure — isn’t an all-or-nothing proposition. Responsibility can be divided across the application life cycle, so that you can get benefits from “You build it, you run it” without necessarily parachuting your developers into an untamed and unknown wilderness and wishing them luck in surviving because it’s not an Infrastructure & Operations (I&O) team problem any more.

So we ask, instead:

  1. Will developers design their own infrastructure?
  2. Will developers control their dev/test environments?
  3. How much autonomy will developers have in building production environments?
  4. How much autonomy will developers have for production deployments?
  5. To what extent are developers responsible for day-to-day production maintenance (patching, OS updates, infrastructure rightsizing, etc.)?
  6. To what extent are developers responsible for incident management?
  7. How much help will developers receive for the things they’re responsible for?

I talk to far too many IT leaders who say, “We can’t give developers cloud self-service because we’re not ready for You build it, you run it!” whereupon I need to gently but firmly remind them that it’s perfectly okay to allow your developers full self-service access to development and testing environments, and the ability to build infrastructure as code (IaC) templates for production, without making them fully responsible for production.
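One way to picture a split that stops short of "You build it, you run it" is as a simple responsibility table. The sketch below shows one hypothetical allocation; the task names and owners are my own illustration, and your organization's answers to the seven questions will differ.

```python
from enum import Enum


class Owner(Enum):
    DEVELOPERS = "developers"
    SHARED = "shared"
    PLATFORM_OPS = "platform ops"


# One hypothetical split: full self-service before production,
# shared or ops-owned responsibility once the application is live.
RESPONSIBILITIES = {
    "design infrastructure": Owner.DEVELOPERS,
    "dev/test environments": Owner.DEVELOPERS,
    "production IaC templates": Owner.DEVELOPERS,
    "production deployment": Owner.SHARED,
    "production maintenance": Owner.PLATFORM_OPS,
    "incident management": Owner.PLATFORM_OPS,
}


def developer_self_service(task):
    """True when developers own the task outright."""
    return RESPONSIBILITIES[task] is Owner.DEVELOPERS


for task, owner in RESPONSIBILITIES.items():
    print(f"{task:<26} {owner.value}")
```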

This is the subject of my new research note, “How to Empower Technical Teams Through Self-Service Public Cloud IaaS and PaaS”. (Gartner for Technical Professionals paywall)

This is a step along the way to a deeper exploration of finding the right balance between “Dev” and “Ops” in DevOps, which is an organization-specific thing. This is not just a cloud thing; it also impacts the structure of operations on-premises. Every discussion of SRE, platform ops, etc. ultimately revolves around the questions of autonomy, governance, and collaboration, and no two organizations are likely to arrive at the exact same balance. (And don’t get me started on how many orgs rename their I&O teams to SRE teams without actually implementing much if anything from the principles of SRE.)

The cloud budget overrun rainbow of flavors

Cloud budget overruns don’t have a singular cause. Instead, they come in a bright rainbow of jelly belly flavors (the Bertie Botts ones, especially, will combine into a non-mouthwatering delight). Each needs different forms of response.

Ungoverned costs. This is the black licorice of FinOps problems. The organization has no idea what it’s spending, really, much less where the money is going, other than the big bills (or often, many little credit card bills) that they pay each month. This requires basic cost hygiene: analyze your cloud bills, get a cost management tool into place and make it useful through some tagging or partitioning discipline.
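The payoff of tagging discipline is easy to see with a toy bill. This sketch assumes billing line items arrive as dictionaries with a "cost" and optional "tags"; that shape and the "app" tag key are assumptions for illustration, not any provider's billing export format.

```python
from collections import defaultdict


def spend_by_tag(line_items, tag_key):
    """Total cost per tag value, with untagged spend collected in its own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items from one month's bill.
bill = [
    {"service": "compute", "cost": 1200.0, "tags": {"app": "payroll"}},
    {"service": "storage", "cost": 300.0, "tags": {"app": "payroll"}},
    {"service": "compute", "cost": 950.0, "tags": {}},
    {"service": "database", "cost": 700.0},
]

totals = spend_by_tag(bill, "app")
for owner, cost in sorted(totals.items()):
    print(f"{owner:<10} {cost:>8,.2f}")
```

In this invented bill, more money sits in the untagged bucket than in the one application that is tagged. Until that bucket shrinks, nobody can say where the money is going.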

Unanticipated usage. This is the sour watermelon flavor of cost overruns — deliciously sweet yet mouth-puckering. In this situation, the organization is the victim of its own cloud success. Cloud has been such a great thing for the organization that more and more unanticipated cloud projects are showing up, blowing out the original budget estimates for cloud resources. Those cloud projects are delivering business value and it doesn’t make sense to say no to them (and even if central IT says no, the cloud costs can usually be paid for out of a line-of-business budget). Nevertheless, it’s causing a lot of organizational angst because central IT or the sourcing team didn’t anticipate this spending. This organization needs to learn to shift its budgeting processes for the digital future, and cloud chargeback will help support future decision-making.

No commitments. This is the minty wrongness of Bertie Botts toothpaste. The organization could get discounts by using public discounting mechanisms for commits (like AWS Savings Plans and Azure Reserved Instances) as well as making a contractual commitment for a negotiated discount. But because the organization feels like they can’t perfectly predict their use and aren’t sure if they’ll use all of what they’re using today, they commit to nothing, therefore ensuring that they spend grotesquely more than they could be. This is universally a terrible idea. Organizations that aren’t in early pilot stage have long-term production applications and some predictability of usage; commit to the stuff you know you’re not killing off.
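The arithmetic behind "commit to the stuff you know you're not killing off" can be sketched as follows. The hourly rate, the 25% discount, and the usage figures are invented for illustration and do not reflect any provider's actual commitment pricing.

```python
def monthly_cost(usage_hours, committed_hours, on_demand_rate, discount):
    """Cost when committed hours are paid at a discount and the rest is on demand."""
    # Committed hours are paid for whether or not they are used.
    committed_cost = committed_hours * on_demand_rate * (1 - discount)
    overflow_hours = max(0, usage_hours - committed_hours)
    return committed_cost + overflow_hours * on_demand_rate


RATE = 1.0       # hypothetical on-demand price per instance-hour
DISCOUNT = 0.25  # hypothetical discount for committing

# Commit only to a 6,000-hour baseline out of 10,000 hours of current usage.
for usage in (10_000, 7_000, 4_500):
    no_commit = monthly_cost(usage, 0, RATE, DISCOUNT)
    baseline_commit = monthly_cost(usage, 6_000, RATE, DISCOUNT)
    print(f"usage {usage:>6,}: no commit {no_commit:>8,.0f}, "
          f"baseline commit {baseline_commit:>8,.0f}")
```

With these assumptions, the baseline commitment is cheaper at current usage and still cheaper if usage falls by 30%. It only breaks even once usage drops to 75% of the committed amount, which is why committing to nothing is such an expensive form of caution.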

Dev/test waste. This is the mundane bleah-ness of Bertie Botts earwax. Developers are provisioning the biggest things they can get away with (or at least being overaggressive in their estimates of what they need), there are lots of abandoned resources idling away, and dev/test infrastructure that isn’t used outside of business hours isn’t being suspended when unused. This is what cloud cost management tools are great at doing — identifying obvious waste so that it can be eliminated, largely by shutting it down or suspending it, preferably via automation.
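The business-hours point is simple arithmetic, sketched below. The 8:00 to 18:00 weekday window is an assumption, and the calculation assumes that suspended resources stop accruing compute charges; storage and similar charges typically continue.

```python
from datetime import datetime


def should_be_running(now, start_hour=8, end_hour=18):
    """True during weekday business hours; outside them, dev/test can be suspended."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start_hour <= now.hour < end_hour


# Share of a 168-hour week that falls outside a 5-day, 10-hour working window.
hours_needed = 5 * (18 - 8)
idle_fraction = 1 - hours_needed / (7 * 24)

print(should_be_running(datetime(2024, 1, 8, 10)))  # a Monday morning
print(should_be_running(datetime(2024, 1, 6, 10)))  # a Saturday morning
print(f"idle share of the week: {idle_fraction:.0%}")
```

Under that schedule, roughly 70% of the week is outside working hours. That is why automated suspension is one of the few optimizations with an obvious return even at modest spend.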

Too much production headroom. This is the mild weirdness of the Bertie Botts grass flavor. Application teams haven’t implemented autoscaling for applications that can scale horizontally, or they’ve overestimated how much production headroom an application with variable usage needs (which may result in oversizing compute units, or being overly aggressive with autoscaling). This requires implementing autoscaling with some thoughtful tuning of parameters, and possibly a business value conversation on the cost/benefit tradeoff of having higher application performance on a consistent basis.

Wrongsizing production. This is the awful lingering terribleness of Bertie Botts vomit, whose taste you cannot get out of your mouth. Production environments are statically overprovisioned and therefore overly costly. On-prem, 30% utilization is common, but it’s all capex and as long as it’s within budget, no one really cares about the waste. But in the cloud, you pay for that excess resource monthly, forcing you to confront the ongoing cost of the waste.

However, anyone who tells you to “just” rightsize has never actually tried to do this in practice within an enterprise. The problem is that applications that scale vertically typically can’t be easily rightsized. It’s likely difficult-to-impossible to do automatically, due to complicated application installation. The application is fragile and may be mission-critical, so you are cautious about maintenance downtime. And the application team — the only people who really understand how this thing works — is likely busy with other priorities.

If this is your situation, your cloud cost management tool may cause you to cry hopeless tears, because you can see the waste but taking remediation actions is a complicated cross-functional war dance and delicate negotiation that leaves everyone wondering if it wouldn’t have been easier to just keep paying a larger bill.

Suboptimal design and implementation. The controversial popcorn flavor. Architects are sometimes cost-oblivious when they design cloud solutions. They may make bad design choices, or changes in application features and behavior over time may have turned out to make a design choice unexpectedly expensive. Developers may write poorly-performing code that consumes a lot of infrastructure resources, or code that makes excessive (and, cumulatively, expensive) calls to cloud services. Your cloud cost management tools are unlikely to be of any use for detecting these situations. This needs to be addressed through performance engineering, with attention paid to the business value of the time/effort/money necessary to do so — and for many organizations may require bringing in third-party expertise to diagnose the problems and offer recommendations.

Notably, the answer to most of these issues is not “implement a cloud cost management tool”. The challenges aren’t really as simple as a lot of vendors (and talking heads) make them out to be.

Don’t boil the ocean to create your cloud

Many of my client inquiries deal with the seemingly overwhelming complexity of maturing cloud adoption — especially with the current wave of pandemic-driven late adopters, who are frequently facing business directives to move fast but see only an immense tidal wave of insurmountably complex tasks.

A lot of my advice is focused on starting small — or at least tackling reasonably-scoped projects. The following is specifically applicable to IaaS / IaaS+PaaS:

Build a cloud center of excellence. You can start a CCOE with just a single person designated as a cloud architect. Standing up a CCOE is probably going to take you a year of incremental work, during which cloud adoption, especially pilot projects, can move along. You might have to go back and retroactively apply governance and good practices to some projects. That’s usually okay.

Start with one cloud. Don’t go multicloud from the start. Do one. Get good at it (or at least get a reasonable way into a successful implementation). Then add another. If there’s immediate business demand (with solid business-case justifications) for more than one, get an MSP to deal with the additional clouds.

Don’t build a complex governance and ops structure based on theory. Don’t delay adoption while you work out everything you think you’ll need to govern and manage it. If you’ve never used cloud before, the reality may be quite different than you have in your head. Run a sequence of increasingly complex pilot projects to gain practical experience while you do preparatory work in the background. Take the lessons learned and apply them to that work.

Don’t build massive RFPs to choose a provider. Almost all organizations are better off considering their strategic priorities and then matching a cloud provider to those priorities. (If priorities are bifurcated between running the legacy and building new digital capabilities, this might encourage two strategic providers, which is fine and commonplace.) Massive RFPs are a lot of work and are rarely optimal. (Government folks might have no choice, unfortunately.)

Don’t try to evaluate every service. Hyperscale cloud providers have dozens upon dozens of services. You won’t use all of them. Don’t bother to evaluate all of them. If you think you might use a service in the future, and you want to compare that service across providers… well, by the time you get around to implementing it, all of the providers will have radically updated that service, so any work you do now will be functionally useless. Look just at the services that you are certain you will use immediately and in the very near (no more than one year) future. Validate a subset of services for use, and add new validations as needed later on.

Focus on thoughtful workload placement. Decide who your approved and preferred providers are, and build a workload placement policy. Look for “good technical fit” and not necessarily ideal technical fit; integration affinities and similar factors are more important. The time to do a detailed comparison of an individual service’s technical capabilities is when deciding workload placement, not during the RFP phase.

Accept the limits of cloud portability. Cloud providers don’t and will probably never offer commoditized services. Even when infrastructure resources seem superficially similar, there are still meaningful differences, and the management capabilities wrapped around those resources are highly differentiated. You’re buying into ecosystems that have the long-term stickiness of middleware and management software. Don’t waste time on single-pane-of-glass no-lock-in fantasies, no matter how glossily pretty the vendor marketing material is. And no, containers aren’t magic in this regard.

The messy dilemma of cloud operations

Responsibility for cloud operations is often a political football in enterprises. Sometimes nobody wants it; it’s a toxic hot potato that’s apparently coated in developer cooties. Sometimes everybody wants it, and some executives think that control over it is going to ensure their next promotion / a handsome bonus / attractiveness for their next job. Frequently, developers and the infrastructure & operations (I&O) orgs clash over it. Sometimes, CIOs decide to just stuff it into a Cloud Center of Excellence team which started out doing architecture and governance, and then finds itself saddled with everything else, too.

Lots of arguments are made for it to live in particular places and to be executed in various ways. There’s inevitably a clash between the “boring” stuff that is basically lifted-and-shifted and rarely changes, and the fast-moving agile stuff. And different approaches to IaaS, PaaS, and SaaS. And and and…

Well, the fact of the matter is that multiple people are probably right. You don’t actually want to take a one-size-fits-all approach. You want to fit operational approaches to your business needs. And you maybe even want to have specialized teams for each major hyperscale provider, even if you adopt some common approaches across a multicloud environment. (Azure vs. non-Azure, i.e. Azure vs. AWS, is a common split, often correlated closely to Windows-based application environments vs Linux-based application environments.)

Ideally, you’re going to be highly automated, agile, cloud-native, and collaborative between developers and operators (i.e. DevOps). But maybe not for everything (i.e. not all apps are under active development).

Plus, once you’ve chosen your basic operations approach (or approaches), you have to figure out how you’re going to handle cloud configuration, release engineering, and security responsibilities. (And all the upskilling necessary to do that well!)

That’s where people tend to really get hung up. How much responsibility can I realistically push to my development teams? How much responsibility do they want? How do I phase in new operational approaches over time? How do I hook this into existing CI/CD, agile, and DevOps initiatives?

There’s no one right answer. However, there’s one answer that is almost always wrong, and that’s splitting cloud operations across the I&O functional silos — i.e., the server team deals with your EC2 VMs, your NetApp storage admin deals with your Azure Blobs, your F5 specialist configures your Google Load Balancers, your firewall team fights with your network team over who controls the VPC config (often settled, badly, by buying firewall virtual appliances), etc.

When that approach is taken, the admins almost always treat the cloud portals like they’re the latest pointy-clicky interface for a piece of hardware. This pretty much guarantees incompetence, lack of coordination, and gross inefficiency. It’s usually terrible regardless of what scale you’re at. Unfortunately, it’s also the first thing that most people try (closely followed by massively overburdening some poor cloud architect with Absolutely Everything Cloud-Related).

What works for most orgs: Some form of cloud platform operations, where cloud management is treated like a “product”. It’s almost an internal cloud MSP approach, where the cloud platform ops team delivers a CMP suite, cloud-enabled CI/CD pipeline integrations, templates and automation, other cloud engineering, and where necessary, consultative assistance to coders and to application management teams. That team is usually on call for incident response, but the first line for incidents is usually the NOC or the like, and the org’s usual incident management team.

But there are lots of options. Gartner clients: Want a methodical dissection of pros and cons; cloud engineering, operating, and administration tasks; job roles; coder responsibilities; security integration; and other issues? Read my new note, “Comparing Cloud Operations Approaches”, which looks at eleven core patterns along with guidance for choosing between them, and making a range of accompanying decisions.

Tiering self-service by user competence

A nontrivial chunk of my client conversations are centered on the topic of cloud IaaS/PaaS self-service, and how to deal with development teams (and other technical end-user teams, e.g. data scientists, researchers, hardware engineers, etc.) that use these services. These teams, and the individuals within those teams, often have different levels of competence with the clouds, operations, security, etc. but pretty much all of them want unfettered access.

Responsible governance requires appropriate guidelines (policies) and guardrails, and some managers and architects feel that there should be one universal policy, and everyone — from the highly competent digital business team, to the data scientists with a bit of ad-hoc infrastructure knowledge — should be treated identically for the sake of “fairness”. This tends to be a point of particular sensitivity if there are numerous application development teams with similar needs, but different levels of cloud competence. In these situations, applying a single approach is deadly — either for agility or your crisis-induced ulcer.

Creating a structured, tiered approach, with different levels of self-service and associated governance guidelines and guardrails, is the most flexible approach. Furthermore, teams that deploy primarily using a CI/CD pipeline have different needs from teams working manually in the cloud provider portal, which in turn are different from teams that would benefit from having an easy-vend template that gets provisioned out of a ServiceNow request.

The degree to which each team can reasonably create its own configurations is related to the team’s competence with cloud solution architecture, cloud engineering, and cloud security. Not every person on the team may have a high level of competence; in fact, that will generally not be the case. However, at the very least, for full self-service there needs to be at least one person with strong competencies in each of those areas, who has oversight responsibilities, acts as an expert (provides assistance/mentorship within the team), and does any necessary code review.
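The "at least one strong person per area" rule lends itself to a simple, objective check. In this sketch the tier names and the way competence is recorded (a set of areas per team member) are my own assumptions for illustration.

```python
REQUIRED_AREAS = {"cloud solution architecture", "cloud engineering", "cloud security"}


def qualifies_for_full_self_service(team):
    """team maps each member to the set of areas where that member is strongly competent."""
    covered = set().union(*team.values())
    return REQUIRED_AREAS <= covered


def self_service_tier(team):
    # Hypothetical tier names; real programs may define more than two levels.
    if qualifies_for_full_self_service(team):
        return "full self-service"
    return "templated self-service"


digital_team = {
    "ana": {"cloud solution architecture", "cloud engineering"},
    "raj": {"cloud security"},
    "kim": set(),
}
data_science_team = {"lee": {"cloud engineering"}}

print(self_service_tier(digital_team))
print(self_service_tier(data_science_team))
```

Note that the check is about coverage across the team, not about every individual. The first team qualifies even though one member has no strong competencies at all.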

If you use CI/CD, you also want automation of such review in your pipeline, covering your infrastructure-as-code (IaC) and cloud configs and not just the app code (i.e. with a tool like Concourse Labs). Even if your whole pipeline isn’t automated, you want review of IaC during the dev stage, and not just when it triggers a cloud security posture management tool (like Palo Alto’s Prisma Cloud or Turbot), whether in dev, test, or production.
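As a toy illustration of what automated IaC review at the dev stage means, the sketch below checks a resource definition against two rules before anything is deployed. The resource format and both rules are hypothetical, and this is not how any of the tools named above are configured or invoked.

```python
def review_resource(resource):
    """Return a list of policy findings for one declared cloud resource."""
    findings = []
    if resource.get("type") == "storage_bucket" and resource.get("public_access", False):
        findings.append("storage bucket allows public access")
    if not resource.get("tags", {}).get("owner"):
        findings.append("missing 'owner' tag")
    return findings


# Hypothetical resources as they might appear after parsing an IaC template.
template = [
    {"name": "reports", "type": "storage_bucket", "public_access": True, "tags": {}},
    {"name": "web", "type": "vm", "tags": {"owner": "payments-team"}},
]

for resource in template:
    for finding in review_resource(resource):
        print(f"{resource['name']}: {finding}")
```

A check like this can fail a build in the pipeline, which catches the problem before the misconfiguration ever exists in a cloud account.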

Who determines “competence”? To avoid nasty internal politics, it’s best to set this standard objectively. Certifications are a reasonable approach, but if your org isn’t the sort that tends to pay for internal certifications or the external certifications (AWS/Azure Solution Architect, DevOps Engineer, Security Engineer, etc.) seem like too high a bar, you can develop an internal training course and certification. It’s not a bad idea for all of your coders (whether app developers, data scientists, etc.) that use the cloud to get some formal training on creating good and secure cloud configurations, anyway.

(For Gartner clients: I’m happy to have a deeper discussion in inquiry. And yes, a formal research note on this is currently going through our editing process and will be published soon.)

Building multicloud expertise

Building cloud expertise is hard. Building multicloud expertise is even harder. By “multicloud” in this context, I mean “adopting, within your organization, multiple cloud providers that do something similar” (such as adopting both AWS and Azure).

Integrated IaaS+PaaS providers are complex and differentiated entities, in both technical and business aspects. Add in their respective ecosystems — and the way that “multicloud” vendors, managed service providers (MSPs) etc. often deliver subtly (or obviously) different capabilities on different cloud providers — and you can basically end up with a multicloud katamari that picks up whatever capabilities it randomly rolls over. You can’t treat them like commodities (a topic I cover extensively in my research note on Managing Vendor Lock-In in Cloud IaaS).

For this reason, cloud-successful organizations that build a Cloud Center of Excellence (CCOE), or even just try to wrap their arms around some degree of formalized cloud operations and governance, almost always start by implementing a single cloud provider but plan for a multicloud future.  

Successfully multicloud organizations have cloud architects that deeply educate themselves on a single provider, and their cloud team initially builds tools and processes around a single provider — but the cloud architects and engineers also develop some basic understanding of at least one additional provider in order to be able to make more informed decisions. Some basic groundwork is laid for a multicloud future, often in the form of frameworks, but the actual initial implementation is single-cloud.

Governance and support for a second strategic cloud provider is added at a later date, and might not necessarily be at the same level of depth as the primary strategic provider. Scenario-specific (use-case-specific or tactical) providers are handled on a case-by-case basis; the level of governance and support for such a provider may be quite limited, or may not be supported through central IT at all.

Individual cloud engineers may continue to have single-cloud rather than multicloud skills, especially because being highly expert in multiple cloud providers tends to boost market-rate salaries to levels that many enterprises and mid-market businesses consider untenable. (Forget using training-cost payback as a way to retain people; good cloud engineers can easily get a signing bonus more than large enough to deal with that.)

In other words: while more than 80% of organizations are multicloud, very few of them consider their multiple providers to be co-equal.

Refining the Cloud Center of Excellence

What sort of org structures work well for helping to drive successful cloud adoption? Every day I talk to businesses and public-sector entities about this topic. Some have been successful. Others are struggling. And the late adopters are just starting out and want to get it right from the start.

Back in 2014, I started giving conference talks about an emerging industry best practice — the “Cloud Center of Excellence” (CCOE) concept. I published a research note at the start of 2019 distilling a whole bunch of advice on how to build a CCOE, and I’ve spent a significant chunk of the last year and a half talking to customers about it. Now I’ve revised that research, turning it into a hefty two-part note on How to Build a Cloud Center of Excellence: part 1 (organizational design) and part 2 (Year 1 tasks).

Gartner’s approach to the CCOE is fundamentally one that is rooted in the discipline of enterprise architecture and the role of EA in driving business success through the adoption of innovative technologies. We advocate a CCOE based on three core pillars — governance (cost management, risk management, etc.), brokerage (solution architecture and vendor management), and community (driving organizational collaboration, knowledge-sharing, and cloud best practices surfaced organically).

Note that it is vital for the CCOE to be focused on governance rather than on control. Organizations that remain focused on control are less likely to deliver effective self-service, or fully unlock key cloud benefits such as agility, flexibility and access to innovation. Indeed, IT organizations that attempt to tighten their grip on cloud control often face rebellion from the business that actually decreases the power of the CIO and the IT organization.

Also importantly, we do not think that the single-vendor CCOE approaches (which are currently heavily advocated by the professional services organizations of the hyperscalers) are the right long-term solution for most customers. A CCOE should ideally be vendor-neutral and span IaaS, PaaS, and SaaS in a multicloud world, with a focus on finding the right solutions to business problems (which may be cloud or noncloud). And a CCOE is not an IaaS/PaaS operations organization — cloud engineering/operations is a separate set of organizational decisions (I’ll have a research note out on that soon, too).

Please dive into the research (Gartner paywall) if you are interested in reading all the details. I have discussed this topic with literally thousands of clients over the last half-dozen years. If you’re a Gartner for Technical Professionals client, I’d be happy to talk to you about your own unique situation.