Monthly Archives: March 2023

FinOps can be a big waste of money

Of late, my colleagues and I have been talking to a lot of clients who want to build a “FinOps team”, which they seem to hope will wave magic wands and reduce their cloud IaaS+PaaS bill. I’m struck by how many clients I talk to don’t have cloud cost problems that are reasonably solvable with FinOps.

Bluntly: For many organizations, there is no reasonable ROI on FinOps (and certainly no sensible business case for building a FinOps team).

This doesn’t mean that the organization shouldn’t manage their cloud finances. It just means that they don’t need to manage their cloud finances in a way that’s meaningfully different from the way that they’ve historically managed IT spending in their on-premises data center. I’ll use the term “FinOps” colloquially here to indicate an organization taking an approach and processes for cloud financial management that are different from their established on-premises IT financial management. 

There are lots of common reasons why your organization might not need FinOps. For example:

  • You don’t use self-service.Your developers, app management engineers, data scientists, and other technical end-users do not have direct self-service access to cloud services. Instead, all cloud design and provisioning is done by a central infrastructure and operations (I&O) team — or alternatively, all cloud requests go through a service catalog and are manually reviewed and approved. Therefore, nothing happens in the cloud that’s outside of central I&O’s knowledge or control — likely allowing you to manage budgets like you did on-premises.
  • You have little to no variability in production: Your applications are allocated a static amount of infrastructure, and/or their usage is almost entirely predictable (for example, they autoscale up during the last week of the month, and then autoscale down after the close of the month). Therefore, your cloud bill for each application is essentially the same every month. You should nevertheless configure budget alerts in case something weird happens that makes usage spike, but that likely will be a one-time thing when the application is first deployed, perhaps with a once-a-year review.
  • You’re not spending much money in the cloud. If you’re not spending much money, even a significant percentage reduction in spend (which you could potentially get, for instance, by eliminating all  cloud dev/test VMs that aren’t used any longer and could simply be turned off) won’t be that many hard dollars of savings. Putting into place automation that automatically hibernates or deprovisions unused infrastructure may have a useful ROI, but playing manual whack-a-mole that involves a lot of people (whether in paperwork or actually mucking with infrastructure) almost certainly wastes more money in labor time than it saves in cloud costs.
  • You don’t have infrastructure-hungry applications. Enterprises often don’t have the voracious scale-out cloud-native applications that are common in digital-native companies, or they only have a small handful of those applications. You might be spending significant money in the cloud, but it’s spread across dozens, hundreds, or even thousands of small applications.  Therefore, even if you could cut the necessary capacity for a given application in half, it wouldn’t generate much in the way of monthly cost savings — likely not enough to justify the time of the people doing the work. Lots of enterprises run boring everyday “paperwork” apps on a VM or two (or these days, a container or two). A single-VM app often runs at 40% utilization at max, because of powers-of-two cloud VM sizing, so dropping a “T-shirt” size results in half the capacity and maybe 90% utilization, which many enterprises feel is uncomfortably tight. (And lots of organizations are slightly oversized across the board because they took the “safe” estimate of capacity needs from their cloud migration tools.)

Buying FinOps tools and allocating people to FinOps activities can cost you more than it saves.

Most people launch FinOps practices by purchasing a cloud cost optimization tool of some sort (i.e. a “FinOps tool”). Complicated FinOps processes and/or having a lot of teams and applications you have to corral within your cloud cost governance framework probably result in the genuine need to purchase a third-party FinOps tool — but those tools probably don’t represent a positive ROI until you’re spending more at least a million dollars a year. And then you have to remember that the percentage-of-cloud-spend pricing scheme of those tools can mean that you’re giving the FinOps-tool vendor a pile of money for service elements that they have no optimization capabilities for.

But in many cases, the cost of a tool will be dwarfed by the expense of the employees to do this work, especially in organizations who are making a misguided effort to hire a “FinOps team”. Not only does FinOps represent finance and sourcing overhead, but also cloud operations and engineering overhead — and, most of all, developer overhead (and overhead for any other technical team being asked to do cloud optimization work). If you go further and end up hiring a team that does performance engineering, those people are super rare and expensive.

In other words, being somewhat oversized in the cloud — or being somewhat inefficient in your application code — is a form of insidious creeping technical debt. But it’s the kind of technical debt that tends to linger, because when you look at the business case to actually go after that technical debt, there’s inadequate ROI to justify it. (Indeed, on-premises, people historically haven’t much cared. They throw hardware at the problem and run heavily oversized anyway. Nobody thinks about it because there was capital budget to buy the gear and once the gear was purchased, there wasn’t much reason to contemplate whether the money was efficiently used.)

Moreover, does your business actually want your highly-paid application development teams to chase performance issues in their code, or do they want them adding new features that will deliver new functionality to the business, saving you money elsewhere in your business processes and/or delivering something that will be compelling to customers, thus increasing your top-line revenue?

I certainly think it’s important for nearly all organizations to do some cloud financial management, which they will probably support with tooling. They’ve got to do the basics of cloud cost hygiene (preventing gross waste), budget alerts (to gain rapid awareness of accidents),  spend allocation (showback/chargeback) and discount-related planning (what’s necessary for commits, reserved instances, saving plans etc.) — but even there the effort needs to be proportional to the potential cost savings.

But full-ceremony FinOps, so to speak, is usually something better left for big money-pit applications where cloud engineer or developer effort can have a significant impact on cost — for organizations with substantial self-service and no culture of cost discipline, or for the big spenders where even moving the needle a little bit on things like basic hygiene can have a pretty large absolute dollar effort relative to the investment.

GreenOps for sustainability must parallel FinOps for cost

Cloud customers are trying to make meaningful sustainability decisions. To really reduce carbon impact (or other types of environmental impact), they need the transparency to understand the impact of their architectural decisions. Just like they need to be able to estimate the cost of a solution, they need to be able to estimate its environmental impact. They need to be able to get an estimate of what the “environmental bill” will be based on the region (and maybe zone), services, and service options they choose. To the extent possible, they then need to see what impact they’re actually generating based on actual utilization.

In other words, they need “GreenOps” the way that they need “FinOps” (using FinOps as a generic term for cloud financial management in this context). And because sustainability is not just carbon impact, they’ll probably eventually need to see a multidimensional set of metrics (or a way to create a custom metric that weights different things that are important to them, like water impact vs carbon impact).

Cloud providers have relatively decent cost tools — cost calculators that allow you to choose solution elements and estimate your bill, cost reporting of various sorts, and so forth. Similarly, the third-party FinOps tooling ecosystem provides good visibility and recommendations to customers.

We don’t really need totally new dashboards and tools for sustainability. What we really need is an extension to the existing cloud cost optimization tools (and the cost transparency and billing APIs that enable those tools) to display environmental impacts as well, so we can manage them alongside our costs. Indeed, most customers will want to make trade-offs between their environmental footprint and costs. For instance, are they potentially willing to pay more to lower their greenhouse gas emissions?

Of course, there are many ways to measure sustainability and many different types of impacts, and not all of them are well suited to this kind of granular breakdown — but drawing a GreenOps parallel to FinOps would help customers extend the tools and processes that they already use (or are developing) for cost management to the emerging need for sustainability management.