FinOps can be a big waste of money
Of late, my colleagues and I have been talking to a lot of clients who want to build a “FinOps team”, which they seem to hope will wave magic wands and reduce their cloud IaaS+PaaS bill. I’m struck by how many clients I talk to don’t have cloud cost problems that are reasonably solvable with FinOps.
Bluntly: For many organizations, there is no reasonable ROI on FinOps (and certainly no sensible business case for building a FinOps team).
This doesn’t mean that the organization shouldn’t manage their cloud finances. It just means that they don’t need to manage their cloud finances in a way that’s meaningfully different from the way that they’ve historically managed IT spending in their on-premises data center. I’ll use the term “FinOps” colloquially here to indicate an organization taking an approach and processes for cloud financial management that are different from their established on-premises IT financial management.
There are lots of common reasons why your organization might not need FinOps. For example:
- You don’t use self-service.Your developers, app management engineers, data scientists, and other technical end-users do not have direct self-service access to cloud services. Instead, all cloud design and provisioning is done by a central infrastructure and operations (I&O) team — or alternatively, all cloud requests go through a service catalog and are manually reviewed and approved. Therefore, nothing happens in the cloud that’s outside of central I&O’s knowledge or control — likely allowing you to manage budgets like you did on-premises.
- You have little to no variability in production: Your applications are allocated a static amount of infrastructure, and/or their usage is almost entirely predictable (for example, they autoscale up during the last week of the month, and then autoscale down after the close of the month). Therefore, your cloud bill for each application is essentially the same every month. You should nevertheless configure budget alerts in case something weird happens that makes usage spike, but that likely will be a one-time thing when the application is first deployed, perhaps with a once-a-year review.
- You’re not spending much money in the cloud. If you’re not spending much money, even a significant percentage reduction in spend (which you could potentially get, for instance, by eliminating all cloud dev/test VMs that aren’t used any longer and could simply be turned off) won’t be that many hard dollars of savings. Putting into place automation that automatically hibernates or deprovisions unused infrastructure may have a useful ROI, but playing manual whack-a-mole that involves a lot of people (whether in paperwork or actually mucking with infrastructure) almost certainly wastes more money in labor time than it saves in cloud costs.
- You don’t have infrastructure-hungry applications. Enterprises often don’t have the voracious scale-out cloud-native applications that are common in digital-native companies, or they only have a small handful of those applications. You might be spending significant money in the cloud, but it’s spread across dozens, hundreds, or even thousands of small applications. Therefore, even if you could cut the necessary capacity for a given application in half, it wouldn’t generate much in the way of monthly cost savings — likely not enough to justify the time of the people doing the work. Lots of enterprises run boring everyday “paperwork” apps on a VM or two (or these days, a container or two). A single-VM app often runs at 40% utilization at max, because of powers-of-two cloud VM sizing, so dropping a “T-shirt” size results in half the capacity and maybe 90% utilization, which many enterprises feel is uncomfortably tight. (And lots of organizations are slightly oversized across the board because they took the “safe” estimate of capacity needs from their cloud migration tools.)
Buying FinOps tools and allocating people to FinOps activities can cost you more than it saves.
Most people launch FinOps practices by purchasing a cloud cost optimization tool of some sort (i.e. a “FinOps tool”). Complicated FinOps processes and/or having a lot of teams and applications you have to corral within your cloud cost governance framework probably result in the genuine need to purchase a third-party FinOps tool — but those tools probably don’t represent a positive ROI until you’re spending more at least a million dollars a year. And then you have to remember that the percentage-of-cloud-spend pricing scheme of those tools can mean that you’re giving the FinOps-tool vendor a pile of money for service elements that they have no optimization capabilities for.
But in many cases, the cost of a tool will be dwarfed by the expense of the employees to do this work, especially in organizations who are making a misguided effort to hire a “FinOps team”. Not only does FinOps represent finance and sourcing overhead, but also cloud operations and engineering overhead — and, most of all, developer overhead (and overhead for any other technical team being asked to do cloud optimization work). If you go further and end up hiring a team that does performance engineering, those people are super rare and expensive.
In other words, being somewhat oversized in the cloud — or being somewhat inefficient in your application code — is a form of insidious creeping technical debt. But it’s the kind of technical debt that tends to linger, because when you look at the business case to actually go after that technical debt, there’s inadequate ROI to justify it. (Indeed, on-premises, people historically haven’t much cared. They throw hardware at the problem and run heavily oversized anyway. Nobody thinks about it because there was capital budget to buy the gear and once the gear was purchased, there wasn’t much reason to contemplate whether the money was efficiently used.)
Moreover, does your business actually want your highly-paid application development teams to chase performance issues in their code, or do they want them adding new features that will deliver new functionality to the business, saving you money elsewhere in your business processes and/or delivering something that will be compelling to customers, thus increasing your top-line revenue?
I certainly think it’s important for nearly all organizations to do some cloud financial management, which they will probably support with tooling. They’ve got to do the basics of cloud cost hygiene (preventing gross waste), budget alerts (to gain rapid awareness of accidents), spend allocation (showback/chargeback) and discount-related planning (what’s necessary for commits, reserved instances, saving plans etc.) — but even there the effort needs to be proportional to the potential cost savings.
But full-ceremony FinOps, so to speak, is usually something better left for big money-pit applications where cloud engineer or developer effort can have a significant impact on cost — for organizations with substantial self-service and no culture of cost discipline, or for the big spenders where even moving the needle a little bit on things like basic hygiene can have a pretty large absolute dollar effort relative to the investment.
Posted on March 31, 2023, in Governance and tagged cloud, cost, FinOps, Governance, IaaS, PaaS. Bookmark the permalink. 4 Comments.
I don’t think companies building central FinOps teams or buying these optimisation tools to be consumed by central teams such as a “Cloud Center of Excellence”, understand cloud. As Corey says, cloud cost is all about architecture, and architecture is fundamentally all about cost. It is the application (DevOps) teams that should be owning the budget for their app, thus have the incentive to manage cloud cost efficiently. Cost optimisation in cloud is just another part of ops, ie. making sure architecture stays optimally aligned to business goals as time progresses. No need for a centralised function, just enable the distributed teams to deal with it.
Alas this is apparently very difficult to grep for many mature enterprises.
Pingback: SRE Weekly Issue #366 – SRE WEEKLY
Pingback: SRE Weekly Issue #366 – FDE
Pingback: rePlay : la revue de presse Cloud - Mai 2023 Blog Devoteam Revolve