Lightweight Provisioning != Lightweight Process
Congratulations, you’ve virtualized (or gone to public cloud IaaS) and have the ability to instantly and easily provision capacity.
Now, stop and shoot yourself in the foot by not implementing a lightweight procurement process to go with your lightweight provisioning technology.
That’s an all-too-common story, and it highlights a critical aspect of the move towards a cloud (or just ‘cloudier’ concepts). In many organizations, it’s not actually the provisioning that’s expensive and lengthy. It’s the process that goes with it.
You’ll probably have heard that it can take weeks or months for an enterprise to provision a server. You might even work for an organization where that’s true. You might also have heard that it takes thousands of dollars to do so, and your organization might have a chargeback mechanism that makes that the case for your department.
Except that it doesn’t actually take that long, and it’s actually pretty darn cheap, as long as you’re large enough to have some reasonable level of automation (mid-sized businesses and up, or technology companies with more than a handful of servers). Even with zero automation, you can buy a server and have it shipped to you in a couple of days, and build it in an afternoon.
What takes forever is the procurement process, which may also be heavily burdened with costs.
When most organizations virtualize, they usually eliminate a lot of the procurement process — getting a VM is usually just a matter of requesting one, rather than going through the whole rigmarole of justifying buying a server. But the “request a VM” process can be anything from a self-service portal to something with as much paperwork headache as buying a server — and the cost savings, agility, and efficiency that an organization gains from virtualizing are certainly dependent upon whether it’s able to lighten its process for this new world.
There are certain places where the “forever to procure, at vast expense” problems are notably worse. For instance, subsidiaries in companies that have centralized IT in the parent company often seem to get shafted by central IT — they’re likely to tell stories of an uncaring central IT organization, priorities that aren’t aligned with their own, and nonsensical chargeback mechanisms. Moreover, subsidiaries often start out much more nimble and process-light than a parent company that acquired them, which leads to the build-up of frustration and resentment and an attitude of being willing to go out on their own.
And so subsidiaries — and departments of larger corporations — often end up going rogue, turning to procuring an external cloud solution, not because internal IT cannot deliver a technology solution that meets their needs, but because their organization cannot deliver a process that meets their needs.
When we talk about time and cost savings for public cloud IaaS vs. the internal data center, we should be careful not to conflate the burden of (internal, discardable/re-engineerable) process, with what technology is able to deliver.
Note that this also means that fast provisioning is only the beginning of the journey towards agility and efficiency. The service aspects (from self-service to managed service) are much more difficult to solve.
The perils of defaults
A Fortune 1000 technology vendor installed a new IP phone system last year. There was one problem: By IT department policy, that company does not change any defaults associated with hardware or software purchased from a vendor. In this case, the IP phones defaulted to no ring tone. So the phone does not ring audibly when it gets a call. You can imagine just how useful that is. Stunningly, this remains the case months after the initial installation — the company would rather, say, miss customer calls, than change the Holy Defaults.
A software vendor was having an interesting difficulty with a larger customer. The vendor’s configuration file, as shipped with the software, has defaults set up for single-server operation. If you want to run multi-server for high availability or load distribution, you need to change some of the defaults in the configuration file. They encountered a customer with the same kind of “we do not change any defaults” policy. Unsurprisingly, that customer’s multi-server deployment was breaking. The vendor’s support explained what was wrong, explained how to fix it, and was confounded by the policy. This is one of the things a custom distribution from the vendor can be used for, of course, but it’s a head-slapping moment and a grotesque waste of everyone’s time.
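The failure mode above can be sketched with a hypothetical configuration (all parameter names here are invented for illustration, not the vendor’s actual file). Shipped defaults assume a single node; a multi-server deployment simply cannot work until some of them are overridden:

```python
# Hypothetical vendor defaults, as shipped: tuned for single-server use.
DEFAULTS = {
    "cluster_enabled": False,      # clustering off by default
    "bind_address": "127.0.0.1",   # listen for local connections only
    "peer_nodes": [],              # no peers to replicate state to
}

def effective_config(overrides=None):
    """Merge site-specific overrides onto the shipped defaults."""
    config = dict(DEFAULTS)
    config.update(overrides or {})
    return config

# A multi-server (HA) deployment *must* change these defaults:
ha_config = effective_config({
    "cluster_enabled": True,
    "bind_address": "0.0.0.0",                # accept peer connections
    "peer_nodes": ["10.0.0.2", "10.0.0.3"],   # the other cluster members
})

# Under a "never change defaults" policy, clustering silently stays off
# and each node runs isolated on loopback -- exactly the breakage the
# vendor's support team kept diagnosing.
assert effective_config()["cluster_enabled"] is False
assert ha_config["cluster_enabled"] is True
```

The point is not the specific parameters: configurable values exist precisely so that deployments other than the default one are possible.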
Now I’m seeing cloud configurations confounding people who have these kinds of policies. What is “default” when you’re picking from drop-down menus? What do you do when the default selection is something other than what you actually need? And the big one: Will running software on cloud infrastructure necessitate violating virgin defaults?
As an analyst, I’m used to delivering carefully nuanced advice based on individual company situations, policies, and needs. But here’s one no-exceptions opinion: “We never ever change vendor defaults” is a universally stupid policy. It is particularly staggeringly dumb in the cloud world, where generally, if you can pick a configuration, it is a supported configuration. And bluntly, in the non-cloud world, configurable parameters are also just that — things that the vendor intends for you to be able to change. There are obviously ways to screw up your configuration, but those parameters are changeable for a reason. Moreover, if you are just using cloud infrastructure but regular software, you should expect that you may need to tune configuration parameters in order to get optimal performance on a shared virtualized environment that your users are accessing remotely (and you may want to change the security parameters, too).
Vendors: Be aware that some companies, even really big successful companies, sometimes have nonsensical, highly rigid policies regarding defaults. Consider the tradeoffs between defaults as a minimalistic set, and defaults as a common-configuration set. Consider offering multiple default “profiles”. Packaging up your software specifically for cloud deployment isn’t a bad idea, either (i.e., “virtual appliances”).
IT management: Your staff really isn’t so stupid that they’re not able to change any defaults without incurring catastrophic risks. If they are, it’s time for some different engineers, not needlessly ironclad policies.
Discipline and agility are not opposites
Too many service providers (and companies in general) use “discipline” as an excuse for “lack of agility”. Discipline does not mean appointing a committee to study the problem for the next year. Exercising caution and prudence does not mean failing to act. Laying a solid foundation does not mean standing around doing the equivalent of watching the concrete set. This misguided notion of discipline is made even worse if the committee sits around drawing personal conjectures based on fear, and concluding that moving with paralyzing slowness (because, I guess, sudden motion draws predators) is the only safe possibility.
There are highly agile companies out there who study and solve problems in a rigorous way — they go out and gather data, they analyze the data, they come to a conclusion, they come up with a solution, they decide what they want to measure to determine the success or failure of the solution, and go out and act. There’s enormous value in swift, decisive, fact-based action. Bonus points if the decision-making is focused on what delivers value to the customer, and that value (whether qualitative or quantitative) can be clearly articulated and measured.
The Process Trap
Do your processes help or hinder your employees’ ability to deliver great service to customers? When an employee makes an exception to keep a customer happy, is that rewarded or does the employee feel obliged to hide that from his manager? When you design a process, which is more important: Ensuring that nobody can be blamed for a mistake as long as they did what the process said they were supposed to do, or maximizing customer satisfaction? And when a process exception is made, do you have a methodical way to handle it?
Many companies have awesome, dedicated employees who want to do what’s best for the customer. And, confronted with the decision of whether to make a customer happy, or follow the letter of the process, most of them will end up opting for helping the customer. Many organizations, even the most rigidly bureaucratic ones, celebrate those above-and-beyond efforts.
But there’s an important difference in the way that companies handle these process exceptions. Some companies are good at recognizing that people will exercise autonomy, and that systems should be built to handle exceptions, and to track why they were granted and what was done. Other companies like to pretend that exceptions don’t exist, so when employees go outside the allowed boundaries, they simply do stuff — the exception is never recorded, and nobody knows what was done or how or why. If the issue is ever raised again, or the account team turns over, or someone wonders why this particular customer has a weird config, nobody will have a clue. And ironically, it tends to be the companies with the most burdensome process — the ones not only most likely to need exceptions, but also the ones obsessed with a paperwork trail for even the most trivial minutiae — that lack the ability to systematically handle exceptions.
When you build systems, whether human or machine, do you figure out how you’re going to handle the things that will inevitably fall outside your careful design?
The discipline of cloud
Cloud forces configuration management discipline.
As we shift more and more towards provisioning from images, rather than building operating systems from scratch, installing packages, and configuring everything, we move towards the holistic build becoming the norm — essentially, the virtual appliance. Tools companies like rPath and Elastra are taking slices of what should probably be part of broader run-book automation (RBA) solutions that embrace the cloud.
It represents a big shift in thinking for the enterprise. Dot-coms have lived for years in a world where cloning is the provisioning norm, because they’ve got horizontally-scalable apps for which they build servers by the pallet-load. Enterprises mostly haven’t made that shift yet, because most of what the enterprise does is still the one-off application: if you’re lucky, a server for it gets delivered in a couple of weeks, and if you’re not, sometime in the next nine months. In the dot-com world, it is not acceptable for gestating an operational environment to take as long as gestating a human.
And that means that the enterprise is going to have to get out of doing the one-off, building machines from scratch, and letting app developers change things on the fly.
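The shift from one-off builds to provisioning from images can be sketched as follows (a minimal illustration with invented names, not any particular vendor’s tooling): all configuration decisions move into one audited image-build step, and provisioning becomes cheap, identical cloning.

```python
# Hypothetical sketch of image-based provisioning: the "golden image"
# (essentially a virtual appliance) captures every configuration
# decision once, instead of engineers changing things on the fly
# on individual servers.

def build_golden_image(packages, config_files):
    """One versioned, reviewable build step holds all configuration."""
    return {
        "version": 1,
        "packages": sorted(packages),
        "config": dict(config_files),
    }

def provision(image, count):
    """Cloning from the image: every server is identical by construction,
    so there is no per-server configuration drift."""
    return [dict(image, hostname=f"web{i:03d}") for i in range(count)]

image = build_golden_image(
    packages=["nginx", "app-runtime"],
    config_files={"/etc/app.conf": "workers=4"},
)
fleet = provision(image, count=3)

# Every clone carries exactly the image's configuration:
assert all(node["config"] == image["config"] for node in fleet)
```

This is the discipline the cloud forces: if the image is the unit of deployment, an ad-hoc change to a running server is lost on the next provision, so the only change that sticks is one made to the image build itself.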