Blog Archives
The discipline of cloud
Cloud forces configuration management discipline.
As we shift more and more towards provisioning from images, rather than building operating systems from scratch, installing packages, and configuring everything, we move towards the holistic build becoming the norm — essentially, the virtual appliance. Tools companies like rPath and Elastra are taking slices of what should probably be part of broader run-book automation (RBA) solutions that embrace the cloud.
It represents a big shift in thinking for the enterprise. Dot-coms have lived for years in a world where cloning is the provisioning norm, because they’ve got horizontally-scalable apps for which they build servers by the pallet-load. Enterprises mostly haven’t made that shift yet, because most of what the enterprise is doing is still the one-off application: if you’re lucky, you can get a server delivered for it in a couple of weeks, and if you’re not lucky, you’ll get one sometime in the next nine months. In the dot-com world, it is not acceptable for gestating an operational environment to take as long as gestating a human.
And that means that the enterprise is going to have to get out of the habit of doing one-offs, building machines from scratch, and letting app developers change things on the fly.
Cloud risks and organizational culture
I’ve been working on a note about Amazon EC2, and pondering how different the Web operations culture of Silicon Valley is from that of the typical enterprise IT organization.
Silicon Valley’s prevailing Ops culture is about speed. There’s a desperate sense of urgency that seems to prevail there, a relentless expectation that you can be the Next Big Thing, if only you can get there fast enough. Or, alternatively, you are the Current Big Thing, and it is all you can do to keep up with your growth, or at least not have the Out Of Resources truck run right over you.
Enterprise IT culture tends to be about risk mitigation. It is about taking your time, being thorough, making the right decisions, and ensuring that nothing bad happens as a result of them.
To techgeeks at start-ups in the Valley (and I mean absolutely no disparagement by this, as I was one, and perhaps still would be, if I hadn’t become an analyst), the promise and usefulness of cloud computing is obvious. The question is not if; it is when — when can I buy a cloud that has the particular features I need to make my life easier? But: Simplify my architecture? Solve my scaling problems and improve my availability? Give me infrastructure the instant I need it, and charge me only when I get it? I want it right now. I wanted it yesterday, I wanted it last year. Got a couple of problems? Hey, everyone makes mistakes; just don’t make them twice. If I’d done it myself, I’d have made mistakes too; anyone would have. We all know this is hard. No SLA? Just fix it as quickly as you can, and let me know what went wrong. It’s not like I’m expecting you to go to Tahiti while my infrastructure burns; I know you’ll try your best. Sure, it’s risky, but heck, my whole business is a risk! No guts, no glory!
Your typical enterprise IT guy is struck aghast by that attitude. He does not have the problem of waking up one morning and discovering that his sleepy little Facebook app has suddenly gotten the attention of teenyboppers world-wide and now he needs a few hundred or a few thousand servers right this minute, while he prays that his application actually scales in a somewhat linear fashion. He’s not dealing with technology he’s built himself that might or might not work. He isn’t pushing the limits and having to call the vendor to report an obscure bug in the operating system. He isn’t being asked to justify his spending to the board of directors. He lives in a world of known things — budgets worked out a year in advance, relatively predictable customer growth, structured application development cycles stretched out over months, technology solutions that are thoroughly supported by vendors. And so he wants to try to avoid introducing unknowns and risks into his environment.
Despite eight years at Gartner, advising clients that are mostly fairly conservative in their technology decisions, I still find myself wanting to think in early-adopter mode. In trying to write for our clients, I’m finding it hard to shift from that mode. It’s not that I’m not skeptical about the cloud vendors (and I’m trying to be hands-on with as many platforms as I can, so I can get some first-hand understanding and a reality check). It’s that I am by nature rooted in that world that doesn’t care as much about risk. I am more interested in reasonable risk than in the safest course of action.
Realistically, enterprises are going to adopt cloud infrastructure in a very different way and at a very different pace than fast-moving technology start-ups. At the moment, few enterprises are compelled towards that transformation in the way that the Web 2.0 start-ups are — their existing solutions are good enough, so what’s going to make them move? All the strengths of cloud infrastructure — massive scalability, cost-efficient variable capacity, Internet-readiness — are things that most enterprises don’t care about that much.
That’s the decision framework I’m trying to work out next.
I am actively interested in cloud infrastructure adoption stories, especially from “traditional” enterprises who have made the leap, even in an experimental way. If you’ve got an experience to share, using EC2, Joyent, Mosso, EngineYard, Terremark’s Infinistructure, etc., I’d love to hear it, either in a comment on my blog or via email at lydia dot leong at gartner dot com.
The Cloud skills shift
Something that I’ve been thinking about: The shift to global-class computing, and massively scalable infrastructure, represents a fundamental shift in the skill sets that will be valued in IT Operations.
Those of you who, like myself, have worked at service providers in hyper-growth mode, are already familiar with what occurs when you need to grow at red-shift speeds: You automate everything humanly possible, and you try to standardize the heck out of things. Usually you end up trying to make sure that your infrastructure is horizontally scalable, and that your hardware is as interchangeable as possible, allowing any single server to fail and the system as a whole to go chugging along, while you eventually go yank that server out and replace it with another just-as-generic box that you’ve auto-provisioned.
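The replace-the-generic-box pattern can be sketched in a few lines. This is only a toy model, not any particular vendor's tooling: the `Pool` and `Server` classes, the `reconcile` method, and the image name `web-image-v1` are all hypothetical, invented here for illustration. The point it tries to show is that when every server comes from the same image, "fixing" a failed box is just discarding it and stamping out another.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical image name; in this toy model every server is built from it.
GOLDEN_IMAGE = "web-image-v1"


@dataclass
class Server:
    server_id: int
    image: str
    healthy: bool = True


@dataclass
class Pool:
    """A pool of interchangeable servers kept at a desired size."""
    desired_size: int
    servers: list = field(default_factory=list)
    _ids: count = field(default_factory=count, repr=False)

    def provision(self) -> Server:
        # Stand-in for a real auto-provisioning call: clone the one image.
        server = Server(next(self._ids), GOLDEN_IMAGE)
        self.servers.append(server)
        return server

    def reconcile(self) -> int:
        """Discard failed servers and provision identical replacements.

        Returns the number of servers provisioned during this pass.
        """
        self.servers = [s for s in self.servers if s.healthy]
        added = 0
        while len(self.servers) < self.desired_size:
            self.provision()
            added += 1
        return added
```

Calling `Pool(desired_size=3).reconcile()` fills an empty pool; marking any one server as unhealthy and calling `reconcile()` again drops it and provisions a single replacement, with no server-specific repair logic anywhere.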
The shift to the cloud model, whether public or private, basically pushes the idea that every IT organization does that, either in-house or through the services of a provider. It puts the premium on software development and scripting skills: these are the guys who automate things and who write the glue for integrating your toolsets. You’ll have a handful of guys who are your serious architects, the guys who tune and optimize your hardware and storage, design your configurations, and so on. (That might be a single guru, or you might go to consultants for that instead.) You’ll have a few folks who know the operational ins-and-outs of troubleshooting your applications. Everyone else becomes a hardware monkey: entry-level folks who don’t need much more of a skillset than it takes to assemble a PC from parts.
This is writ large in the Google model of Operations, but it’s been true for the last decade in every dot-com of significant size, too. Your hardware operations guys are rack-and-stack types. Everyone else blends systems administration with scripting abilities, and because your toolsets have to scale and be highly maintainable, this is scripting that has the air of serious development, not a one-time thing that can be banged out unreadably.
The routine drudgery of IT Operations is going to get automated away, bit by bit. Right now, many enterprises still operate at a small enough scale, and with little enough standardization, that it’s not necessarily more efficient to automate a task than to simply do it manually. In the cloud model, the balance tips to the automation side, and the basic value of “I can wrangle boxes” declines precipitously.
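A back-of-the-envelope way to see where that balance tips: automation pays off once the hours saved per run, multiplied by the number of runs, exceed the hours spent building the automation. The sketch below is only that arithmetic. The function name `break_even_runs` and the figures in the usage note are assumptions made up for illustration, not measurements from any real operations team.

```python
import math


def break_even_runs(build_hours, manual_hours_per_run, automated_hours_per_run=0.0):
    """Smallest number of runs at which automating is strictly cheaper.

    Automation wins when:
        build_hours + n * automated_hours_per_run < n * manual_hours_per_run
    Returns None if the automated run saves no time, since it then never pays off.
    """
    saved_per_run = manual_hours_per_run - automated_hours_per_run
    if saved_per_run <= 0:
        return None
    return math.floor(build_hours / saved_per_run) + 1
```

With illustrative numbers, a script that takes 40 hours to write and replaces a half-hour manual task gives `break_even_runs(40, 0.5)`, which is 81 runs. An enterprise that builds a few dozen one-off servers a year may never reach that count; an operation provisioning hundreds of identical boxes passes it almost immediately.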
My advice to sysadmins: If you are not fluent in a scripting language, and/or not capable of writing structured, readable, maintainable, non-hackish code, now is the time to learn.