Of late, I’ve been talking to a lot of organizations that have learned cloud lessons the hard way — and even more organizations who are newer cloud adopters who seem absolutely determined to make the same mistakes. (Note: Those waving little cloud-repatriation flags shouldn’t be hopeful. Organizations are fixing their errors and moving on successfully with their cloud adoption.)
If your leadership adopts the adage, “Move fast and break things!” then no one should be surprised when things break. If you don’t adequately manage your risks, sometimes things will break in spectacularly public ways, and result in your CIO and/or CISO getting fired.
Many organizations that adopt that philosophy (often with the corresponding imposition of “You build it, you run it!” upon application teams) not only abdicate responsibility to the application teams, but they lose all visibility into what’s going on at the application team level. So they’re not even aware of the risks that are out there, much less whether those risks are being adequately managed. The first time central risk teams become aware of the cracks in the foundation might be when the building collapses in an impressive plume of dust.
(Note that boldness and the willingness to experiment are different from recklessness. Trying out new business ideas that end up failing, attempting different innovative paths for implementing solutions that end up not working out, or rapidly trying a bunch of different things to see which works well — these are calculated risks. They’re absolutely things you should do if you can. That’s different from just doing everything at maximum speed and not worrying about the consequences.)
Just like cloud cost optimization might not be a business priority, broader risk management (especially security risk management) might not be a business priority. If adding new features is more important than address security vulnerabilities, no one should be shocked when vulnerabilities are left in a state of “busy – fix later”. (This is quite possibly worse than “drunk – fix later“, as that at least implies that the fix will be coming as soon as the writer sobers up, whereas busy-ness is essentially a state that tends to persist until death).
It’s faster to build applications that don’t have much if any resilience. It’s faster to build applications if you don’t have to worry about application security (or any other form of security). It’s faster to build applications if you don’t have to worry about performance or cost. It’s faster to build applications if you only need to think about the here-and-now and not any kind of future. It is, in short, faster if you are willing to accumulate meaningful technical debt that will be someone else’s problem to deal with later. (It’s especially convenient if you plan to take your money and run by switching jobs, ensuring you’re free of the consequences.)
“We hope the business and/or dev teams will behave responsibly” is a nice thought, but hope is not a strategy. This is especially true when you do little to nothing to ensure that those teams have the skills to behave responsibly, are usefully incentivized to behave responsibly, and receive enough governance to verify that they are behaving responsibly.
When it all goes pear-shaped, the C-level IT executives (especially the CIO, chief information security officer, and the chief risk officer) are going to be the ones to be held accountable and forced to resign under humiliating circumstances. Even if it’s just because “You should have known better than to let these risks go ungoverned”.
(This usually holds true even if business leaders insisted that they needed to move too quickly to allow risk to be appropriately managed, and those leaders were allowed to override the CIO/CISO/CRO, business leaders pretty much always escape accountability here, because they aren’t expected to have known better. Even when risk folks have made business leaders sign letters that say, “I have been made aware of the risks, and I agree to be personally responsible for them” it’s generally the risk leaders who get held accountable. The business leaders usually get off scott-free even with the written evidence.)
Risk management doesn’t entail never letting things break. Rather, it entails a consideration of risk impacts and probabilities, and thinking intelligently about how to deal with the risks (including implementing compensating controls when you’re doing something that you know is quite risky). But one little crack can, in combination with other little cracks (that you might or might or might not be aware of), result in big breaches. Things rarely break because of black swan events. Rather, they break because you ignored basic hygiene, like “patch known vulnerabilities”. (This can even impact big cloud providers, i.e. the recent Azurescape vulnerability, where Microsoft continued to use 2017-era known-vulnerable open-source code in production.)
However, even in organizations with central governance of risk, it’s all too common to have vulnerability management teams inform you-build-it-you-run-it dev teams that they need to fix Known Issue X. A busy developer will look at their warning, which gives them, say, 30 days to fix the vulnerability, which is within the time bounds of good practice. Then on day 30, the developer will request an extension, and it will probably be granted, giving them, say, another 30 days. When that runs out, the developer will request another extension, and they will repeat this until they run out the extension clock, whereupon usually 90 days or more have elapsed. At that point there will probably be a further delay for the security team to get involved in an enforcement action and actually fix the thing.
There are no magic solutions for this, especially in organizations where teams are so overwhelmed and overworked that anything that might possibly be construed as optional or lower-priority gets dropped on the floor, where it is trampled, forgotten, and covered in old chewing gum. (There are non-magical solutions that require work — more on that in future research notes.)
Moving fast and breaking things takes a toll. And note that sometimes what breaks are people, as the sheer number of things they need to cope with overload their coping mechanisms and they burn out (either in impressive pillars or flame, or quiet extinguishment into ashes).
Of late, I’ve been talking to Amazon customers who are saying, you know, AWS gives us a ton of benefits, it makes a lot of things easy and fast that used to be hard, but in the end, we could do this ourselves, and probably do it at comparable cost or a cost that isn’t too much higher. These are customers that are at some reasonable scale — a take-back would involve dozens if not hundreds of physical server deployments — but aren’t so huge that the investment would be leveraged over, say, tens of thousands of servers.
Most people don’t choose cloud IaaS for lowered costs, unless they have very bursty or unpredictable workloads. Instead, they choose it for increased business agility, which to most people means “getting applications, and thus new business capabilities, more quickly”.
But there’s another key reason to not do it yourself: The war for talent.
The really bright, forward-thinking people in your organization — the people who you would ordinarily rely upon to deploy new technologies like cloud — are valuable. The fact that they’re usually well-paid is almost inconsequential compared to the fact that these are often the people who can drive differentiated, innovative business value for your organization, and they’re rare. Even if you have open headcount, finding those “A” players can be really, really tough, especially if you want a combination of cutting-edge technical skills with the personal traits — drive, follow-through, self-starting, thinking out of the box, communication skills, and so on — that make for top-tier engineers.
Just because you can do it yourself doesn’t mean that you should. Even if your engineers think they’re just as smart as Amazon’s engineers (which they might well be), and are chomping at the bit to prove it. If you can outsource a capability that doesn’t generate competitive advantage for you, then you can free your best people to work on the things that do generate competitive advantage. You can work on the next cool thing… and challenge your engineers to prove their brilliance by dreaming up something that hasn’t been done before, solving the challenges that deliver business value to your organization. Assuming, of course, that your culture provides an environment receptive to such innovation.
A recent blog post on Forbes by Venkatesh Rao, The Rise of Developernomics, has ignited a lot of controversy around the concept that some developers are as much as 10x more productive than others. It’s not a new debate; the assertion that some developers are 20x more productive than others has been around forever, and folks like Jole Spolsky have asserted that it’s not just a matter of productivity, but also a developer’s ability to hit the high notes of real breakthrough achievement that makes for greatness.
Worth reading out of all of these threads: Avichal Garg of Spool’s blog post on building 10x teams, which has a very nice dissection of the composition of great teams.
Also, for those of you who haven’t read it: Now Discover Your Strengths is a fantastic way to look at what people’s work-related strengths are, since it takes into account a broad range of personal and interpersonal traits. Rackspace turned me onto it a number of years ago; they actually hang a little sign with each employee’s strengths on their cube. (See mine for an example.)
Jon Evans of TechCrunch wrote a good blog post a few months ago, Why the New Guy Can’t Code, which illustrates the challenges of hiring good developers. (There are shocking numbers of developers out there who have never really produced significant code in their jobs. Indeed, I once interviewed a developer with five years of experience who had never written code in a work context — he kept being moved from project to project that was only in the formal requirements phase, so all he had was his five-years-stale student efforts from his CS degree.)
Even with the massive pile of unemployed developers out there, it’s still phenomenally challenging to hire good people. And if your company requires a narrow and specific set of things that the developer must have worked with before, rather than hiring a smart Swiss army knife of a developer who can pick up anything given a few days, you will have an even bigger problem, especially if you require multiple years of experience with brand-new technologies like AWS, NoSQL, Hadoop, etc.
With more and more Web hosters, systems integrators, and other infrastructure-specialist companies transforming themselves into cloud providers, and sometimes outright buying software companies (such as Terremark buying CloudSwitch, and Virtustream buying Enomaly), serious software development chops are becoming a key for a whole range of service providers who never really had significant development teams in the past. No one should underestimate how much of a shortage there is for great talent.
As a reminder, Gartner is hiring!
Developers who deeply understand the arcana of infrastructure, and operators who can code and understand the interaction of applications and infrastructure, are better than developers and operators who understand only their own discipline. But it’s typically easier, from the perspective of training, for a developer to learn operations, than for an operator to learn development.
While there are fair number of people who teach themselves on-the-job, most developers still come out of formal computer science backgrounds. The effectiveness of formal education in CS varies immensely, and you can get a good understanding by reading on your own, of course, if you read the right things — it’s the knowledge that matters, not how you got it. But ideally, a developer should accumulate the background necessary to understand the theory of operating systems, and then have a deeper knowledge of the particular operating system that they primarily work with, as well as the arcana of the middleware. It’s intensely useful to know how the abstract code you write, actually turns out to run in practice. Even if you’re writing in a very high-level programming language, knowing what’s going on under the hood will help you write better code.
Many people who come to operations from the technician end of things never pick up this kind of knowledge; a lot of people who enter either systems administration or network operations do so without the benefit of a rigorous education in computer science, whether from college or self-administered. They can do very well in operations, but it’s generally not until you reach the senior-level architects that you commonly find people who deeply understand the interaction of applications, systems, and networks.
Unfortunately, historically, we have seen this division in terms of relative salaries and career paths for developers vs. operators. Operators are often treated like technicians; they’re often smart learn-on-the-job people without college degrees, but consequently, companies pay accordingly and may limit advancement paths accordingly, especially if the company has fairly strict requirements that managers have degrees. Good developers often emerge from college with minimum competitive salary requirements well above what entry-level operations people make.
Silicon Valley has a good collection of people with both development and operations skills because so many start-ups are founded by developers, who chug along, learning operations as they go, because initially they can’t afford to hire dedicated operations people; moreover, for more than a decade, hypergrowth Internet start-ups have deliberately run devops organizations, making the skillset both pervasive and well-paid. This is decidedly not the case in most corporate IT, where development and operations tend to have a hard wall between them, and people tend to be hired for heavyweight app development skills, more so than capabilities in systems programming and agile-friendly languages.
Here are my reasons for why developers make better operators, or perhaps more accurately, an argument for why a blended skillset is best. (And here I stress that this is personal opinion, and not a Gartner research position; for official research, check out the work of my esteemed colleagues Cameron Haight and Sean Kenefick. However, as someone who was formally educated as a developer but chose to go into operations, and who has personally run large devops organizations, this is a strongly-held set of opinions for me. I think that to be a truly great architect-level ops person, you also have to have a developer’s skillset, and I believe it’s important to mid-level people as well, which I recognize as a controversial opinions.)
Understanding the interaction of applications and infrastructure leads to better design of both. This is an architect’s role, and good devops understand how to look at applications and advise developers how they can make them more operations-friendly, and know how to match applications and infrastructure to one another. Availability, performance, and security are all vital to understand. (Even in the cloud, sharp folks have to ask questions about what the underlying infrastructure is. It’s not truly abstract; your performance will be impacted if you have a serious mismatch between the underlying infrastructure implementation and your application code.)
Understanding app/infrastructure interactions leads to more effective troubleshooting. An operator who can CTrace, DTrace, sniff networks, read application code, and know how that application code translates to stuff happening on infrastructure, is in a much better position to understand what’s going wrong and how to fix it.
Being able to easily write code means less wasted time doing things manually. If you can code nearly as quickly as you can do something by hand, you will simply write it as a script and never have to think about doing it by hand again — and neither will anyone else, if you have a good method for script-sharing. It also means that forever more, this thing will be done in a consistent way. It is the only way to truly operate at scale.
Scripting everything, even one-time tasks, leads to more reliable operations. When working in complex production environments (and arguably, in any environment), it is useful to write out every single thing you are going to do, and your action plan for any stage you deem dangerous. It might not be a formal “script”, but a command-by-command plan can be reviewed by other people, and it means that you are not making spot decisions under the time pressure of a maintenance window. Even non-developers can do this, of course, but most don’t.
Converging testing and monitoring leads to better operations. This is a place where development and operations truly cross. Deep monitoring converges into full test coverage, and given the push towards test-driven development in agile methodologies, it makes sense to make production monitoring part of the whole testing lifecycle.
Development disciplines also apply to operations. The systems development lifecycle is applicable to operations projects, and brings discipline to what can otherwise be unstructured work; agile methodologies can be adapted to operations. Writing the tests first, keeping things in a revision control system, and considering systems holistically rather than as a collection of accumulated button-presses are all valuable.
The move to cloud computing is a move towards software-defined everything. Software-defined infrastructure and programmatic access to everything inherently advantages developers, and it turns the hardware-wrangling skills into things for low-level technicians and vendor field engineering organizations. Operations becomes software-oriented operations, one way or another, and development skills are necessary to make this transition.
It is unfortunately easier to teach operations to developers, than it is to teach operators to code. This is especially true when you want people to write good and maintainable code — not the kind of script in which people call out to shell commands for the utilities that they need rather than using the appropriate system libraries, or splattering out the kind of program structure that makes re-use nigh-impossible, or writing goop that nobody else can read. This is not just about the crude programming skills necessary to bang out scripts; this is about truly understanding the deep voodoo of the interactions between applications, systems, and networks, and being able to neatly encapsulate those things in code when need be.
Devops is a great place for impatient developers who want to see their code turn into results right now; code for operations often comes in a shorter form, producing tangible results in a faster timeframe than the longer lifecycles of app development (even in agile environments). As an industry, we don’t do enough to help people learn the zen of it, and to provide career paths for it. It’s an operations specialty unto itself.
Devops is not just a world in which developers carry pagers; in fact, it doesn’t necessarily mean that application developers carry pagers at all. It’s not even just about a closer collaboration between development and operations. Instead, it can mean that other than your most junior button-pushers and your most intense hardware specialists, your operations people understand both applications and infrastructure, and that they write code as necessary to highly automate the production environment. (This is more the philosophy of Google’s Site Reliability Engineering, than it is Amazon-style devops, in other words.)
But for traditional corporate IT, it means hiring a different sort of person, and paying differently, and altering the career path.
A little while back, I had lunch with a client from a mid-market business, which they spent telling me about how efficient their IT had become, especially after virtualization — trying to persuade me that they didn’t need the cloud, now or ever. Curious, I asked how long it typically took to get a virtualized server up and running. The answer turned out to be three days — because while they could push a button and get a VM, all storage and networking still had to be manually provisioned. That led me to probe about a lot of other operations aspects, all of which were done by hand. The client eventually protested, “If I were to do the things you’re talking about, I’d have to hire programmers into operations!” I agreed that this was precisely what was needed, and the client protested that they couldn’t do that, because programmers are expensive, and besides, what would they do with their existing do-everything-by-hand staff? (I’ve heard similar sentiments many times over from clients, but this one really sticks in my mind because of how shocked this particular client was by the notion.)
Yes. Developers are expensive, and for many organizations, it may seem alien to use them in an operations capacity. But there’s a cost to a lack of agility and to unnecessarily performing tasks manually.
But lessons learned in the hot seat of hypergrowth Silicon Valley start-ups take forever to trickle into traditional corporate IT. (Even in Silicon Valley, there’s often a gulf between the way product operations works, and the way traditional IT within that same company works.)
People often ask me what it’s like to be an analyst at Gartner, and for me, the answer is, “It’s a life of constant client conversations.” Over the course of a typical year, I’ll do something on the order of 1,200 formal one-on-one conversations (or one-on-small-team, if the client brings in some other colleagues), generally 30 minutes in length. That doesn’t count the large number of other casual interactions at conferences and whatnot.
While Gartner serves pretty much the entire technology industry, and consequently I talk to plenty of folks at little start-ups and whatnot, our bread-and-butter client — 80% of Gartner’s revenue — comes from “end-users”, which means IT management at mid-market businesses and enterprise.
Over the years, I have learned a lot of important things about dealing with clients. One of them is that they generally aren’t really interested in best practices. They find best practices fascinating, but they frequently can’t put them to use in their own organizations. They’re actually interested in good practices — things that several other organizations like them have done successfully and which are practically applicable to their own environment.
More broadly, there’s a reason that analysts are still in business — people need advice that’s tailored to their particular needs. You know the Tolstoy line “Happy families are all alike, but every unhappy family is unhappy in its own way” that starts Anna Karenina? Well, every corporate IT department has its own unique pathology. There are the constraints of the business (real and imagined) and the corporate culture, the culture in IT specifically, the existing IT environment in all of its broken glory and layers of legacy, the available budget and resources and skills (modified by whether or not they are willing to hire consultants and other outside help), the people and personalities and pet peeves and personal ambitions, and even the way that they like to deal with analysts. (Right down to the fact that some clients have openly said that they don’t like a woman telling them what to do.)
To be a successful advisor, you have to recognize that most people can’t aim for the “ideal” solution. They have to find a solution that will work for their particular circumstances, with all of the limitations of it — some admittedly self-imposed, but nevertheless important. You can start by articulating an ideal, but it has to quickly come down to earth.
But cloud computing has turned out to be an extra-special set of landmines. When clients come to me wanting to do a “cloud computing” or “cloud infrastructure” project, especially if they don’t have a specific thing in mind, I’ve learned to ask, “Why are you doing this?” Is this client reluctant, pushed into doing this only because someone higher-up is demanding that they do ‘something in the cloud’? Is this client genuinely interested in seeing this project succeed, or would he rather it fail? Does he want to put real effort into it, or just a token? Is he trying to create a proof of concept that he can build upon, or is this a one-shot effort? Is he doing this for career reasons? Does he hope to get his name in the press or be the subject of a case study? What are the constraints of his industry, his business, his environment, and his organization?
My answer to, “What should I do?” varies based on these factors, and I explain my reasoning to the client. My job is not to give academic theoretical answers — my job is to offer advice that will work for this client in his current circumstances, even if I think it’s directionally wrong for the organization in the long term. (I try to shake clients out of their complacency, but in the end, I’m just trying to leave them with something to think about, so they understand the implications of their decisions, and how clinging to the way things are now will have business ramfiications over the long term.) However, not-infrequently, my job involves helping a deeply reluctant client think of some token project that he can put on cloud infrastructure so he can tell his CEO/CFO/CIO that he’s done it.
Cloud providers dealing with traditional corporate IT should keep in mind that not everyone who inquires about their service has a genuine desire for the project to be a success — and even those who are hoping for success don’t necessarily have pure motivations.
A recent client inquiry of mine involved a very large enterprise, who informed me that their executives had decided that IT should become more like a cloud provider — like Google or Facebook or Amazon. They wanted to understand how they should transform their organization and their IT infrastructure in order to do this.
There were countless IT people on this phone consultation, and I’d received a dizzying introducing to names and titles and job functions, but not one person in the room was someone who did real work, i.e., someone who wrote code or managed systems or gathered requirements from the business, or even did higher-level architecture. These weren’t even people who had direct management responsibility for people who did real work. They were part of the diffuse cloud of people who are in charge of the general principle of getting something done eventually, that you find everywhere in most large organizations (IT or not).
I said, “If you’re going to operate like a cloud provider, you will need to be willing to fire almost everyone in this room.”
That got their attention. By the time I’d spent half an hour explaining to them what a cloud provider’s organization looks like, they had decidedly lost their enthusiasm for the concept, as well as been poleaxed by the fundamental transformations they would have to make in their approach to IT.
Another large enterprise client recently asked me to explain Rackspace’s organization to them. They wanted to transform their internal IT to resemble a hosting company’s, and Rackspace, with its high degree of customer satisfaction and reputation for being a good place to work, seemed like an ideal model to them. So I spent some time explaining the way that hosting companies organize, and how Rackspace in particular does — in a very flat, matrix-managed way, with horizontally-integrated teams that service a customer group in a holistic manner, coupled with some shared-services groups.
A few days later, the client asked me for a follow-up call. They said, “We’ve been thinking about what you’ve said, and have drawn out the org… and we’re wondering, where’s all the management?”
I said, “There isn’t any more management. That’s all there is.” (The very flat organization means responsibility pushed down to team leads who also serve functional roles, a modest number of managers, and a very small number of directors who have very big organizations.)
The client said, “Well, without a lot of management, where’s the career path in our organization? We can’t do something like this!”
Large enteprise IT organizations are almost always full of inertia. Many mid-market IT organizations are as well. In fact, the ones that make me twitch the most are the mid-market IT directors who are actually doing a great job with managing their infrastructure — but constrained by their scale, they are usually just good for their size and not awesome on the general scale of things, but are doing well enough to resist change that would shake things up.
Business, though, is increasingly on a wartime footing — and the business is pressuring IT, usually in the form of the development organization, to get more things done and to get them done faster. And this is where the dissonance really gets highlighted.
A while back, one of my clients told me about an interesting approach they were trying. They had a legacy data center that was a general mess of stuff. And they had a brand-new, shiny data center with a stringent set of rules for applications and infrastructure. You could only deploy into the new shiny data center if you followed the rules, which gave people an incentive to toe the line, and generally ensured that anything new would be cleanly deployed and maintained in a standardized manner.
It makes me wonder about the viability of an experiment for large enterprise IT with serious inertia problems: Start a fresh new environment with a new philosophy, perhaps a devops philosophy, with all the joy of having a greenfield deployment, and simply begin deploying new applications into it. Leave legacy IT with the mess, rather than letting the morass kill every new initiative that’s tried.
Although this is hampered by one serious problem: IT superstars rarely go to work in enterprises (excepting certain places, like some corners of financial services), and they especially don’t go to work in organizations with inertia problems.
Just because it’s in the cloud doesn’t make it magic. And it can be very, very dangerous to assume that it is.
I recently talked to an enterprise client who has a group of developers who decided to go out, develop, and run their application on Amazon EC2. Great. It’s working well, it’s inexpensive, and they’re happy. So Central IT is figuring out what to do next.
I asked curiously, “Who is managing the servers?”
The client said, well, Amazon, of course!
Except Amazon doesn’t manage guest operating systems and applications.
It turns out that these developers believed in the magical cloud — an environment where everything was somehow mysteriously being taken care of by Amazon, so they had no need to do the usual maintenance tasks, including worrying about security — and had convinced IT Operations of this, too.
Imagine running Windows. Installed as-is, and never updated since then. Without anti-virus, or any other security measures, other than Amazon’s default firewall (which luckily defaults to largely closed).
Plus, they also assumed that auto-scaling was going to make their app magically scale. It’s not designed to automagically scale horizontally. Somebody is going to be an unhappy camper.
Cautionary tale for IT shops: Make sure you know what the cloud is and isn’t getting you.
Cautionary tale for cloud providers: What you’re actually providing may bear no resemblance to what your customer thinks you’re providing.
Congratulations, you’ve virtualized (or gone to public cloud IaaS) and have the ability to instantly and easily provision capacity.
Now, stop and shoot yourself in the foot by not implementing a lightweight procurement process to go with your lightweight provisioning technology.
That’s all too common of a story, and it highlights a critical aspect of movement towards a cloud (or just ‘cloudier’ concepts). In many organizations, it’s not actually the provisioning that’s expensive and lengthy. It’s the process that goes with it.
You’ll probably have heard that it can take weeks or months for an enterprise to provision a server. You might even work for an organization where that’s true. You might also have heard that it takes thousands of dollars to do so, and your organization might have a chargeback mechanism that makes that the case for your department.
Except that it doesn’t actually take that long, and it’s actually pretty darn cheap, as long as you’re large enough to have some reasonable level of automation (mid-sized businesses and up, or technology companies with more than a handful of servers). Even with zero automation, you can buy a server and have it shipped to you in a couple of days, and build it in an afternoon.
What takes forever is the procurement process, which may also be heavily burdened with costs.
When most organizations virtualize, they usually eliminate a lot of the procurement process — getting a VM is usually just a matter of requesting one, rather than going through the whole rigamarole of justifying buying a server. But the “request a VM” process can be anything from a self-service portal to something with as much paperwork headache as buying a server — and the cost-savings and the agility and efficiency that an organization gains from virtualizing is certainly dependent upon whether they’re able to lighten their process for this new world.
There are certain places where the “forever to procure, at vast expense” problems are notably worse. For instance, subsidiaries in companies that have centralized IT in the parent company often seem to get shafted by central IT — they’re likely to tell stories of an uncaring central IT organization, priorities that aren’t aligned with their own, and nonsensical chargeback mechanisms. Moreover, subsidiaries often start out much more nimble and process-light than a parent company that acquired them, which leads to the build-up of frustration and resentment and an attitude of being willing to go out on their own.
And so subsidiaries — and departments of larger corporations — often end up going rogue, turning to procuring an external cloud solution, not because internal IT cannot deliver a technology solution that meets their needs, but because their organization cannot deliver a process that meets their needs.
When we talk about time and cost savings for public cloud IaaS vs. the internal data center, we should be careful not to conflate the burden of (internal, discardable/re-engineerable) process, with what technology is able to deliver.
Note that this also means that fast provisioning is only the beginning of the journey towards agility and efficiency. The service aspects (from self-service to managed service) are much more difficult to solve.
There’s a lot to be said for the ability to get a server for less than the price of a stick of chewing gum.
But convenience has a price, and it’s sufficient that shared hosters, blog hosters, and other folks who make their daily pittance from infrastructure-plus-a-little-extra aren’t especially threatened by cloud infrastructure services.
For instance, I pay for WordPress to host a blog because, while I am readily capable of managing a cloud server and everything necessary to run WordPress, I don’t want to deal with it. I have better things to do with my time.
Small businesses will continue to use traditional shared hosting or even some control-panel-based VPS offerings, despite the much-inferior price-to-resource ratios compared to raw cloud servers, because of the convenience of not having to cope with administration.
The reason why cloud servers are not a significant cost savings for most enterprises (when running continuously, not burst or one-time capacity), is because administration is still a tremendous burden. It’s why PaaS offerings will gain more and more traction over time, as the platforms mature, but also why those companies that crack the code to really automating systems administration will win over time.
I was pondering this equation while contemplating the downtime of a host that I use for some personal stuff; they’ve got a multi-hour maintenance downtime this weekend. My solution to this was simple: write a script that would, shortly before shutdown time, automatically shut down my application, provision a 1.5-cent-an-hour cloud server over on Rackspace, copy the data over, and fire up the application on its new home. (Note: This was just a couple of lines of code, taking moments to type.) The only thing I couldn’t automate was the DNS changeover, since I use GoDaddy for primary DNS and they don’t have an API available for ordinary customers. But conveniently: failover, without having to disrupt my Saturday.
But I realized that I was paying, on a resource-unit equivalent, tremendously more for my regular hosting than I would for a cloud server. Mostly, I’m paying for the convenience of not thinking — for not having to deal with making sure the OS is hardened, pay attention to security advisories, patch, upgrade, watch my logs, etc. I can probably afford the crude way of not thinking for a couple of hours — blindly shutting down all ports, pretty much — but I’m not comfortable with that approach for more than an afternoon.
This is, by the way, also a key difference between the small-business folks who have one or two servers, and the larger IT organizations with dozens, hundreds, or thousands of servers. The fewer you’ve got, the less efficient your labor leverage is. The guy with the largest scale doesn’t necessarily win on cost-efficiency, but there’s definitely an advantage to getting to enough scale.
As I talk to clients, it strikes me that companies with fairly similar IT infrastructures can use very different words to describe how they feel about it. One client might say, “Oh, we’re just a small IT shop, we’ve only got a little over 250 servers, we think cloud computing is for people like us.” Another client that’s functionally identical (same approximate business size, industry, mix of workloads and technologies) might say, much more indignantly, “We’re a big IT shop! We’ve got more than 250 servers! Cloud computing can’t help enterprises like us!”
“SMB” is a broadly confused term. So, for that matter, is “enterprise”. I tend to prefer the term “mid-market”, but even that is sort of cop-out language. Moreover, business size and IT size don’t correlate. Consider the Fortune 500 companies that extract natural resources, vs. their neighbors on the list, for instance.
Vendors have to be careful how they pitch their marketing. Mid-sized companies and/or mid-sized IT shops don’t always know when they’re talking about them, and not some other sort of company. Conversely, IT managers have to look more deeply to figure out if a particular sort of cloud service is right for their organization. Don’t dismiss a cloud service out of hand because you think you’re either too big or too small for it.