Monthly Archives: November 2011
I originally started writing this blog post before Forrester’s James Staten made a post called “Public Clouds Prove I&O Pros Are From Venus And Developers Are From Mars“, and reading made me change this post into a response to his, as well as covering the original point I wanted to make.
In his post, James argues that cloud IaaS offerings are generally either developer-centric or I&O-centric, which leads to an emphasis on either self-service or managed services, with different feature-set priorities. Broadly speaking, I don’t disagree with him, but I think there’s a crucial point that he’s missing (or at least doesn’t mention), that is critical for cloud IaaS providers to understand.
Namely, it’s this: Developers are the face of business buyers.
We can all agree, I’m sure, that self-service cloud IaaS of the Amazon variety has truly empowered developers at start-ups and small businesses, who previously didn’t have immediate access to cheap infrastructure. Sometimes these developers are simply using IaaS as a substitute for having to get hardware and colocation. Sometimes they’re taking advantage of the unique capabilities exposed by programmatic access to infrastructure. Sometimes they’re just writing simple Web apps the same way they always have. Sometimes they’re writing truly cloud-native applications. Sometimes they really need to match their capacity to their highly-variable needs. Sometimes they have steady-state infrastructure. You can’t generalize about them too broadly. But their reasons for using the cloud are pretty clear.
But what’s driving developers in well-established businesses, with IT Operations organizations that have virtualized infrastructure and maybe even private cloud, to put stuff in the public cloud?
It’s simple. They’ve asked for something and IT Operations can’t give it to them in the timeframe that they need. Or IT Operations is such a pain to deal with that they don’t even want to ask. (Yes, sometimes, they want programmatic infrastructure, have highly variable capacity needs, etc. Then they think like start-ups. But this is a tiny, tiny percentage of projects in traditional businesses, and even a small percentage of those that use cloud IaaS.)
And why do they want something? Well, it’s because the business has asked the applications development group to develop a thingy that does X, and the developer is trotting off to try to write X, only he can’t actually do that until IT Operations can give him a server on which to do X, and possibly some other stuff as well, like a load balancer.
So what happens is you get a developer who goes back to a business manager and says, “Well, I could deliver you the code for X in six weeks, except IT Operations tells me that they can’t get around to giving me a server for it for another three weeks.” (In some organizations, especially ones without effective virtualization, that can be months.) The business manager says, “That’s unacceptable. We can’t wait that long.” And the developer sighs and says, “Don’t worry about it. I’ll just take care of it.” And then some cloud IaaS provider, probably one who’s able to offer infrastructure, right now, gets a brand-new customer. This is what businesses mean when they talk about “agility” from the cloud.
Maybe the business has had this happen enough that Enterprise Architecture has led the evaluation of cloud IaaS providers, chosen one or more, set down guideliens for their use, and led the signing of some sort of master services agreement with those providers. Or maybe this is the first sign-up. Either way, developers are key to the decision-making.
When it comes to go into production, maybe IT Operations has its act together, and it comes back into the business’s data center. Maybe it has to move to another external provider — IT Operations has sourced something, or Enterprise Architecture has set a policy for where particular production workloads must run. So maybe it goes to traditional managed hosting, hybrid hosting, or a different cloud provider. Maybe it stays with the cloud the developer chose, though. There’s a lot to be said for incumbency.
But the key thing is this: In SaaS, business buyers are bypassing IT to get their own business needs met. In IaaS, business buyers are doing the same thing — it’s just that it’s the developer that is fronting the sourcing, and is therefore making the decision of when to go cloud and who to use when they do, at least initially.
So if you’re a cloud provider and you say, “We don’t serve individual developers” (which, in my experience, you’ll generally say with a sneer), you are basically saying, “We don’t care about the business buyer.” Which is a perfect valid sales strategy, but you should keep in mind that the business controls two-thirds of cloud spending (so IT Operations holds the purse-strings only a third of the time), according to Gartner’s surveys. You like money, don’t you?
There are many, many more nuances to this, of course (nuances to be explored in a research note for Gartner clients, naturally, because there’s only so much you get for free). But it leads to the conclusion that you must be able to sell to both developers and IT Operations, regardless of the nature of your offering, unless you really want to limit your market opportunity. And that means that the roadmaps of leading providers will be convergent to deliver the features needed by both constituencies.
The fine folks at Nodeable gave me an informal introductory briefing today; they’ve got a pretty cool concept for a cloud-oriented monitoring and management SaaS-based tool that’s aimed at DevOps.
I’ve been having stray thoughts on DevOps and the future of IT Operations in the couple of hours that have passed since then, and reflecting on the following problem:
At an awful lot of companies, IT Operations, especially lower-level folks, are button-pushing monkeys — specifically, they are people who know how to use the vendor-supplied GUI to perform particular tasks. They may know the vendor-recommended ways to do things with a particular bit of hardware or software. But only a few of them have architect-level knowledge, the deep understanding of the esoterica of systems and how this stuff is actually built and engineered. (Some of this is a reflection of education; a lot of IT Operations people don’t come from a computer science background, but have what they’ve needed to know on the job.)
Today’s DevOps person is likely to have a skillset that we used to call systems programming. They understand systems architecture, they understand operating systems, they can write system-level code, including the scripting necessary for automation. The programmatic access to infrastructure exemplified by cloud IaaS providers has moved this up a layer of abstraction, so that you don’t have to be a deep-voodoo guy to do this kind of thing.
We’re moving towards a world where you have really low-level button-pushers — possibly where the button-pushing is so simple that you don’t need a specialist to do it any longer, anyone reasonably technical can do it — and senior architects whoo design things, and systems programmers who automate things. Whether those systems programmers work in application development and are “DevOps”, or whether those systems programmers work in IT Operations and just happen to be systems guys who program (mostly scripting), doesn’t really matter — the era of the button-pusher is drawing towards its close either way, at least for organizations who are going to efficiently increase IT Operations efficiency.
I want to share a story. It is, in some ways, a story about cruelty and unprofessionalism, but it’s funny in its own way.
About fifteen years ago, I was working as an engineer at Digex (the first real managed hosting company). We had a pretty highly skilled group of engineers there, and we never did anything using a GUI. We had hundreds of customers on dedicated Sun servers, and you’d either SSH into the systems or, in a pinch, go to the data center and log in on console. We were also the kind of people who would fix issues by making kernel modifications — for instance, the day that the SYN flood attack showed up, a bunch of customers went down hard, meaning that we could not afford to wait for Sun to come up with a patch, since we had customer SLAs to meet, so one of our security engineers rewrote the kernel’s queueing code for TCP accepts.
We were without a manager for some time, and they finally hired a guy who was supposedly a great Sun sysadmin. He didn’t actually get a technical interview, but he had a good work history of completed projects and happy teams and so forth. He was supposed to be both the manager and the technical lead for the team.
The problem was that he had no idea how to do anything that wasn’t in Sun’s administrator GUI. He didn’t even know how to attach a console cable to a server, much less log in remotely to a system. Since we did absolutely nothing with a GUI, this was a big problem. An even bigger problem was that he didn’t understand anything about the underlying technologies we were supporting. If he had a problem, he was used to calling Sun and having them tell him what to do. This, clearly, is a big problem in a managed hosting environment where you’re the first line of support for your customers, who may do arbitrary wacky things.
He also worked a nine-to-five day at a startup where engineers routinely spent sixteen hours at work. His team, and the other engineers at the company, had nothing but contempt for him. And one night, having dinner at 10 pm as a break before going right back into work, someone had an idea.
“Let’s recompile his kernel without mouse support.” (Like all the engineers, he had a Sun workstation at his desk.)
And so when he came to work the next morning, his mouse didn’t work — and every trace of the intrusion had been covered, thanks to the complicity of one of the security engineers.
Someone who had an idea of what he was doing wouldn’t have been phazed; they’d have verified the mouse wasn’t working, then done an L1-A to put the workstation into PROM mode, and easily done troubleshooting from there (although admittedly, nobody thinks, “I wonder if somebody recompiled my kernel without mouse support after I went home last night”). This poor guy couldn’t do anything other than pick up his mouse to make sure the underside hadn’t gotten dirty. It turned out that he had no idea how to do anything with the workstation if he couldn’t log in via the GUI.
It proved to be a remarkably effective demonstration to management that this guy was a yahoo and needed to be fired. (Fortunately, there were plenty of suspect engineers, and management never found out who was responsible. Earl Galleher, who ran that part of the business at the time, and is the chairman at Basho now, probably still wonders… It wasn’t me, Earl.)
But it makes me wonder what is the future of all the GUI masters in IT Operations, because the world is evolving to be more like the teams that I had before I came to Gartner — systems programmers with strong systems and operations skills, who could also code.
DevOps: Now you know how to deal with the IT Operations guy who can only use a GUI…
We’re currently in the midst of agenda planning for 2012, which is a fancy way to say that we’re trying to figure out what we’re going to write next year. Probably to the despair of my managers, I am almost totally a spontaneous writer, who sits down on a plane and happens to write a research note on whatever it is that’s occurred to me at the moment. So I’ve been pondering what to write, and decided that I ought to tap into the deep well of frustration I’ve been feeling about the cloud IaaS market over the last couple of months.
Specifically, it started me in on thinking about the most common fallacies that I hear from current cloud IaaS providers, or from vendors who are working on getting into the business. I think each of these things is worthy of a research note (in some cases, I’ve already written one), but they’re also worth a blog post series, because I have the occasional desire to explode in frustrated rants. Also, when I write research, it’s carefully polite, thoughtfully-considered, heavily-nuanced, peer-reviewed documents that will run ten to twenty pages and be vaguely skimmed, often by mid-level folks in product marketing. If I write a blog post, it will be short and pointed and might actually get the point through to people, especially the executives who are more likely to read my blog than my research.
So, here’s the succinct list to be explored in further posts. These are things I have said to vendor clients in inquiries, in politely measured terms. These are the blunt versions:
Doing this cloud infrastructure thing is hard and expensive. Yes, I know that VMware told you that you could just get a VCE Vblock, put VMware’s cloud stack on it (maybe with a little help from VMware consulting), and be in business. That’s not the case. You will be making a huge number of engineering decisions (most of which can screw you in a variety of colorful ways, either immediately or down the road). You will be integrating a ton of tools and doing a bunch of software development yourself, if you want to have a vaguely competitive offering for anything other than the small business migrating from VPS. Ditto if you use Citrix (Cloud.com), OpenStack, or whomever. Even with professional services to help you. And once you have an offering, you will be in a giant competitive rat race where the best players innovate fast, and the capabilities gap widens, not closes. If you’re not up to it, white-label, resell, or broker instead.
There is more to the competition than Amazon, but ignore Amazon at your peril. Sure, Amazon is the market goliath, but if your differentiation is “we’re not like Amazon, we’re enterprise-class!”, you’re now competing against te dozens of other providers who also thought that would be a clever market differentiation. Not to mention that Amazon already serves the enterprise, and wants to deepen its inroads. (Where Amazon is hurting is the mid-market, but there’s tons of competition there, too.) Do you seriously think that Amazon isn’t going to start introducing service features targeted at the enterprise? They already have, and they’re continuing to do so.
Not everything has to be engineered to five nines of availability. Many businesses, especially those moving legacy workloads, need reliable, consistently high-performance infrastructure. Howeve, most businesses shouldn’t get infrastructure as one-size-fits-all — this is part of what is making internal data centers expensive. Instead, cloud infrastructure should be tiered — one management portal, one API, multiple levels of service at different price points. “Everything we do is enterprise-class” unfortunately implies “everything we do is expensive”.
Your contempt for the individual developer hugely limits your sales opportunities. Developers are the face of the business buyer. They are the way that cloud IaaS makes inroads into traditional businesses, including the largest enterprises. This is not just about start-ups or small businesses, or about the companies going DevOps.
Prospective customers will not call Sales when your website is useless. Your lack of useful information on your website doesn’t mean that eager prospects will call sales wanting to know what wonderful things you have. Instead, they will assume that you suck, and you don’t get the cloud, and you are hiding what you have because it’s not actually competitive, and they will move on to the dozens of other providers trying to sell cloud IaaS or who are pretending to do so. Also, engineers hate talking to salespeople. Blind RFPs are common in this market, but so is simply signing up with a provider that doesn’t make it painful to get their service.
Just because you don’t take online sign-ups doesn’t mean your cloud is “safe”. Even if you only take “legitimate businesses”, customers make mistakes and their infrastructure gets compromised. Sure, your security controls might ensure that the bad guys don’t compromise your other customers. But that doesn’t mean you won’t end up hosting command-and-control for a botnet, scammers, or spammers, inadvertently. Service providers who take credit card sign-ups are professionally paranoid about these things; buyers should beware providers who think “only real businesses like you can use our cloud” means no bad guys inside the walls.
Automation, not people, is the future. Okay, you’re more of a “managed services” kind of company, and self-service isn’t really your thing. Except “managed services” are, today, basically a codeword for “expensive manual labor”. The real future value of cloud IaaS is automating the heck out of most of the lower-end managed services. If you don’t get on that bandwagon soon, you are going to eventually stop being cost-competitive — not to mention that automation means consistency and likely higher quality. There’s a future in having people still, but not for things that are better done by computers.
Carriers won’t dominate the cloud. This opinion is controversial. Of course, carriers will be pretty significant players — especially since they’ve been buying up the leading independent cloud IaaS providers. But many other analyst firms, and certainly the carriers themselves, believe that the network, and the ability to offer an end-to-end service, will be a key differentiator that allows carriers to dominate this business. But that’s not what customers actually want. They want private networking from their carrier that connects them to their infrastructure — which they can get out of a carrier-neutral data center that is a “cloud hub”. Customers are better off going into a cloud hub with a colocated “cloud gateway” (with security, WAN optimization, etc.), cross-connecting to their various cloud providers (whether IaaS, PaaS, SaaS, etc.), and taking one private network connection home.
Stay tuned. More to come.
I’ve just finished writing the forthcoming Public Cloud IaaS Magic Quadrant (except for some anticipated tweaks when particular providers come back with answers to some questions), which has twenty providers. Although Gartner normally doesn’t do hands-on evaluations, this MQ was an exception, because the easiest way to find out if a given service can do X, was generally to get an account, and attempt to do X. Asking the vendor sometimes requires a bunch of back-and-forth, especially if they don’t do X but and are weaseling their reply, forcing you to ask a set of increasingly narrow, specific questions until you get a clear answer. Also, I did not want to constantly bombard the vendors with questions, since, come MQ time, it tends to result in a firedrill whether or not you intended the question as urgent or even particularly important. (I apologize for the fact that I ended up bombarding many vendors with questions, anyway.)
I’ve used cloud services before, of course, and I am a paying customer of two cloud IaaS providers and a hosting provider, for my personal hobbies. But there’s nothing quite like a blitzkrieg through this many providers all at once. (And I’m not quite done, because some providers without online sign-up are still getting back to me on getting a trial account.)
In the course of doing this, I have had some great experiences, some mediocre experiences, and some “you really sell this and people buy it?” experiences. I have online chatted with support folks for basic questions not covered in the documentation (like “if I stop this VM, does it stop billing me, or not?” which varies from provider to provider). I have filed numerous support tickets (for genuine issues, not for evaluation purposes). I have filed multiple bug reports. I have read documentation (sometimes scanty to non-existent). I have clicked around interfaces, and I have actually used the APIs (working in Python, and in one case, without using a library like libcloud); I have probably weirded out some vendors by doing these things at 2 am, although follow-the-sun support has been intriguing. Those of you who follow me on Twitter (@cloudpundit) have gotten little glimpses of some of these things.
Ironically, I have tried to not let these trials unduly influence my MQ evaluations, except to the extent that these things are indisputably factual — features, availability of documentation, etc. But I have taken away strong impressions about ease of use, even for just the basic task of provisioning and de-provisioning a virtual machine. There is phenomenal variation in ease of use, and many providers could really use the services of a usability expert.
Any number of these providers have made weird, seemingly boneheaded decisions in their UI or service design, for which there’s no penalty to anything in MQ scoring, but did occasionally make me stare and go, “Seriously?”
I’m reluctant to publicly call out vendors for this stuff, so I’ll pick just one example from a vendor that has open online sign-up, where it’s not a private issue that hasn’t been raised on a community forum, and they’re not the sort of vendor (I hope) to make angry calls to Gartner’s Ombudsman demanding that I take this post down. (Dear OpSource folks: Think of this as tough love, and I hope Dimension Data analyst relations doesn’t have conniptions.)
So, consider: OpSource has pre-built VMs, that come with a set amount of compute and RAM, bundled with an OS. Great. Except that you can’t alter a bundle at the time of provisioning. So, say, if I want their Ubuntu image, it comes only in a 2 CPU core config. If I want only 1 core, I have to provision that image, wait for the provision to finish, go in and edit the VM config to reduce it to 1 core, and then wait for it to restart. After I go through that song and dance once, I can clone the config… but it boggles the mind why I can’t get the config I want from the start. I’m sure there’s a good technical reason, but the provider’s job is to mask such things from the user.
The experience has also caused me to wholly revise my opinion of vCloud Director as a self-service tool for the average goomba who wants a VM. I’d always seen vCD as a demo being given by experts, where it looked like despite the pile of complex functionality, it was easy enough to use. The key thing is that the service catalogs were always pre-populated in those demos. If you’re starting from the bare vCD install that a vCloud Powered provider is going to give you, you face a daunting task. Complexity is necessary for that level of fine-grained functionality, but it’s software that is in desperate need of pre-configuration from the service provider, and quite possibly an overlay interface for Joe Average Developer.
Now we’ll see if my bank freezes my credit card for possible fraud, when I’m hit with a dozen couple-of-cents-to-a-few-dollar charges come the billing cycle — I used my personal credit card for this, not my corporate card, since Gartner doesn’t actually reimburse for this kind of work. Ironically, once I spent a bunch of time on these sites, Google and the other online ad networks have started displaying ads that consist of nothing but cloud providers, including “click here for a free trial” or “$50 credit” or whatever, but of course you can’t apply those to existing accounts, which makes every little, “hey, you’ve spent another ten cents provisioning and de-provisioning this VM” charge which I’m noting in the back of my head now, into something which will probably annoy me in aggregate come the billing cycle.
Some things, you just can’t know until you try it yourself.
I promised the attendees at my Gartner Symposium workshop, called “Using Amazon Web Services“, that I would post the notes from the session, so here they are — with some context for public consumption.
A workshop is a structured, facilitated discussions that are designed to assist participants in working through a problem, coming up with best practices, etc. This one had thirty people, all from IT organizations that were either using Amazon or planning to use Amazon.
Because I didn’t know what level of experience with Amazon the workshop attendees would have, I actually prepared two workshops in advance. One of them was a highly structured work-through of preparing to use Amazon in a more formal way (i.e., not a single developer with a credit card or the like), and the other was a facilitated sharing of challenges and best practices amongst current adopters. As the room skewed heavily towards people who already had a deployment well under way, this workshop focused on the latter.
I started the workshop with introductions — people, companies, current use cases. Then, I asked attendees to share their use cases in more details in their smaller working groups. This turned into a set of active discussions that I allowed extra time for, before I asked each of the group to make a list of their most significant challenges in adopting/using Amazon, and their solutions if any. Throughout, I circulated the room, listening and, rarely, commenting. Each group then shared their findings, and I offered some commentary and then did an open Q&A (with some more participant sharing of their answers to questions).
Broadly, I would say that we had three types of people in the room. We had folks from the public sector and education, who were at a relatively early stage in adoption; we had people who were test/dev oriented but in a significant way (i.e., formal adoption, not a handful of developers doing their thang); and we had people who were more e-business oriented (including people from net-native businesses like SaaS, as well as traditional businesses with a hosting type of need), although that could be test/dev or production. Most of the people were mid-level IT management with direct responsibility for the Amazon services.
Some key observations:
Dealing with the financial aspects of moving to the cloud is hard. Understanding the return on investment, accurately estimating costs in advance, comparing costs to internal costs, and understanding the details of billing were common challenges of the participants. Moreover, it raises the issue of “is capital king or is expense king?” Although the broader industry is constantly talking about how people are trying to move to expense rather than capital, workshop participants frequently asserted that it was easier for them to get capital than to up their recurring expenses. (As a side note, I have found that to be a frequent assertion in both inquiry and conference 1-on-1s.) Finally, user management, cost control, and turning resources on/off appropriately were problematic in the financial context.
Move low-risk workloads first. The workshop participants generally assessed Amazon as being suitable only to test/dev, non-mission-critical workloads, and things that had specifically been designed with Amazon’s characteristics in mind. Participants recommended a risk profile of apps, and moving low-risk apps first. They also saw their security organizations as being a barrier to adoption. Many had issues with their Legal departments either trying to prevent use of services or causing issues in the contracting process (what Amazon calls an Enterprise Agreement); participants recommended not involving Legal until after adopting the service.
Performance is a problem. Performance was cited as a frequent issue, especially storage performance, which participants noted was unsuitable to their production applications, and one participant made the key point that many test/dev situations also require highly performant storage (something he had first discovered when his ILM strategy placed test/dev storage at a lower more commodity tier and it impacted his developers).
Know what your SLA isn’t. Amazon’s limited SLAs were cited as an issue, particularly the mismatch in what many users thought the SLA was versus what it actually was, and what it’s actually turned out to be in practice (given Amazon’s outages this year). Participants also stressed business continuity planning in this context.
Integration is a challenge. Participants noted that going to test/dev in the cloud, while maintaining production in an internal data center, splits the software development lifecycle across data centers. This can be overcome to some degree with the appropriate tools, but still creates challenges and sometimes outright problems. Also, because speed of deployment is such a driving factor to go to the cloud, there is a resulting fragmentation of solutions. A service catalog would help some of these issues.
Data management can be a challenge. Participants were worried about regulatory compliance and the “where is my data?” question. Inexperienced participants were often not aware that non-S3 data is generally local to an availability zone. But even beyond that, there’s the question of what data is being put where by the cloud users. Participants with larger amounts of data also faced challenges in moving data in and out of the cloud.
Amazon isn’t the right provider for all workloads in the cloud. Several workshop participants used other cloud IaaS providers in addition to Amazon, for a variety of other reasons — greater ease of use for users who didn’t need complex things, enterprise-grade availability and performance, better manageability, security capabilities, and so forth.
I have conducted cloud workshops and what Gartner calls analyst/user roundtables at a bunch of our conferences now, and it’s always interesting what the different audiences think about, and how much it’s evolving over time. Compared to last year’s Symposium, the state of the art of Amazon adoption amongst conference attendees has clearly advanced hugely.