Blog Archives

The cloud is not magic

Just because it’s in the cloud doesn’t make it magic. And it can be very, very dangerous to assume that it is.

I recently talked to an enterprise client who has a group of developers who decided to go out, develop, and run their application on Amazon EC2. Great. It’s working well, it’s inexpensive, and they’re happy. So Central IT is figuring out what to do next.

I asked curiously, “Who is managing the servers?”

The client said, well, Amazon, of course!

Except Amazon doesn’t manage guest operating systems and applications.

It turns out that these developers believed in the magical cloud — an environment where everything was somehow mysteriously being taken care of by Amazon, so they had no need to do the usual maintenance tasks, including worrying about security — and had convinced IT Operations of this, too.

Imagine running Windows. Installed as-is, and never updated since then. Without anti-virus or any security measures beyond Amazon’s default firewall (which luckily defaults to largely closed).

Plus, they also assumed that auto-scaling was going to make their app magically scale. But auto-scaling doesn’t automagically make an application scale horizontally; the app has to be designed for it. Somebody is going to be an unhappy camper.

Cautionary tale for IT shops: Make sure you know what the cloud is and isn’t getting you.

Cautionary tale for cloud providers: What you’re actually providing may bear no resemblance to what your customer thinks you’re providing.

Hope is not engineering

My enterprise clients frequently want to know why fill-in-the-blank-cloud-IaaS only has a 99.95% SLA. “That’s more than four hours of downtime a year!” they cry. “More than twenty minutes a month! I can’t possibly live with that! Why can’t they offer anything better than that?”

The answer to that is simple: There is a significant difference between engineering and hope. Many internal IT organizations, for instance, set service-level objectives that are based on what they hope to achieve, rather than the level that the solution is engineered to achieve, and can be mathematically expected to deliver, based on calculated mean time between failures (MTBF) of each component of the service. Many organizations are lucky enough to achieve service levels that are higher than the engineered reliability of their infrastructure. IaaS providers, however, are likely to base their SLAs on their engineered reliability, not on hope.

If a service provider is telling you the SLA is 99.95%, it usually means they’ve got a reasonable expectation, mathematically, of delivering a level of availability that’s 99.95% or higher.
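
To make that arithmetic concrete, here’s a rough sketch in Python. The component availabilities are illustrative assumptions on my part, not measured figures from any provider; the point is simply that when non-redundant components sit in series, their availabilities multiply, and the product drops quickly.

```python
# Availability arithmetic: what an SLA percentage implies, and what
# non-redundant infrastructure can realistically be engineered to deliver.

HOURS_PER_YEAR = 24 * 365

def downtime_hours_per_year(availability):
    """Annual downtime budget implied by a given availability."""
    return HOURS_PER_YEAR * (1 - availability)

print(downtime_hours_per_year(0.9995))      # 99.95% SLA: ~4.4 hours/year

# Components in series (all must be up for the service to be up) multiply.
# These component figures are assumed purely for illustration.
power   = 0.9995   # single utility feed, single UPS, no generator
network = 0.9995   # single carrier, single fiber path
server  = 0.999    # single non-HA, non-load-balanced server

engineered = power * network * server
print(engineered)                           # ~0.998
print(downtime_hours_per_year(engineered))  # ~17.5 hours/year, engineered
```

Pushing that engineered number back above 99.95% means adding redundancy to each of those components, which is exactly where the cost curve turns steep.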

My enterprise client, with his data center that has a single UPS and no generator (much less dual power feeds, multiple carriers and fiber paths, etc.), with a single, non-HA, non-load-balanced server (which might not even have dual power supplies, dual NICs, etc.), will tell me that he’s managed to have 100% uptime on this application in the past year, so fie on you, Mr. Cloud Provider.

I believe that uptime claim. He’s gotten lucky. (Or maybe he hasn’t gotten lucky, but he’ll tell me that the power outage was an anomaly and won’t happen again, or that incident happened during a maintenance window so it doesn’t count.)

A service provider might be willing to offer you a higher SLA. It’s going to cost you, because once you get past a certain point, mathematically improving your reliability starts to get really, really expensive.

Now, that said, I’m not necessarily a fan of all cloud IaaS providers’ SLAs. But I encourage anyone looking at them (or a traditional hosting SLA, for that matter), to ponder the difference between engineering and hope.

Recent research notes

Here’s a round-up of what I’ve written lately, for those of you who are Gartner clients and are following my research:

Data Center Managed Services: Regional Differences in the Move Toward the Cloud is about how the IaaS market will evolve differently in each of the major regions of the world. We’re seeing significant adoption differences between the United States, Western Europe (and Canada follows the WEU pattern), and Asia, both in terms of buyer desires and service provider evolution.

Web Hosting and Cloud Infrastructure Prices, North America, 2010 is my regular update to the state of the hosting and cloud IaaS markets, targeted at end-users (IT buyers).

Content Delivery Network Services and Pricing, 2010 is my regular update of end-user (buyer) advice, providing a brief overview of the current state of the market.

Is a Cloud Content Delivery Network Right for You? is a look at Amazon CloudFront and the other emerging “cloud CDN” services (Rackspace/Limelight, GoGrid/EdgeCast, Microsoft’s CDN for Azure, etc.). It’s a hot topic of inquiry at the moment (interestingly, mostly among Akamai customers hoping to reduce their costs).

Some of my colleagues have also recently published notes that might be of interest to those of you who follow my research. Those notes include:

Shifting the software optimization burden

Historically, software vendors haven’t had to care too much about exactly how their software performed. Enterprise IT managers are all too familiar with the experience of buying commercial software packages and/or working with integrators in order to deliver software solutions that have turned out to consume far more hardware than was originally projected (and thus caused the overall project to cost more than anticipated). Indeed, many integrators simply don’t have anyone on hand who’s really a decent architect, and lack the experience on the operations side to accurately gauge what’s needed and how it should be configured in the first place.

Software vendors needed to fix performance issues so severe that they were making the software unusable, but they did not especially care about making a reasonably efficient piece of software 10% or even 20% more efficient, and given how underutilized enterprise data centers typically are, enterprises didn’t necessarily care, either. It was cheaper and easier to simply throw hardware at the problem rather than to worry about either performance optimization in software, or proper hardware architecture and tuning.

Software as a service turns that equation around sharply, whether multi-tenant or hosted single-tenant. Now, the SaaS vendor is responsible for the operational costs, and therefore the SaaS vendor is incentivized to pay attention to performance, since it directly affects their own costs.

Since traditional ISVs are increasingly offering their software in a SaaS model (usually via a single-tenant hosted solution), this trend is good even for those who are running software in their own internal data centers — performance optimizations prioritized for the hosted side of the business should make their way into the main branch as well.

I am not, by the way, a believer that multi-tenant SaaS is inherently significantly superior to single-tenant, from a total-cost-of-ownership and total-value-of-opportunity perspective. Theoretically, with multi-tenancy, you can get better capacity utilization, lower operational costs, and so forth. But multi-tenant SaaS can be extremely expensive to develop. Furthermore, a retrofit of a single-tenant solution into a multi-tenant one is, in many cases, a software project burdened with both incredible risk and cost, and it diverts resources that could otherwise be used to improve the software’s core value proposition. As a result, there is, and will continue to be, a significant market for infrastructure solutions that can help regular ISVs offer a SaaS model in a cost-effective way without having to significantly retool their software.

Lightweight Provisioning != Lightweight Process

Congratulations, you’ve virtualized (or gone to public cloud IaaS) and have the ability to instantly and easily provision capacity.

Now, stop and shoot yourself in the foot by not implementing a lightweight procurement process to go with your lightweight provisioning technology.

That’s an all-too-common story, and it highlights a critical aspect of the move toward cloud (or just ‘cloudier’) concepts. In many organizations, it’s not actually the provisioning that’s expensive and lengthy. It’s the process that goes with it.

You’ve probably heard that it can take weeks or months for an enterprise to provision a server. You might even work for an organization where that’s true. You might also have heard that it costs thousands of dollars to do so, and your organization might have a chargeback mechanism that makes that the case for your department.

Except that it doesn’t actually take that long, and it’s actually pretty darn cheap, as long as you’re large enough to have some reasonable level of automation (mid-sized businesses and up, or technology companies with more than a handful of servers). Even with zero automation, you can buy a server and have it shipped to you in a couple of days, and build it in an afternoon.

What takes forever is the procurement process, which may also be heavily burdened with costs.

When most organizations virtualize, they usually eliminate a lot of the procurement process — getting a VM is usually just a matter of requesting one, rather than going through the whole rigamarole of justifying buying a server. But the “request a VM” process can be anything from a self-service portal to something with as much paperwork headache as buying a server — and the cost savings, agility, and efficiency that an organization gains from virtualizing are certainly dependent upon whether it is able to lighten its process for this new world.

There are certain places where the “forever to procure, at vast expense” problems are notably worse. For instance, subsidiaries in companies that have centralized IT in the parent company often seem to get shafted by central IT — they’re likely to tell stories of an uncaring central IT organization, priorities that aren’t aligned with their own, and nonsensical chargeback mechanisms. Moreover, subsidiaries often start out much more nimble and process-light than a parent company that acquired them, which leads to a build-up of frustration and resentment, and a willingness to go out on their own.

And so subsidiaries — and departments of larger corporations — often end up going rogue, turning to procuring an external cloud solution, not because internal IT cannot deliver a technology solution that meets their needs, but because their organization cannot deliver a process that meets their needs.

When we talk about time and cost savings for public cloud IaaS vs. the internal data center, we should be careful not to conflate the burden of (internal, discardable/re-engineerable) process with what technology is able to deliver.

Note that this also means that fast provisioning is only the beginning of the journey towards agility and efficiency. The service aspects (from self-service to managed service) are much more difficult to solve.

The convenience of not coping

There’s a lot to be said for the ability to get a server for less than the price of a stick of chewing gum.

But convenience has a price, and that price is high enough that shared hosters, blog hosters, and other folks who make their daily pittance from infrastructure-plus-a-little-extra aren’t especially threatened by cloud infrastructure services.

For instance, I pay for WordPress to host a blog because, while I am readily capable of managing a cloud server and everything necessary to run WordPress, I don’t want to deal with it. I have better things to do with my time.

Small businesses will continue to use traditional shared hosting or even some control-panel-based VPS offerings, despite the much-inferior price-to-resource ratios compared to raw cloud servers, because of the convenience of not having to cope with administration.

Cloud servers are not a significant cost savings for most enterprises (when running continuously, rather than for burst or one-time capacity) because administration is still a tremendous burden. That’s why PaaS offerings will gain more and more traction as the platforms mature, and also why the companies that crack the code to truly automating systems administration will win over time.

I was pondering this equation while contemplating the downtime of a host that I use for some personal stuff; they’ve got a multi-hour maintenance downtime this weekend. My solution to this was simple: write a script that would, shortly before shutdown time, automatically shut down my application, provision a 1.5-cent-an-hour cloud server over on Rackspace, copy the data over, and fire up the application on its new home. (Note: This was just a couple of lines of code, taking moments to type.) The only thing I couldn’t automate was the DNS changeover, since I use GoDaddy for primary DNS and they don’t have an API available for ordinary customers. But conveniently: failover, without having to disrupt my Saturday.
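
For the curious, the shape of that script looks roughly like the sketch below. It is not the actual script: it uses the open-source libcloud library for the provisioning step, and the hostnames, image selection, paths, and startup commands are placeholders standing in for my real ones.

```python
# Rough sketch of the pre-maintenance failover described above.
# Assumes libcloud is installed; hostnames, paths, and commands are
# placeholders, and SSH key setup is glossed over entirely.
import subprocess
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

Rackspace = get_driver(Provider.RACKSPACE)
driver = Rackspace('username', 'api-key', region='dfw')

# 1. Shut the application down on the host about to go into maintenance.
subprocess.check_call(['ssh', 'oldhost.example.com', 'service myapp stop'])

# 2. Provision a small, cheap cloud server and wait for it to come up.
size  = min(driver.list_sizes(), key=lambda s: s.ram)   # smallest slice
image = [i for i in driver.list_images() if 'Ubuntu' in i.name][0]
node  = driver.create_node(name='failover-temp', size=size, image=image)
node, ips = driver.wait_until_running([node])[0]

# 3. Copy the data over and start the app on its new home.
subprocess.check_call(['rsync', '-az', '/var/lib/myapp/',
                       'root@%s:/var/lib/myapp/' % ips[0]])
subprocess.check_call(['ssh', 'root@%s' % ips[0], 'service myapp start'])

# 4. The DNS changeover stays manual (no API for it in my case).
print('Point DNS at %s and verify.' % ips[0])
```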

But I realized that I was paying, on a resource-unit-equivalent basis, tremendously more for my regular hosting than I would for a cloud server. Mostly, I’m paying for the convenience of not thinking — for not having to make sure the OS is hardened, pay attention to security advisories, patch, upgrade, watch my logs, and so on. I can probably afford the crude way of not thinking for a couple of hours — blindly shutting down all ports, pretty much — but I’m not comfortable with that approach for more than an afternoon.

This is, by the way, also a key difference between the small-business folks who have one or two servers, and the larger IT organizations with dozens, hundreds, or thousands of servers. The fewer you’ve got, the less efficient your labor leverage is. The guy with the largest scale doesn’t necessarily win on cost-efficiency, but there’s definitely an advantage to getting to enough scale.

Credit cards and EA/Mythic’s epic billing mistake

Most of us have long since overcome our fear of handing over our credit cards to Internet merchants. It’s become routine for most of us to simply do so. We buy stuff, we sign up for subscriptions; it’s just like handing over plastic anywhere else. For that matter, most of us have never really thought about all that credit card data lying around in the hands of brick-and-mortar merchants with whom we do business, until the unfortunate times when that data gets mass-compromised.

Bad billing problems plague lots of organizations, but Electronic Arts (in the form of its Mythic Entertainment studio, which makes the massively multiplayer online RPGs Dark Age of Camelot and Warhammer Online) just had a major screw-up: a severe billing system error that, several days ago, repeatedly charged customers their subscription fees. Not just one extra charge, but, some users say, more than sixty. Worse still, the error reportedly affected not just current customers, but past customers. A month’s subscription is $15, but users can pre-pay for as much as a year. And these days, with credit cards so often actually being checking-account debit cards, that is often an immediate hit to the wallet. So you can imagine the impact of multiple charges, even on users with decent bank balances. (Plenty of people with good-sized savings cushions only keep enough money in the checking account to cover expected bills, so you don’t have to be on the actual fiscal edge to get smacked with overdraft fees.) EA is scrambling to get this straightened out, of course, but this is every company’s worst billing nightmare, and it comes at a time when EA and its competitors are all scrambling to shift their business models online.

How many merchants that you don’t do business with any longer, but used to have recurring billing permission on your credit card, still have your credit card on file? As online commerce and micropayments proliferate, how many more merchants will store that data? (Or will PayPal, Apple’s storefronts, and other payment services rule the world?)

Q1 2010 inquiry in review

My professional life has gotten even busier — something that I thought was impossible, until I saw how far out my inquiry calendar was being booked. As usual, my blogging has suffered for it, as has my writing output in general. Nearly all of my writing now seems to be done in airports, while waiting for flights.

The things that clients are asking me about have changed in a big way since my Q4 2009 commentary, although this is partially due to an effort to shift some of my workload to other analysts on my team, so I can focus on the stuff that’s cutting edge rather than routine. I’ve been trying to shed as much of the routine colocation and data center leasing inquiry onto other analysts as possible, for instance; reviewing space-and-power contracts isn’t exactly rocket science, and I can get the trends information I need without needing to look at a zillion individual contracts.

Probably the biggest surprise of the quarter is how intensively my CDN inquiry has ramped up. It’s Akamai and more Akamai, for the most part — renewals, new contracts, and almost always, competitive bids. With aggressive new pricing across the board, a willingness to negotiate (and an often-confusing contract structure), and serious prospecting for new business, Akamai is generating a volume of CDN inquiry for me that I’ve never seen before, and I talk to a lot of customers in general. Limelight is in nearly all of these bids, too, by the way, and the competition in general has been very interesting — particularly AT&T. Given Gartner’s client base, my CDN inquiry is highly diversified; I see a tremendous amount of e-commerce, enterprise application acceleration, electronic software delivery and whatnot, in addition to video deals. (I’ve seen as many as 15 CDN deals in a week, lately.)

The application acceleration market in general is seeing some new innovations, especially on the software end (check out vendors like Aptimize), and more ADN (application delivery network) offerings will be launched by the major CDN vendors this year. The question of “Do you really need an ADN, or can you get enough speed with hardware and/or software?” is certainly a key one right now, due to the big delta in price between pure content offload and dynamic acceleration.

By the way, if you have not seen Akamai CEO Paul Sagan’s “Leading through Adversity” talk given at MIT Sloan, you might find it interesting — it’s his personal perspective on the company’s history. (His speech starts around the 5:30 mark, and is followed by open Q&A, although unfortunately the audio cuts out in one of the most interesting bits.)

Most of the rest of my inquiry time is focused around cloud computing inquiries, primarily of a general strategic sort, but also with plenty of near-term adoption of IaaS. Traditional pure-dedicated hosting inquiry, as I mentioned in my last round-up, is pretty much dead — just about every deal has some virtualized utility component, and when it doesn’t, the vendor has to offer some kind of flexible pricing arrangement. Unusually, I’m beginning to take more and more inquiry from traditional data center outsourcing clients who are now looking at shifting their sourcing model. And we’re seeing some sharp regional trends in the evolution of the cloud market that are the subject of an upcoming research note.

The (temporary?) transformation of hosters

Classically, hosting companies have been integrators of technology, not developers of technology. Yet the cloud world is increasingly pushing hosting companies into being software developers — companies who create competitive advantage in significant part by creating software which is used to deliver capabilities to customers.

I’ve heard the cloud IaaS business compared to the colocation market of the 1990s — the idea that you build big warehouses full of computers and you rent that compute capacity to people, comparable conceptually to renting data center space. People who hold this view tend to say things like, “Why doesn’t company X build a giant data center, buy a lot of computers, and rent them? Won’t the guy who can spend the most money building data centers win?” This view is, bluntly, completely and utterly wrong.

IaaS is functionally becoming a software business right now, one that is driven by the ability to develop software in order to introduce new features and capabilities, and to drive quality and efficiency. IaaS might not always be a software business; it might eventually be a service-and-support business that is enabled by third-party software. (This would be a reasonable view if you think that VMware’s vCloud is going to own the world, for instance.) And you can get some interesting dissonances when you’ve got some competitors in a market who are high-value software businesses vs. other folks who are mostly commodity infrastructure providers enabled by software (the CDN market is a great example of this). But for the next couple of years at least, it’s going to be increasingly a software business in its core dynamics; you can kind of think of it as a SaaS business in which the service delivered happens to be infrastructure.

To illustrate, let’s talk about Rackspace. Specifically, let’s talk about Rackspace vs. Amazon.

Amazon is an e-commerce company, with formidable retail operations skills embedded in its DNA, but it is also a software company, with more than a decade of experience under its belt in rolling out a continuous stream of software enhancements and using software to drive competitive advantage.

Amazon, in the cloud IaaS part of its Web Services division, is in the business of delivering highly automated IT infrastructure to customers. Custom-written software drives their entire infrastructure, all the way down to their network devices. Software provides the value-added enhancements that they deliver on top of the raw compute and storage, from the spot pricing marketplace to auto-scaling to the partially-automated MySQL management provided by the RDS service. Amazon’s success and market leadership depends on consistently rolling out new and enhanced features, functions, capabilities. It can develop and release software on such aggressive schedules that it can afford to be almost entirely tactical in its approach to the market, prioritizing whatever customers and prospects are demanding right now.
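
To make the “infrastructure as an API” point concrete, here’s a minimal sketch of what provisioning Amazon capacity looks like as code. It uses boto3, the current AWS SDK for Python (which postdates this post), and the AMI ID and instance type are placeholders rather than recommendations.

```python
# Minimal sketch: compute capacity as an API call rather than a purchase order.
# ImageId and InstanceType are placeholders, not recommendations.
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

response = ec2.run_instances(
    ImageId='ami-00000000',   # placeholder AMI
    InstanceType='t3.micro',
    MinCount=1,
    MaxCount=1,
)
print('Launched', response['Instances'][0]['InstanceId'])

# The same API surface extends to the value-added layers mentioned above:
# spot capacity, auto-scaling groups, RDS database instances, and so on.
```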

Rackspace, on the other hand, is a managed hosting company, built around a deep culture of customer service. Like all managed hosters, they’re imperfect, but on the whole, they are the gold standard of service, and customer service is one of the key differentiators in managed hosting, driving Rackspace’s very rapid growth over the last five years. Rackspace has not traditionally been a technology leader; historically, it’s been a reasonably fast follower implementing mainstream technologies in use by its target customers, but people, not engineering, have been its competitive advantage.

And now, Rackspace is going head to head with Amazon on cloud IaaS. It has made a series of acquisitions aimed at bringing in developers and software technology, including Slicehost, JungleDisk, and Webmail.us. (JungleDisk is almost purely a software company, in fact; it makes money by selling software licenses.) Even if Rackspace emphasizes other competitive differentiation, like customer support, it’s still in direct competition with Amazon on pure functionality. Can Rackspace obtain the competencies it will need to be a software leader?

And in related questions: Can the other hosters who eschew the VMware vCloud route manage to drive the featureset and innovation they’ll need competitively? Will vCloud be inexpensive enough and useful enough to be widely adopted by hosters, and if it is, how much will it commoditize this market? What does this new emphasis upon true development, not just integration, do to hosters and to the market as a whole? (I’ve been thinking about this a lot, lately, although I suspect it’ll go into a real research note rather than a blog post.)

Cloudkick launches a commercial service

Cloudkick, a start-up which has been offering free multi-cloud management and monitoring services in preview (and which is the originator and sponsor of the open-source libcloud project), has launched its commercial offering.
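
As an aside, libcloud itself is a nice illustration of what “cloud-agnostic” means in practice: one API across providers. A minimal sketch, with placeholder credentials and regions, might look like this.

```python
# Minimal sketch of cloud-agnostic inventory via libcloud: the same loop
# lists servers on EC2 and Rackspace. Credentials and regions are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

drivers = [
    get_driver(Provider.EC2)('ACCESS_KEY', 'SECRET_KEY', region='us-east-1'),
    get_driver(Provider.RACKSPACE)('username', 'api-key', region='dfw'),
]

for driver in drivers:
    for node in driver.list_nodes():
        print(driver.name, node.name, node.state, node.public_ips)
```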

Quite a bit has been written about Cloudkick in other venues, so I’ll offer a musing of a different sort: I am contemplating the degree to which cloud-agnostic monitoring service providers will evolve into general monitoring SaaS vendors. A tiny fraction of cloud IaaS users will actually be significantly multi-cloud, whereas a far vaster addressable market exists for really excellent monitoring tools, including cost-effective ways of doing third-party monitoring for the purposes of cloud SLA enforcement. Even though enterprises are likely to extend their own internal monitoring systems into their cloud environments, there will continue to be a need for third-party monitoring, and for many organizations, third-party monitoring that can also feed alerts into internal monitoring systems will be a popular choice.

Cloudkick has been interesting in the context of the debate over alleged capacity issues on Amazon EC2. Their monitoring has been showing latency issues growing in severity since Christmas or so. The public nature of this data, among other things, has pushed Amazon into making a statement that they don’t have capacity problems; it’s an interesting example of how making such data openly available can bring pressure to bear on service providers.
