Category Archives: Infrastructure

What’s cloud IaaS really about?

As expected, the Magic Quadrant for Cloud IaaS and Web Hosting is stirring up much of the same debate that was raised with the publication of the 2009 MQ.

Derrick Harris over at GigaOM thinks we got it wrong. He writes: “Cloud IaaS is about letting users get what they need, when they need it and, ideally, with a credit card. It doesn’t require requisitioning servers from the IT department, signing a contract for any predefined time period or paying for services beyond the computing resources.”

Fundamentally, I dispute Derrick’s assertion of what cloud IaaS is about. I think the things he cites above are cool, and represent a critical shake-up in thinking about IT access, but it’s not ultimately what the whole cloud IaaS market is about. And our research note is targeted at Gartner’s clients — generally IT management and architects at mid-sized businesses and enterprises, along with technology start-ups of all sizes (but generally ones that are large enough to have either funding or revenue).

Infrastructure without a contract? Convenient initially, but as the relationship gets more significant, usually not preferable. In fact, most businesses like to be able to negotiate contract terms. (For that matter, Amazon does customized Enterprise Agreements with its larger customers.) Businesses love not having to commit to capacity, and the whole market is shifting its business models pretty quickly to adapt to that desire.

Infrastructure without involving traditional IT operations? Great, but someone’s still got to manage the infrastructure — shoving it in the cloud does not remove the need for operations, maintenance, patch management, security, governance, budgeting, etc. Gartner’s clients generally don’t want random application developers plunking down a credit card and just buying stuff willy-nilly. Empower developers with self-provisioning, sure — but provisioning raw infrastructure is the easy and cheap part, in the grand scheme of things.

Paying for services beyond the computing resources? Sure, some people love to self-manage their infrastructure. But really, what most people want to do is to only worry about their application. Their real dream is that cloud IaaS provides not just compute capacity, but secure compute capacity — which generally requires handling routine chores like patch management, and dealing with anti-virus and security event monitoring and such. In other words, they want to eliminate their junior sysadmins. They’re not looking for managed hosting per se; they’re looking to get magic, hassle-free compute resources.

I obviously recognize Amazon’s contributions to the market. The MQ entry on Amazon begins with: “Amazon is a thought leader; it is extraordinarily innovative, exceptionally agile and very responsive to the market. It has the richest cloud IaaS product portfolio, and is constantly expanding its service offerings and reducing its prices.” But I think Amazon represents just one aspect of a broader market.

Cloud IaaS is complicated by the diversity of use cases for it. Our clients are also looking for specific guidance on just the “pure cloud”, self-provisioned “virtual data center” services, so we’re doing two upcoming vendor ratings to address that need — a Critical Capabilities note focused solely on feature sets, and a mid-year Magic Quadrant focused purely on those services.

I could talk at length about what our clients are really looking for and what they’re thinking with respect to cloud IaaS, which is a pretty complicated and interesting tangle, but I figure I really ought to write a research note for that… and get back to my holiday vacation for now.


EdgeCast joins the ADN fray

EdgeCast has announced the beta of its new application delivery network service. For those of you who are CDN-watchers, that means it’s leaping into the fray with Akamai (Dynamic Site Accelerator and Web Application Accelerator, bundles where DSA is B2C and WAA is B2B), Cotendo’s DSA, and CDNetworks’s Dynamic Web Acceleration. EdgeCast’s technology is another TCP optimization approach, of the variety commonly used in application delivery controllers (like F5’s Web Application Accelerator) and WAN optimization controllers (like Riverbed).

EdgeCast seems to be gaining traction with my client base in the last few weeks — they’re appearing on a lot more shortlists. This appears to be the direct result of aggressive SEO-based marketing.

What’s important about this launch: EdgeCast isn’t just a standalone CDN. It is also a CDN partner to many carriers, and it derives an increasing percentage of its revenue from selling its software. That means that EdgeCast introducing ADN services potentially has ripple effects on the ecosystem, in terms of spreading ADN technology more widely.


Just enough privacy

For a while now, I’ve been talking to Gartner clients about the concerns that keep them off public cloud infrastructure, and about the diminishing differences between the private and public cloud offerings from service providers. I’ve been testing a thesis with our clients for some time, and I’ve been talking to people here at Gartner’s data center conference about it as well.

That thesis is this: People will share a data center network, as long as there is reasonable isolation of their traffic, and they are able to get private non-Internet connectivity, and there is a performance SLA. People will share storage, as long as there is reasonable assurance that nobody else can get at their data, which can be handled via encryption of the storage at rest and in flight, perhaps in conjunction with other logical separation mechanisms, and again, there needs to be a performance SLA. But people are worried about hypervisor security, and don’t want to share compute. Therefore, you can meet most requirements for private cloud functionality by offering temporarily dedicated compute resources.

Affinity rules in provisioning can address this very easily. Simply put, the service provider could potentially maintain a general pool of public cloud compute capacity — but set a rule for ‘pseudo-private cloud’ customers that says that if a VM is provisioned on a particular physical server for customer X, then that physical server can only be used to provision more VMs for customer X. (Once those VMs are de-provisioned, the hardware becomes part of the general pool again.) For a typical customer with a reasonable number of VMs (most non-startups have dozens, usually hundreds, of VMs), the wasted capacity is minimal, especially if live VM migration techniques are used to optimize the utilization of the physical servers — and therefore the additional price uplift for this should be modest.
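To make the affinity idea concrete, here’s a minimal sketch of that placement rule in Python — entirely my own illustration, with invented class and function names, not any provider’s actual scheduler:

```python
class Host:
    """One physical server in the provider's general compute pool."""
    def __init__(self, host_id, slots):
        self.host_id = host_id
        self.slots = slots      # total VM slots on this physical server
        self.vms = 0            # VMs currently provisioned on it
        self.tenant = None      # None = available to the general public pool

def place_vm(hosts, customer_id):
    """Pick a host for one of customer_id's VMs under the affinity rule."""
    # Prefer hosts already dedicated to this customer, to minimize waste.
    candidates = [h for h in hosts if h.tenant == customer_id and h.vms < h.slots]
    if not candidates:
        # Otherwise, pull a fresh host out of the general pool and dedicate it.
        candidates = [h for h in hosts if h.tenant is None and h.vms < h.slots]
    if not candidates:
        raise RuntimeError("no capacity available")
    host = candidates[0]
    host.tenant = customer_id
    host.vms += 1
    return host

def remove_vm(host):
    """De-provision a VM; a fully drained host rejoins the general pool."""
    host.vms -= 1
    if host.vms == 0:
        host.tenant = None
```

The waste is bounded by at most one partially filled host per customer at any given time, which is why the uplift over pure multi-tenancy should be modest for customers running dozens or hundreds of VMs.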

That gets you public cloud compute scale, while still assuaging customer fears about server security. (Interestingly, Amazon salespeople sometimes tell prospects that you can use Amazon as a private cloud — you just have to use only the largest instances, which eat the resources of the full physical server.)


What does the cloud mean to you?

My Magic Quadrant for Cloud Infrastructure as a Service and Web Hosting is done. The last week has been spent in discussion with service providers over their positioning and the positioning of their competitors, and the whys and wherefores and whatnots. That has proven to be remarkably interesting this year, because it’s been full of angry indignation from providers claiming diametrically opposed things about the market.

Gartner gathers its data about what people want in two ways — from primary research surveys, and, often more importantly, from client inquiry: conversations with the IT organizations that are actually planning to buy things or, better yet, are actually buying them. I currently see a very large number of data points — a dozen or more conversations of this sort a day, many of them focused on buying cloud IaaS.

And so when a provider tells me, “Nobody in the market wants to buy X!”, I generally have a good base from which to judge whether or not that’s true, particularly since I’ve got an entire team of colleagues here looking at cloud stuff. It’s never that those customers don’t exist; it’s that the provider’s positioning has essentially guaranteed that those deals never show up within its tunnel-vision view of the market.

The top common fallacy, overwhelmingly, is that enterprises don’t want to buy from Amazon. I’ve blogged previously about how wrong this is, but at some point in the future, I’m going to have to devote a post (or even a research note) to why this is one of the single greatest, and most dangerous, delusions that a cloud provider can have. If you offer cloud IaaS, or heck, you’re a data-center-related business, and you think you don’t compete with Amazon, you are almost certainly wrong. Yes, even if your customers are purely enterprise — especially if your customers are large enterprises.

The fact of the matter is that the people out there are looking at different slices of cloud IaaS, but they are still slices of the same market. This requires enough examination that I’m actually going to write a research note instead of just blogging about it, but in summary, my thinking goes like this (crudely segmented, saving the refined thinking for a research note):

There are customers who want self-managed IaaS. They are confident and comfortable managing their infrastructure on their own. They want someone to provide them with the closest thing they can get to bare metal, good tools to control things (or an API they can use to write their own tools), and then they’ll make decisions about what they’re comfortable trusting to this environment.

There are customers who want lightly-managed IaaS, which I often think of as “give me raw infrastructure, but don’t let me get hacked” — which is to say, OS management (specifically patch management) and managed security. They’re happy managing their own applications, but would like someone to do all the duties they typically entrust to their junior sysadmins.

There are customers who want complex management, who really want soup-to-nuts operations, possibly also including application management.

And then in each of these segments, you can divide customers into those with a single application (which may have multiple components and be highly complex), and those who have a whole range of stuff that encompasses more general data center needs. That drives different customer behaviors and different service requirements.

Claiming that there’s no “real” enterprise market for self-managed is just as delusional as claiming there’s no market for complex management. They’re different use cases in the same market, and customers often start out confused about where they fall along this spectrum, and many customers will eventually need solutions all along this spectrum.

Now, there’s absolutely an argument to be made that the self-managed and lightly-managed segments together represent an especially important part of the market, where a high degree of innovation is taking place. It means that I’m writing some targeted research — selection notes, a Critical Capabilities rating of individual services, probably a Magic Quadrant that focuses specifically on this next year. But the whole spectrum is part of the cloud IaaS adoption phenomenon, and any individual segment isn’t representative of the total market evolution.


Designing to fail

Cloud-savvy application architects don’t do things the same way that they’re done in the traditional enterprise.

Cloud applications assume failure. That is, well-architected cloud applications assume that just about anything can fail. Servers fail. Storage fails. Networks fail. Other application components fail. Cloud applications are designed to be resilient to failure, and they are designed to be robust at the application level rather than at the infrastructure level.

Enterprises, for the most part, design for infrastructure robustness. They build expensive data centers with redundant components. They buy expensive servers with dual-everything in case a component fails. They buy expensive storage and mirror their disks. And then whatever hardware they buy, they need two of. All so the application never has to deal with the failure of the underlying infrastructure.

The cloud philosophy is generally that you buy dirt-cheap things and expect they’ll fail. Since you’re scaling out anyway, you expect to have a bunch of boxes, so that any box failing is not an issue. You protect against data center failure by being in multiple data centers.

Cloud applications assume variable performance. Well-architected cloud applications don’t assume that anything is going to complete in a certain amount of time. The application has to deal with network latencies that might be random, storage latencies that might be random, and compute latencies that might be random. The principle of a distributed application of this sort is that just about anything you’re talking to can mysteriously drop off the face of the Earth at any point in time, or at least not get back to you for a while.
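By way of illustration, here’s a minimal sketch (my own, not any particular framework’s API) of the defensive pattern this implies — an explicit deadline on every remote call, plus retries with jittered backoff for when the other side simply doesn’t answer:

```python
import random
import time

def call_with_backoff(operation, attempts=5, timeout=2.0, base_delay=0.5):
    """Call a remote dependency assuming it can fail or stall at any time."""
    for attempt in range(attempts):
        try:
            # Every remote call gets an explicit deadline; nothing is assumed
            # to return "quickly" just because it is nominally local or fast.
            return operation(timeout=timeout)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller fall back or degrade
            # Exponential backoff with jitter, so a struggling dependency
            # isn't hammered by synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Example usage with a hypothetical storage client that accepts a timeout:
# data = call_with_backoff(lambda timeout: blob_store.get("key", timeout=timeout))
```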

Here’s where it gets funkier. Even most cloud-savvy architects don’t build applications this way today. This is why people howl about Amazon’s storage back-end for EBS, for instance — they’re used to consistent and reliable storage performance, and EBS isn’t built that way, and most applications are built with the assumption that seemingly local standard I/O is functionally local and therefore is totally reliable and high-performance. This is why people twitch about VM-to-VM latencies, although at least here there’s usually some application robustness (since people are more likely to architect with network issues in mind). This is the kind of problem things like Node.js were created to solve (don’t block on anything, and assume anything can fail), but it’s also a type of thinking that’s brand-new to most application architects.

Performance is actually where the real problems occur when moving applications to the cloud. Most businesses who are moving existing apps can deal with the infrastructure issues — and indeed, many cloud providers (generally the VMware-based ones) use clustering and live migration and so forth to present users with a reliable infrastructure layer. But most existing traditional enterprise apps don’t deal well with variable performance, and that’s a problem that will be much trickier to solve.


Amazon, ISO 27001, and a correction

FlyingPenguin has posted a good critique of my earlier post about Amazon’s ISO 27001 certification.

Here’s a succinct correction:

To quote Wikipedia, ISO 27001 requires that management:

  • Systematically examine the organization’s information security risks, taking account of the threats, vulnerabilities and impacts;
  • Design and implement a coherent and comprehensive suite of information security controls and/or other forms of risk treatment (such as risk avoidance or risk transfer) to address those risks that are deemed unacceptable; and
  • Adopt an overarching management process to ensure that the information security controls continue to meet the organization’s information security needs on an ongoing basis.

ISO 27002, which details the security best practices, is not required to be used in conjunction with 27001, although this is customary. I forgot this when I wrote my post (when I was reading docs written by my colleagues on our security team, which specifically recommend the 27001 approach, in the context of 27002).

In other words: 27002 is prescriptive in its controls; 27001 is not that specific.

So FlyingPenguin is right — without the 27002, we have no idea what security controls Amazon has actually implemented.


Amazon, ISO 27001, and some conference observations

Greetings from Gartner’s Application Architecture, Development, and Integration Summit. There are around 900 people here, and the audience is heavy on enterprise architects and other application development leaders.

One of the common themes of my interaction here has been talking to an awful lot of people who are using or have used Amazon for IaaS. They’re a different audience than the typical clients I talk to about the cloud, who are generally IT Operations folks, IT executives, or Procurement folks. The audience here is involved in assessing the cloud, and in adopting the cloud in more skunkworks ways — but they are generally not ultimately the ones making the purchasing decisions. Consequently, they’ve got a level of enthusiasm about it that my usual clients don’t share (although it correlates with the enthusiasm my usual clients report their own app dev folks have for it). Fun conversations.

So on the heels of Amazon’s ISO 27001 certification, I thought it’d be worth jotting down a few thoughts about Amazon and the enterprise.

To start with, SAS 70 Is Not Proof of Security, Continuity or Privacy Compliance (Gartner clients only). As my security colleagues Jay Heiser and French Caldwell put it, “The SAS 70 auditing report is widely misused by service providers that find it convenient to mischaracterize the program as being a form of security certification. Gartner considers this to be a deceptive and harmful practice.” It certainly is possible for a vendor to do a great SAS 70 certification — to hold themselves to best practices and have the audit show that they follow them consistently — but SAS 70 itself doesn’t require adherence to security best practices. It just requires you to define a set of controls, and then demonstrate that you follow them.

ISO 27001, on the other hand, is a security certification standard that examines the efficacy of risk management and an organization’s security posture, in the context of ISO 27002, which is a detailed security control framework. This certification actually means that you can be reasonably assured that an organization’s security controls are actually good, effective ones.

The 27001 cert — especially meaningful here because Amazon certified its actual infrastructure platform, not just its physical data centers — addresses two significant issues with assessing Amazon’s security to date. First, Amazon doesn’t allow enterprises to bring third-party auditors into its facilities and to peer into its operations, so customers have to depend on Amazon’s own audits (which Amazon does share under certain circumstances). Second, Amazon has a lot of security secret sauce, implementing things in ways different from the norm — for instance, Amazon claims to provide network isolation between virtual machines, but unlike the rest of the world, it doesn’t use VLANs to achieve this. Getting something like ISO 27001, which is prescriptive, hopefully offers some assurance that Amazon’s stuff constitutes effective, auditable controls.

(Important correction: See my follow-up. The above statement is not true, because we have no guarantee Amazon follows 27002.)

A lot of people like to tell me, “Amazon will never be used by the enterprise!” Those people are wrong (and are almost always shocked to hear it). Amazon is already used by the enterprise — a lot. Not necessarily always in particularly “official” ways, but those unofficial ways can sometimes stack up to pretty impressive aggregate spend. (Some of my enterprise clients end up being shocked by how much they’re spending, once they total up all the credit cards.)

And here’s the important thing: The larger the enterprise, the more likely it is that they use Amazon, to judge from my client interactions. (Not necessarily as their only cloud IaaS provider, though.) Large enterprises have people who can be spared to go do thorough evaluations, and sit on committees that write recommendations, and decide that there are particular use cases that they allow, or actively recommend, Amazon for. These are companies that assess their risks, deal with those risks, and are clear on what risks they’re willing to take with what stuff in the cloud. These are organizations — some of the largest global companies in the world — for whom Amazon will become a part of their infrastructure portfolio, and they’re comfortable with that, even if their organizations are quite conservative.

Don’t underestimate the rate of change that’s taking place here. The world isn’t shifting overnight, and we’re going to be looking at internal data centers and private clouds for many years to come, but nobody can afford to sit around smugly and decide that public cloud is going to lose and that a vendor like Amazon is never going to be a significant player for “real businesses”.

One more thing, on the subject of “real businesses”: to all of the service providers who keep telling me that your multi-tenant cloud isn’t actually “public” because you only allow “real businesses”, not just anyone who can put down a credit card — get over it. (And you get extra-negative points if you consider all Internet-centric companies to not be “real businesses”.) Not only is it not a differentiator, but customers aren’t actually fooled by this kind of circumlocution, and the guys who accept credit cards still vet their customers, albeit in more subtle ways. You’re multi-tenant, and your customers aren’t buying as a consortium or community? Then you’re a public cloud, and to claim otherwise is actively misleading.


Amazon’s Free Usage Tier

Amazon recently introduced a Free Usage Tier for its Web Services. Basically, you can try Amazon EC2 with a free micro-instance (specifically, enough hours to run such an instance full-time, with a few hours left over to run a second instance, too — or you can presumably use a bunch of micro-instances part-time), plus the storage and bandwidth to go with it.

Here’s what the deal is worth at current pricing, per-month:

  • Linux micro-instance – $15
  • Elastic load balancing – $18.87
  • EBS – $1.27
  • Bandwidth – $3.60

That’s $38.74 all in all, or $464.88 over the course of the one-year free period — not too shabby. Realistically, you don’t need the load-balancing if you’re running a single instance, so that’s really $19.87/month, $238.44/year. It also proves to be an interesting illustration of how much the little incremental pennies on Amazon can really add up.
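For the curious, the arithmetic worked out (using the prices listed above; purely illustrative):

```python
# Monthly value of the free tier at the per-month prices listed above.
micro_instance, elastic_lb, ebs, bandwidth = 15.00, 18.87, 1.27, 3.60

full_monthly = micro_instance + elastic_lb + ebs + bandwidth
single_instance_monthly = full_monthly - elastic_lb  # skip the load balancer

print(round(full_monthly, 2), round(full_monthly * 12, 2))                        # 38.74 464.88
print(round(single_instance_monthly, 2), round(single_instance_monthly * 12, 2))  # 19.87 238.44
```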

It’s a clever and bold promotion, making it cost nothing to trial Amazon, and potentially punching Rackspace’s lowest-end Cloud Servers business in the nose. A single instance of that type is enough to run a server to play around with if you’re a hobbyist, or you’re a garage developer building an app or website. It’s this last type of customer that’s really coveted, because all cloud providers hope that whatever he’s building will become wildly popular, causing him to eventually grow to consume bucketloads of resources. Lose that garage guy, the thinking goes, and you might not be able to capture him later. (Although Rackspace’s problem at the moment is that their cloud can’t compete against Amazon’s capabilities once customers really need to get to scale.)

While most of the cloud IaaS providers are actually offering free trials to most customers they’re in discussions with, there’s still a lot to be said for just being able to sign up online and use something (although you still have to give a valid credit card number).


Netflix, Akamai, and video delivery performance

Dan Rayburn’s blog post about the Akamai/Netflix relationship seems to have set off something of a firestorm, and I’ve been deluged by inquiries from Gartner Invest clients about it.

I do not want to add fuel to the fire by speculating on anything, and I have access to confidential information that prevents me from stating some facts as I know them, so for blog purposes I will stick to making some general comments about Akamai’s delivery performance for long-tail video content.

From the independent third-party testing that I’ve seen, Akamai delivers good large-file video performance, but their performance is not always superior to other major CDNs (excluding greater reach, i.e., they’re going to solidly trounce a rival who doesn’t have footprint in a given international locale, for instance). Actual performance depends on a number of factors, including the specifics of a customer’s deal with Akamai and the way that the service is configured. The testing location also matters. The bottom line is that it’s very competitive performance but it’s not, say, head and shoulders above competitors.

Akamai has, for the last few years, had a specific large-file delivery service designed to cache just the beginning of a file at the very edge, with the remainder delivered from the middle tier to the edge server, thus eliminating the obvious cache storage issues involved in, say, caching entire movies, while still preserving decent cache hit ratios. However, this has been made mostly irrelevant in video delivery by the rise in popularity of adaptive streaming techniques — if you’re thinking about Akamai’s capabilities in the Netflix or similar contexts, you should think of this as an adaptive streaming thing and not a large file delivery thing.

In adaptive streaming (pioneered by Move Networks), a video is chopped up into lots of very short chunks, each just a few seconds long, and usually delivered via HTTP. The end-consumer’s video player takes care of assembling these chunks. Based on the delivery performance of each chunk, the video player decides whether it wants to upshift or downshift the bitrate / quality of the video in real time, thus determining what the URL of the next video chunk is. This technique can also be used to switch sources, allowing, for instance, the CDN to be changed in real time based on performance, as is offered by the Conviva service. Because the video player software in adaptive streaming is generally instrumented to pay attention to performance, there’s also the possibility that it may send back performance information to the content owner, thus enabling them to get a better understanding of what the typical watcher is experiencing. Using an adaptive technique, your goal is generally to get the end-user the most “upshift” possible (i.e., sustain the highest bitrate possible), and if you can, have it delivered via the least expensive source.
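To make the mechanics concrete, here’s a toy sketch of the player-side decision loop — the bitrate ladder, chunk URL scheme, and thresholds are all invented for illustration, not drawn from any real player:

```python
import time
import urllib.request

BITRATES_KBPS = [400, 800, 1500, 3000]   # invented bitrate ladder

def fetch_chunk(base_url, chunk_index, bitrate):
    """Fetch one short video chunk over plain HTTP and measure its throughput."""
    url = f"{base_url}/{bitrate}kbps/chunk_{chunk_index}.ts"   # invented URL scheme
    start = time.monotonic()
    data = urllib.request.urlopen(url, timeout=5).read()
    elapsed = time.monotonic() - start
    measured_kbps = (len(data) * 8 / 1000) / elapsed
    return data, measured_kbps

def next_bitrate(current, measured_kbps):
    """Upshift if the last chunk arrived comfortably fast, downshift if not."""
    idx = BITRATES_KBPS.index(current)
    if measured_kbps > current * 1.5 and idx + 1 < len(BITRATES_KBPS):
        return BITRATES_KBPS[idx + 1]    # upshift to higher quality
    if measured_kbps < current and idx > 0:
        return BITRATES_KBPS[idx - 1]    # downshift to avoid rebuffering
    return current

# The same decision point is where a multi-CDN switcher (e.g., a service like
# Conviva) can change base_url -- i.e., the delivery source -- based on the
# measured performance of recent chunks.
```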

When adaptive streaming is in use, that means that the first chunk of a video is now just a small object, easily cached on just about any CDN. Obviously, cache hit ratios still matter, and you will generally get higher cache hit ratios with a megaPOP approach (like Limelight) than you will with a highly distributed approach (like Akamai), although that starts to get more complex when you add in server-side pre-fetching, deliberately serving content off the middle tier, and the like. So now your performance starts to boil down to footprint, cache hit ratio, algorithms for TCP/IP protocol optimization, server and network performance — how quickly and seamlessly can you deliver lots and lots of small objects? Third-party testing generally shows that Akamai benchmarks very well when it comes to small object delivery — but again, specific relative CDN performance for a specific customer is always unique.

In the end, it comes down to price/performance ratios. Netflix has clearly decided that they believe Level 3 and Limelight deliver better value for some significant percentage of their traffic, at this particular instant in time. Given the incredibly fierce competition for high-volume deals, multi-vendor sourcing, and the continuing fall in CDN prices, don’t think of this as an alteration in the market, or anything particularly long-term for the fate of the vendors in question.


Google’s mod_pagespeed and Cotendo

Those of you who are Gartner clients know that in the last year, my colleague Joe Skorupa and I have become excited about the emergence of software-based application acceleration via page optimization approaches, as exemplified by vendors like Aptimize and Strangeloop Networks. (Clients: See Cool Vendors in Enterprise Networking, 2010.) This approach to acceleration enhances the performance of Web-based content and applications, by automatically optimizing the page output of webservers according to the best practices described in books like High Performance Web Sites by Steve Souders. Techniques of this sort include automatically combining JavaScript files (which reduces overall download time), optimizing the order of the scripts, and rewriting HTML so that the browser can display the page more quickly.
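For a flavor of what one of these rewrites actually does, here’s a deliberately naive sketch of the “combine the JavaScript files” step — real optimizers like the ones discussed in this post do far more, and far more carefully:

```python
import re

def combine_scripts(html, script_bodies):
    """Naively replace multiple external <script src> tags with one inline,
    concatenated script, cutting the number of round trips the browser makes.
    `script_bodies` maps each src URL to its already-fetched JavaScript text."""
    srcs = re.findall(r'<script\s+src="([^"]+)"></script>', html)
    if not srcs:
        return html
    combined = "\n;\n".join(script_bodies[src] for src in srcs)
    # Drop the individual external script tags...
    html = re.sub(r'<script\s+src="[^"]+"></script>', "", html)
    # ...and emit one combined script just before </body>, so it doesn't
    # block rendering of the rest of the page.
    return html.replace("</body>", f"<script>{combined}</script></body>")
```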

Page optimization techniques can often provide significant acceleration boosts (2x or more) even when other acceleration techniques are in use, such as a hardware ADC with an acceleration module (offered as add-ons by F5 and Citrix NetScaler, for instance), or a CDN (including CDN-based dynamic acceleration). Since early this year, we’ve been warning our CDN clients that this is a critical technology development to watch and to consider adopting in their networks. It’s a highly sensible thing to deploy on a CDN, for customers doing whole-site delivery; the CDN offloads the computational expense of doing the optimization (which can be significant), caches the result, and potentially distributes the optimized pages to other nodes on the CDN. That gets you excellent, seamless acceleration for essentially no effort on the part of the customer.

Google’s Page Speed project provides free and open-source tools designed to help site owners implement these best practices. Google has recently released an open-source module, called mod_pagespeed, for the popular Apache webserver. This essentially creates an open-source competitor to commercial vendors like Aptimize and Strangeloop. Add the module into your Apache installation, and it will automatically try to optimize your pages for you.

Now, here’s where it gets interesting for CDN watchers: Cotendo has partnered with Google. Cotendo is deploying the Google code (modified, obviously, to run on Cotendo’s proxy caches, which are not Apache-based), in order to be able to offer the page optimization service to its customers.

I know some of you will automatically be asking now, “What does this mean for Akamai?” The answer to that is, “Losing speed trials when it’s Akamai DSA vs. Cotendo DSA + Page Speed Automatic, until they can launch a competing service.” Akamai’s acceleration service hasn’t changed much since the Netli acquisition in 2007, and the evolution in technology here has to be addressed. Page optimization plus TCP optimization is generally much faster than TCP optimization alone. That doesn’t just have pricing implications; it has implications for the competitive dynamics of the space, too.

I fully expect that page optimization will become part of the standard dynamic acceleration service offerings of multiple CDNs next year. This is the new wave of innovation. Despite the well-documented nature of these best practices, organizations still frequently ignore them when coding — and even commercial packages like SharePoint ignore them (SharePoint gets a major performance boost when page optimization techniques are applied, and there are solutions like Certeon that are specific to it). So there’s a very broad swathe of customers that can benefit easily from these techniques, especially since they provide nice speed boosts even in environments where the network latency is pretty decent, like delivery within the United States.
