I had an hour to kill today after a client didn’t show for a call… everyone’s taken off for T-day, I guess.
Since Rackspace has a 30-day money-back guarantee on Mosso at the moment, along with a nice Thanksgiving discount making the first month just $20, I decided to sign up for an actual account, on my personal dime. It allows me to offer guilt-free commentary based on real experience, and the freedom to bug customer support for things with the recognition that I am actually giving the company my money, and am therefore entitled to ask whatever questions I want, without giving Analyst Relations a headache. So here’s a little ramble of liveblogging my Mosso experience.
The first hurdle I ran into is that there’s no easy way to take your Cloud Files account and sign up for Cloud Sites (i.e., the main Mosso service). After a bit of a chat with their live online sales, and a few minutes of waiting while the guy checked around (during which I started writing this blog entry), I was informed I could put in a support ticket and they’d take care of it on the back end. I decided to save them some trouble and just get another account (thus allowing me to do some future playing about with copying things between Cloud Files accounts, in my desire to create cloud utilities that parallel sftp and scp). Still, it was a bit of an oddity: Sites is a logical upsell from Files, so I presume that functionality is coming eventually.
Next, I went to look for a way to change my initial sign-up password. Nothing obvious in the interface, nothing on the knowledge base… I shrugged and provisioned myself a site. On the site’s config, I found the way to change the password — and also discovered, to my horror, that the password shows up in cleartext. That certainly prompted me to change my password immediately.
I did not want to transfer my domain, but the site info page shows what Mosso wants the DNS records to be; I adjusted the DNS records on my end for what I needed, no problem. I also provisioned a second site with a non-www hostname (note that Mosso automatically turns domain.com into http://www.domain.com), which worked fine and intelligently (a recent change, I guess, because when I tried a demo account last week, it insisted on spewing full DNS info, rather than just an A record, for that).
I looked at what was available for PHP, and realized that if I wanted a framework like Zend, I’d have to install it myself, and without SSH access, that looked like it was going to be a festival of non-fun, if not flat-out impossible.
So, I turned on CGI support, which seemed to take two rounds of saving my settings, on both sites I tried it on. But CGI support does not seem to actually work for me — it’s returning 404 errors on my uploaded test scripts. Perhaps this is part of the “you may need to wait two hours” warning message given on the change-config page, but it sure would be nice if it said “pending” if that were the case, or otherwise gave some hint as to what requires a wait and what doesn’t.
I’m going to wait and see, but it’s become clear that I can’t actually do what I want with Mosso, because of the following: If you’re not running a directly supported environment (PHP, Ruby on Rails, ASP.NET), you are stuck with shoving your code, in the form of scripts, into the cgi-bin directory and that’s that. The perl and python support is just generic CGI script support. So there’s no support for mod_python, and therefore you can’t run Django.
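For reference, a generic CGI script of the kind described above can be very small. The following is a minimal sketch, assuming a Python interpreter is available to the web server; it is not necessarily the test script used here, and `build_response` is a hypothetical helper name chosen for illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI test script: the server runs it and relays stdout to the browser."""
import os
import sys


def build_response(environ):
    """Return the full CGI response: a header line, a blank line, then the body."""
    # REQUEST_METHOD and SCRIPT_NAME are standard CGI environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    script = environ.get("SCRIPT_NAME", "")
    body = "CGI is working\nmethod: %s\nscript: %s\n" % (method, script)
    return "Content-Type: text/plain\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

A script like this has to be marked executable and live where the server expects CGI programs. A bug in the script itself would typically surface as a 500 error, so a 404 suggests the server is not mapping the URL to the script at all.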
The Mosso “Is it a fit?” page implies too much, I think. The page lists “application frameworks”, and should probably more properly say “CGI scripts (Perl, Python)” rather than the implication that the perl and python support is in the form of actual application frameworks, which you’d normally expect to be something like Catalyst for perl, or Django for python.
It’s making me think about the very, very fuzzy definition for what it means to be an application platform as a service (APaaS) vendor.
I had some time to kill on a train today, and I amused myself by trying out the API for Cloud Files.
I decided I’d build something mildly useful: a little shell-like wrapper and accompanying tools for dealing with Cloud Files from the command line. So I cranked out a quick “ls”-like utility and a little interactive shell chrome around it, with the idea that my little “cfls” script would be handy for listing Cloud Files embedded in another shell script (like a quick script typed in on the command line), and that I’d be able to do more interesting manipulations from within the interactive “shell”. I decided to do it in Python, since of the languages that the API was available in, that’s the one I’m most comfortable with; I’m not a great Python programmer but I can manage. (It’s been more than a decade since I’ve written code for pay, and it probably shows.)
I was reasonably pleased by the fruits of my efforts; the API was pleasantly effortless to work with for these kinds of minor tasks, and I have a version 0.1. I got the shell core and the cfls utility working, and for the heck of it, I made direct API calls out of the Python interactive mode to upload my source code to Cloud Files. For nearly no time investment, this seems pretty satisfying.
The only annoying quirk that I discovered was that the containers in the results set of get_all_containers() do not get their instance variables fully populated — all that’s populated is the name. (Calling it results in constructing a ContainerResults object with list_containers(), and the iterator only populates each Container generated with its name.) So it seems like you have to call list_containers() to get all the container names, and then get_container() on each container name, if you actually need the instance variables. I also had some odd unexpected exceptions thrown when testing things out in the Python shell — related to time-outs on the connection object, I assume. Still, these are not problems that cause more than a moment’s sigh.
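The two-step workaround described above can be sketched roughly as follows. It assumes a connection object exposing `list_containers()` and `get_container(name)` as described; the `object_count` and `size_used` attribute names, the `Container` tuple, and the `FakeConnection` stand-in are assumptions made for illustration, not confirmed details of the library.

```python
from collections import namedtuple

# Stand-in for a fully populated container; the attribute names are assumed.
Container = namedtuple("Container", "name object_count size_used")


def list_containers_long(conn):
    """Return "ls -l"-style lines, one per container.

    Fetches the names first, then fetches each container individually
    so that its instance variables are populated.
    """
    lines = []
    for name in conn.list_containers():
        c = conn.get_container(name)
        lines.append("%8d %12d %s" % (c.object_count, c.size_used, c.name))
    return lines


class FakeConnection:
    """Hypothetical in-memory connection so the sketch runs without an account."""

    def __init__(self, containers):
        self._by_name = {c.name: c for c in containers}

    def list_containers(self):
        return sorted(self._by_name)

    def get_container(self, name):
        return self._by_name[name]


if __name__ == "__main__":
    conn = FakeConnection([Container("src", 3, 20480),
                           Container("images", 12, 1048576)])
    print("\n".join(list_containers_long(conn)))
```

The cost of this pattern is one extra request per container, which is fine for a handful of containers but would add up for an account with many of them.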
The Cloud Files Python library is far and away better than the Amazon S3 Python library, which seems much more like demonstration code than a real working library (which is probably why people doing things with AWS and Python tend to use Boto instead). The Cloud Files module for Python is decently if sparsely documented, but its usage is entirely self-evident. It’s simply and intelligently structured, in a logical object hierarchy.
The important point: It’s trivial for any idiot to build apps using Cloud Files.
I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up: you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls; from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer, at just a hair under 6 cents per GB. The catch is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s; there are no East Coast delivery locations, for instance. It’s also routed via anycast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
Lack of logging is not particularly a big deal if you are just trying to offload static content in the form of images and the like — presumably in that scenario you have decent analytics based off hits to the base page or a tracking counter or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as settling billing disputes.
It’s official: Rackspace’s Cloud Files can now be distributed via Limelight’s CDN.
The Cloud Files CDN works exactly like announced: drop your files into Cloud Files, push a button, have them served over the Limelight CDN. It’s 22 cents flat-rate, without origin fees. I’ve discussed this deal before, including in the context of Amazon’s CloudFront, and my previous commentary stands.
What will be interesting to see is the degree to which Limelight preserves infrequently-accessed content in its edge caches, vs. Amazon, as that will make a significant impact on performance, as well as cost-effectiveness vs. CloudFront. Limelight has a superior footprint, superior peering relationships, and much more sophisticated content routing. For people who are just looking to add a little bit of effortless CDN, their ability to keep infrequently-accessed content fresh on their edge caches will determine just how much of a performance advantage they have — or don’t have — versus CloudFront.
A Little Testing
In the spirit of empirical testing, I’ve created a Cloud Files account, too, and am now delivering the images of the front page of a tiny website from a combination of CloudFront and Cloud Files (which is under Rackspace’s Mosso brand).
It looks like my Cloud Files are being served from the same Limelight footprint as other Limelight customers, at least to the extent that I get the exact same edge resolutions for http://www.dallascowboys.com as I do for cdn.cloudfiles.mosso.com.
The Cloud Files deployment is reasonably elegant, and Mosso provides a GUI and thus is more directly user-friendly. You create a container (folder), drop your files into it, and click a button to make the files public, making them automatically and instantly available via Limelight. (The Amazon process is slower, although still within minutes.)
However, in a move that’s distinctly inferior to the CloudFront implementation, the base URL for the files is http://cdn.cloudfiles.mosso.com/some-hash-value/yourfile, vs. the more elegant CloudFront http://some-hash-value.cloudfront.net/yourfile. You can CNAME the latter to some nicely in-your-domain host (images.yourcompany.com, let’s say), and in fact, if you’ve used another CDN or have simply been smart and delivered static content off its own hostname, you’d already have that set up. The Mosso method means that you’ve got to go through all your content and munge your URLs to point to the CDN.
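The URL munging could be as simple as a prefix swap. This is a minimal sketch, assuming static content was already being served from its own hostname and that object names in the container mirror the original paths; `munge_urls` is a hypothetical helper, and `some-hash-value` is the same placeholder used in the text.

```python
def munge_urls(html, old_host, cdn_base):
    """Rewrite http://old_host/... references so they point under cdn_base.

    The path after the hostname is kept as-is, on the assumption that the
    objects were uploaded under the same names.
    """
    old_prefix = "http://%s/" % old_host
    new_prefix = cdn_base.rstrip("/") + "/"
    return html.replace(old_prefix, new_prefix)


if __name__ == "__main__":
    page = '<img src="http://images.yourcompany.com/logo.png">'
    print(munge_urls(page, "images.yourcompany.com",
                     "http://cdn.cloudfiles.mosso.com/some-hash-value"))
```

A plain string replace like this only catches absolute URLs written exactly in that form; relative links or URLs built in templates would need handling in the application itself.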
Aside from being annoying, it makes the elegant fallback less elegant. Having the CNAME means that if for some reason CloudFront goes belly-up, I can trivially repoint the CNAME to my own server (and I can easily automate this task, monitoring CloudFront deliverability and swapping the DNS if it dies). I can’t do that as easily with Mosso (although I can CNAME cdn.cloudfiles.mosso.com to something in my domain, and set up a directory structure that uses the hash, or use a URL rewriting rule, and still get the same effect, so it is merely more awkward, not impossible).
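The monitor-and-swap automation might look something like the sketch below. Everything here is an assumption for illustration: `check_cdn` and `repoint_cname` are hypothetical callables, since how you probe the CDN and how you update DNS depend entirely on your monitoring setup and DNS provider.

```python
def failover_step(check_cdn, repoint_cname, state,
                  cdn_target, origin_target, max_failures=3):
    """Run one monitoring cycle and repoint the CNAME if needed.

    state is a dict holding "failures" (consecutive failed checks) and
    "target" (where the CNAME currently points). Returns the target
    in effect after this cycle.
    """
    if check_cdn():
        state["failures"] = 0
        wanted = cdn_target
    else:
        state["failures"] += 1
        # Fail over only after several consecutive misses, to avoid flapping.
        if state["failures"] >= max_failures:
            wanted = origin_target
        else:
            wanted = state["target"]
    if wanted != state["target"]:
        repoint_cname(wanted)
        state["target"] = wanted
    return state["target"]
```

Calling this on a timer with a real health check would give the basic behavior. How quickly users actually follow the swap is bounded by the TTL on the CNAME record, so a short TTL matters if this is meant as a real fallback.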
That might be in part a way for Limelight to differentiate this from its normal CDN offering, which has the modern and elegant deployment mechanism of: simply point your website hostname at Limelight and Limelight will do the rest. In that scheme, Limelight automatically figures out what to cache, and does the right thing. That seems unlikely, though. I hope the current scheme is just temporary until they get a CloudFront-like DNS-based implementation done.
I’ll have more to say once I’ve accumulated some log data, and add SimpleCDN into the mix.
Amazon’s previously-announced CDN service is now live. It’s called CloudFront. It was announced on the Amazon Web Services blog, and discussed in a blog post by Amazon’s CTO. The RightScale blog has a deeper look, too.
How CloudFront Works
The tasks use Amazon API calls, but like every other useful bit of AWS functionality, there are tools out there that can make it point-and-click, so this can readily be CDN for Dummies. There’s no content “publication”, just making sure that the URLs for your static content point to Amazon. Intelligently-designed sites already have static content segregated onto its own hostname, so it’s a simple switch for them; for everyone else, it’s just some text editing, and it’s the same method as just about every other CDN for the better part of the last decade.
In short, you’re basically using S3 as the origin server. The actual delivery is through one of 14 edge locations — in the particular case of these locations, a limited footprint, but not a bad one in these days of the megaPOP CDN.
How Pricing Works
Pricing for the service is a little more complex to understand. If your content is being served from the edge, it’s priced similarly to S3 data transfers (same price at 10 TB or less, and 1 cent less for higher tiers). However, before your content can get served off the edge, it has to get there. So if the edge has to go back to the origin to get the content (i.e., the content is not in cache or has expired from cache), you’ll also incur a normal S3 charge.
You can think of this as being similar to the way that Akamai charged a few years ago — you’d pay a bandwidth fee for delivering content to users, but you’d also pay something called an “administrative bandwidth fee”, which was charged for edge servers fetching from the origin. That essentially penalized you when there was a cache miss. On a big commercial CDN like Akamai, customers reasonably felt this was sort of unfair, competitors like Mirror Image hit back by not charging those fees, and by and large the fee disappeared from the market.
It makes sense for Amazon to charge for having to go back to the (S3) origin, though, because the open-to-all-comers nature of CloudFront means that they would naturally have a ton of customers who have long-tail content and therefore very low cache hit ratios. On a practical basis, however, quite a few people who try out CloudFront will probably discover that the cache miss ratio for their content is too high; since a cache miss also confers no performance benefit, the higher delivery cost won’t make sense, and they’ll go back to just using S3 itself for static delivery. In other words, the pricing scheme will make customers self-regulate, over time.
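A rough way to see the effect of the miss ratio on cost is the following sketch. It assumes edge delivery and origin fetches are both billed at the same flat per-GB rate (the $0.17 default is the first-tier S3 transfer price, used here as an assumption) and it ignores per-request fees entirely.

```python
def effective_cost_per_gb(miss_ratio, edge_rate=0.17, origin_rate=0.17):
    """Cost in USD to deliver 1 GB when miss_ratio of the bytes need an origin fetch."""
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must be between 0 and 1")
    # Every byte pays the edge rate; missed bytes also pay the origin rate.
    return edge_rate + miss_ratio * origin_rate


if __name__ == "__main__":
    for miss in (0.0, 0.05, 0.25, 0.5, 1.0):
        print("miss ratio %.2f -> $%.4f per GB" % (miss, effective_cost_per_gb(miss)))
```

Under these assumptions, any miss ratio above zero makes CDN delivery cost more per GB than serving straight from S3, so the premium is only worth paying when enough requests are cache hits to deliver a real performance gain.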
Performance and CDN Server Misdirection
Some starting fodder for the performance-minded: My preliminary testing shows that Amazon’s accuracy in choosing an appropriate node can be extremely poor. I’ve noted previously that the nameserver is the location reference, and so the CDN’s content-routing task is to choose a content server close to that location (which will also hopefully be close to the end-user).
I live in suburban DC. I tested using cdn.wolfire.com (a publicly-cited CloudFront customer), and checked some other CloudFront customers as well to make sure it wasn’t a domain-specific aberration. My local friend on Verizon FIOS, using a local DC nameserver, gets Amazon’s node in Northern Virginia (suburban DC), certainly the closest node to us. Using the UltraDNS nameserver in the New York City area, I also get the NoVa node — but from the perspective of that nameserver, Amazon’s Newark node should actually be closer. Using the OpenDNS nameserver in NoVa, Amazon tries to send me to its CDN node in Palo Alto (near San Francisco) — not even close. My ISP, MegaPath, has a local nameserver in DC; using that, Amazon sends me to Los Angeles.
That’s a serious set of errors. By ping time, the correct, NoVa node is a mere 5 ms away from me. The LA node is 71 ms from me. Palo Alto is even worse, at 81 ms. Both of those times are considerably worse than pure cross-country latency across a decent backbone. My ping to the test site’s origin (www.wolfire.com) is only 30 ms, making the CDN latency more than twice as high.
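To put numbers on that comparison, here is a small sketch that expresses each node’s ping time as a multiple of the origin’s. The millisecond figures are the ones quoted above; `latency_ratios` is a hypothetical helper name.

```python
# Ping times in milliseconds, as reported in the text.
NODE_PINGS_MS = {"NoVa": 5, "Los Angeles": 71, "Palo Alto": 81}
ORIGIN_PING_MS = 30


def latency_ratios(node_pings, origin_ping):
    """Return each node's ping time as a multiple of the origin's ping time."""
    return {node: ms / float(origin_ping) for node, ms in node_pings.items()}


if __name__ == "__main__":
    ratios = latency_ratios(NODE_PINGS_MS, ORIGIN_PING_MS)
    for node in sorted(ratios, key=ratios.get):
        print("%-12s %3d ms  %.2fx origin" % (node, NODE_PINGS_MS[node], ratios[node]))
```

The correct node comes out at about a sixth of the origin latency, while the two misdirected nodes land at roughly 2.4 and 2.7 times the origin.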
However, I have a certain degree of faith: Over time, Amazon has clearly illustrated that they learn from their mistakes, they fix them, and they improve on their products. I assume they’ll figure out proper redirection in time.
What It Means for the Market
Competitively, it seems like Rackspace’s Cloud Files plus Limelight may turn out to be the stronger offering. The price of Rackspace/Limelight is slightly higher, but apparently there’s no origin retrieval charge, and Limelight has a broader footprint and therefore probably better global performance (although there are many things that go into CDN performance beyond footprint). And while anyone can use CloudFront for the teensiest bit of delivery, the practical reality is that they’re not going to — the pricing scheme will make that irrational.
Some people will undoubtedly excitedly hype CloudFront as a game-changer. It’s not. It’s certainly yet another step towards having ubiquitous edge delivery of all popular static content, and the prices are attractive at low to moderate volumes, but high-volume customers can and will get steeper discounts from established players with bigger footprints and a richer feature set. It’s a logical move on Amazon’s part, and a useful product that’s going to find a large audience, but it’s not going to majorly shake up the CDN industry, other than to accelerate the doom of small undifferentiated providers who were already well on their way to rolling into the fiery pit of market irrelevance and insolvency.
(I can’t write a deeper market analysis on my blog. If you’re a Gartner client and want to know more, make an inquiry and I’d be happy to discuss it.)
Last week, Rackspace made a bunch of announcements about its cloud strategy. I wrote previously about its deal with Limelight; now I want to contemplate its two acquisitions, Jungle Disk and Slicehost. (I have been focused on writing research notes in the last week, or I would have done this sooner…)
Jungle Disk provides online storage, via Amazon S3. Its real strength is in its easy-to-use interface; you can make your Jungle Disk storage look like a network drive, it has automated backup into the cloud, and there are premium features like Web-based (WebDAV) access. Files are stored encrypted. You pay for their software, then pay the S3 charges; there’s only a monthly recurring fee from them if you get their “plus” service. The S3 account is yours, so if you decide to dump Jungle Disk, you can keep using your storage.
The Jungle Disk acquisition looks like a straightforward feature addition — it’s a value-add for Rackspace’s Cloud Files offering, and Rackspace has said that Jungle Disk will be offering storage on both platforms. It’s a popular brand in the S3 backup space, and it’s a scrappy little self-funded start-up.
I suspect Rackspace likes scrappy little self-funded start-ups. The other acquisition, Slicehost, is also one. At this point, outright buying smart and ambitious entrepreneurial engineers with cloud experience is not a bad plan for Rackspace, whose growth has already resulted in plenty of hiring challenges.
Slicehost is a cloud hosting company. They offer unmanaged Linux instances on a Xen-based platform; their intellectual property comes in the form of their toolset. What’s interesting about this acquisition is that this kind of “server rental” — for $X per month, you can get server hardware (except this time it’s virtual rather than physical) — is actually akin to Rackspace’s old ServerBeach business (sold to Peer 1 back in 2004), not to Rackspace’s current managed hosting business.
Rackspace got out of the ServerBeach model because it was fundamentally different from their “fanatical support” desires, and because it has much less attractive returns on invested capital. The rental business offers a commodity at low prices, where you hope that nobody calls you because that’s going to eat your margin on the deal; you are ultimately just shoving hardware at the customer. What Rackspace’s managed hosting customers pay for is to have their hands held. The Slicehost model is the opposite of that.
Cloud infrastructure providers hope, of course, that they’ll be able to offer enough integrated value-adds on top of the raw compute to earn higher margins, and gain greater stickiness. It’s clear that Rackspace wants to be a direct competitor to Amazon (and companies like Joyent). Now the question is exactly how they’re going to reconcile that model with the fanatical support model, not to mention their ROIC model.
Rackspace announced yesterday, as part of a general unveiling of its cloud strategy, a new partnership with Limelight Networks.
Under the new partnership, customers of Rackspace’s Cloud Files (formerly CloudFS) service, essentially a competitor to Amazon S3, will be able to choose to publish and deliver their files via Limelight’s CDN. In effect, this will place Rackspace/Limelight in direct competition with Amazon’s forthcoming S3 CDN.
CDN delivery won’t cost Cloud Files customers any more than Rackspace’s normal bandwidth costs for Cloud Files. Currently, that’s $0.22/GB for the first 5 TB, scaling down to $0.15/GB for volumes above 50 TB. Amazon S3, by comparison, is $0.17/GB for the first 10 TB, down to $0.10/GB for volumes over 150 TB; we don’t yet know what its CDN upcharge, if any, will be. As another reference point, Internap resold via SoftLayer is $0.20/GB, so we can probably take that as a reasonable benchmark for the base entry cost of CDN services sold without any commit.
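As a quick worked comparison at the entry tier, the sketch below multiplies a monthly volume by each quoted per-GB rate. It only uses the first-tier prices given above, so it is meaningful only for volumes that stay within every provider’s first tier; `monthly_cost` and the 500 GB example volume are illustrative assumptions.

```python
# USD per GB at the lowest volume tier, as quoted in the text.
ENTRY_RATES = {
    "Rackspace Cloud Files + Limelight": 0.22,
    "Amazon S3 (no CDN)": 0.17,
    "Internap via SoftLayer": 0.20,
}


def monthly_cost(gb, rates=ENTRY_RATES):
    """Return the monthly transfer bill in USD per provider, at first-tier rates."""
    return {name: round(gb * rate, 2) for name, rate in rates.items()}


if __name__ == "__main__":
    for name, cost in sorted(monthly_cost(500).items(), key=lambda kv: kv[1]):
        print("%-36s $%.2f" % (name, cost))
```

At 500 GB a month the spread between the cheapest and most expensive option is $25, which gives a sense of how small the absolute price differences are at low volumes.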
It’s a reasonably safe bet that Limelight’s CDN is going to deliver better performance than Amazon’s S3 CDN, given its broader footprint and peering relationships, so the usual question of, “What’s the business value of performance?” will apply.
It’s a smart move on Rackspace’s part, and an easy way into a CDN upsell strategy for its regular base of hosting customers, too. And it’s a good way for Limelight to pre-emptively compete against the Amazon S3 CDN.