Monthly Archives: November 2008
I had an hour to kill today after a client didn’t show for a call… everyone’s taken off for T-day, I guess.
Since Rackspace has a 30-day money-back guarantee on Mosso at the moment, along with a nice Thanksgiving discount making the first month just $20, I decided to sign up for an actual account, on my personal dime. It allows me to offer guilt-free commentary based on real experience, and the freedom to bug customer support for things with the recognition that I am actually giving the company my money, and am therefore entitled to ask whatever questions I want, without giving Analyst Relations a headache. So here’s a little ramble of liveblogging my Mosso experience.
The first hurdle I ran into is that there’s no easy way to take your Cloud Files account and sign up for Cloud Sites (i.e., the main Mosso service). After a bit of a chat with their live online sales, and a few minutes of waiting while the guy checked around (during which I started writing this blog entry). After a while, I was informed I could put in a support ticket and they’d take care of it on the back end. I decided to save them some trouble and just get another account (thus allowing me to do some future playing about with copying things between Cloud Files accounts, in my desire to create parallel cloud utilities to sftp and scp), but it was a bit of an oddity — Sites is a logical upsell from Files, so I presume that functionality is probably coming eventually.
Next, I went to look for a way to change my initial sign-up password. Nothing obvious in the interface, nothing on the knowledge base… I shrugged and provisioned myself a site. On the site’s config, I found the way to change the password — and also discovered, to my horror, that the password shows up in cleartext. That certainly prompted me to change my password immediately.
I did not want to transfer my domain, but the site info page shows what Mosso wants the DNS records to be; I adjusted the DNS records on my end for what I needed, no problem. I also provisioned a second site with a non-www hostname (note that Mosso automatically turns domain.com into http://www.domain.com), which worked fine and intelligently (a recent change, I guess, because when I tried a demo account last week, it insisted on spewing full DNS info, rather than just an A record, for that).
I looked at what was available for PHP, and realized that if I wanted a framework like Zend, I’d have to install it myself, and without SSH access, that looked like it was going to be a festival of non-fun, if not flat-out impossible.
So, I turned on CGI support, which seemed to take two rounds of saving my settings, on both sites I tried it on. But CGI support does not seem to actually work for me — it’s returning 404 errors on my uploaded test scripts. Perhaps this is part of the “you may need to wait two hours” warning message given on the change-config page, but it sure would be nice if it said “pending” if that were the case, or otherwise gave some hint as to what requires a wait and what doesn’t.
I’m going to wait and see, but it’s become clear that I can’t actually do what I want with Mosso, because of the following: If you’re not running a directly supported environment (PHP, Ruby on Rails, ASP.NET), you are stuck with shoving your code, in the form of scripts, into the cgi-bin directory and that’s that. The perl and python support is just generic CGI script support. So there’s no support for mod_python, and therefore you can’t run Django.
The Mosso “Is it a fit?” page implies too much, I think. The page lists “application frameworks”, and should probably more properly say “CGI scripts (Perl, Python)” rather than the implication that the perl and python support is in the form of actual application frameworks, which you’d normally expect to be something like Catalyst for perl, or Django for python.
It’s making me think about the very, very fuzzy definition for what it means to be an application platform as a service (APaaS) vendor.
Akamai is a software company. It is not, fundamentally, an infrastructure company. You could blow up their network of servers from orbit, and most of the value of the company would be preserved. It’s a company that runs on its intellectual property, the breadth and depth of its feature set, and, ultimately, its ability to rapidly innovate software features. Like any SaaS company, it needs infrastructure upon which to deliver those features (which are mostly focused around content delivery), but the infrastructure itself is not really its value.
I used to be able to generalize this to the CDN market as a whole. Like most software markets, there was feature-set competition and the generality that what you invented today, your competitor would try to replicate in the next year.
I’ve come to the realization that this is no longer an accurate characterization. It’s true of Akamai and a very small number of competitors, but for everyone else, CDN has become an infrastructure play — about having a network and a lot of servers and just enough software to control it all efficiently.
Software and infrastructure businesses have very different characteristics, and the shift has a cascade of implications.
I had some time to kill on a train today, and I amused myself by trying out the API for Cloud Files.
I decided I’d build something mildly useful: a little shell-like wrapper and accompanying tools for dealing with Cloud Files from the command line. So I cranked out a quick “ls”-like utility and a little interactive shell chrome around it, with the idea that my little “cfls” script would be handy for listing Cloud Files embedded in another shell script (like a quick script typed in on the command line), and that I’d be able to do more interesting manipulations from within the interactive “shell”. I decided to do it in Python, since of the languages that the API was available in, that’s the one I’m most comfortable with; I’m not a great Python programmer but I can manage. (It’s been more than a decade since I’ve written code for pay, and it probably shows.)
I was reasonably pleased by the fruits of my efforts; the API was pleasantly effortless to work with for these kind of minor tasks, and I have a version 0.1. I got the shell core and the cfls utility working, and for the heck of it, I made direct API calls out of the Python interactive mode to upload my source code to Cloud Files. For nearly no time investment, this seems pretty satisfying.
The only annoying quirk that I discovered was that the containers in the results set of get_all_containers() do not get their instance variables fully populated — all that’s populated is the name. (Calling it results in constructing a ContainerResults object with list_containers(), and the iterator only populates each Container generated with its name.) So it seems like you have to call list_containers() to get all the container names, and then get_container() on each container name, if you actually need the instance variables. I also had some odd unexpected exceptions thrown when testing things out in the Python shell — related to time-outs on the connection object, I assume. Still, these are not problems that cause more than a moment’s sigh.
The Cloud Files Python library is far and away better than the Amazon S3 Python library, which seems much more like demonstration code than a real working library (which is probably why people doing things with AWS and Python tend to use Boto instead). The Cloud Files module for Python is decently if sparsely documented, but its usage is entirely self-evident. It’s simply and intelligently structured, in a logical object hierarchy.
The important point: It’s trivial for any idiot to build apps using Cloud Files.
I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up — you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls — from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer — just a hair under 6 cents per GB. The downside is that, at that price, it doesn’t come close come to touching Limelight’s footprint, or, for that matter, Amazon’s — there are no East Coast delivery locations, for instance. It’s also routed via AnyCast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
Lack of logging is not particularly a big deal if you are just trying to offload static content in the form of images and the like — presumably in that scenario you have decent analytics based off hits to the base page or a tracking counter or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as settling billing disputes.
In the meantime, they’re taking trial sign-ups. I signed up. Telnic promptly emailed me a registration confirmation, sending my username and password to me in cleartext.
This company would like me to believe that they can be trusted to keep my confidential personal (or business) information secure…
It’s official: Rackspace’s Cloud Files can now be distributed via Limelight’s CDN.
The Cloud Files CDN works exactly like announced: drop your files into Cloud Files, push a button, have them served over the Limelight CDN. It’s 22 cents flat-rate, without origin fees. I’ve discussed this deal before, including in the context of Amazon’s CloudFront, and my previous commentary stands.
What will be interesting to see is the degree to which Limelight preserves infrequently-accessed content in its edge caches, vs. Amazon, as that will make a significant impact on performance, as well as cost-effectiveness vs. CloudFront. Limelight has a superior footprint, superior peering relationships, and much more sophisticated content routing. For people who are just looking to add a little bit of effortless CDN, their ability to keep infrequently-accessed content fresh on their edge caches will determine just how much of a performance advantage they have — or don’t have — versus CloudFront.
A Little Testing
In the spirit of empirical testing, I’ve created a Cloud Files account, too, and am now delivering the images of the front page of a tiny website from a combination of CloudFront and Cloud Files (which is under Rackspace’s Mosso brand).
It looks like my Cloud Files are being served from the same Limelight footprint as other Limelight customers, at least to the extent that I get the exact same edge resolutions for www.dallascowboys.com as I do for cdn.cloudfiles.mosso.com.
The Cloud Files deployment is reasonably elegant, and Mosso provides a GUI and thus is more directly user-friendly. You create a container (folder), drop your files into it, and click a button to make the files public, making them automatically and instantly available via Limelight. (The Amazon process is slower, although still within minutes.)
However, in a move that’s distinctly inferior to the CloudFront implementation, the base URL for the files is http://cdn.cloudfiles.mosso.com/some-hash-value/yourfile, vs. the more elegant CloudFront http://some-hash-value.cloudfront.net/yourfile. You can CNAME the latter to some nicely in-your-domain host (images.yourcompany.com, let’s say), and in fact, if you’ve used another CDN or have simply been smart and delivered static content off its own hostname, you’d already have that set up. The Mosso method means that you’ve got to go through all your content and munge your URLs to point to the CDN.
Aside from being annoying, it makes the elegant fallback, less elegant. Having the CNAME means that if some reason CloudFront goes belly-up, I can trivially repoint the CNAME to my own server (and I can easily automate this task, monitoring CloudFront deliverability and swapping the DNS if it dies). I can’t do that as easily with Mosso (although I can CNAME cdn.cloudfiles.mosso.com to something in my domain, and set up a directory structure that uses the hash, or use a URL rewriting rule, and still get the same effect, so it is merely more awkward, not impossible).
That might be in part a way for Limelight to differentiate this from its normal CDN offering, which has the modern and elegant deployment mechanism of: simply point your website hostname at Limelight and Limelight will do the rest. In that scheme, Limelight automatically figures out what to cache, and does the right thing. That seems unlikely, though. I hope the current scheme is just temporary until they get a CloudFront-like DNS-based implementation done.
I’ll have more to say once I’ve accumulated some log data, and add SimpleCDN into the mix.
I’ve got a tiny low-traffic site that I’ve just set up to use Amazon’s CloudFront CDN. I chose to do it the trivial, painless way for now — through RightScale’s dashboard. It just took a couple of minutes, most of which was waiting for systems to recognize the changes. I frankly expect that my content will never be fresh in cache, but doing this will give me something to play with, without actually costing me some unpredictable amount of money. I’ve been meaning to do some similar tinkering with SimpleCDN, too, and eventually with the Rackspace/Limelight cloud CDN (which thus far lacks a snappy short name).
I still have to finish my cloud server testing, too, which I started a few months ago and which keeps being interrupted by other work and personal demands… I always feel a bit guilty keeping demo accounts for long periods of time.
Amazon’s previously-announced CDN service is now live. It’s called CloudFront. It was announced on the Amazon Web Services blog, and discussed in a blog post by Amazon’s CTO. The RightScale blog has a deeper look, too.
How CloudFront Works
The tasks use Amazon API calls, but like every other useful bit of AWS functionality, there are tools out there that can make it point-and-click, so this can readily be CDN for Dummies. There’s no content “publication”, just making sure that the URLs for your static content point to Amazon. Intelligently-designed sites already have static content segregated onto its own hostname, so it’s a simple switch for them; for everyone else, it’s just some text editing, and it’s the same method as just about every other CDN for the better part of the last decade.
In short, you’re basically using S3 as the origin server. The actual delivery is through one of 14 edge locations — in the particular case of these locations, a limited footprint, but not a bad one in these days of the megaPOP CDN.
How Pricing Works
Pricing for the service is a little more complex to understand. If your content is being served from the edge, it’s priced similarly to S3 data transfers (same price at 10 TB or less, and 1 cent less for higher tiers). However, before your content can get served off the edge, it has to get there. So if the edge has to go back to the origin to get the content (i.e., the content is not in cache or has expired from cache), you’ll also incur a normal S3 charge.
You can think of this as being similar to the way that Akamai charged a few years ago — you’d pay a bandwidth fee for delivering content to users, but you’d also pay something called an “administrative bandwidth fee”, which was charged for edge servers fetching from the origin. That essentially penalized you when there was a cache miss. On a big commercial CDN like Akamai, customers reasonably felt this was sort of unfair, competitors like Mirror Image hit back by not charging those fees, and by and large the fee disappeared from the market.
It makes sense for Amazon to charge for having to go back to the (S3) origin, though, because the open-to-all-comers nature of CloudFront means that they would naturally have a ton of customers who have long-tail content and therefore very low cache hit ratios. On a practical basis, however, quite a few people who try out CloudFront will probably discover that the cache miss ratio for their content is too high; since a cache miss also confers no performance benefit, the higher delivery cost won’t make sense, and they’ll go back to just using S3 itself for static delivery. In other words, the pricing scheme will make customers self-regulate, over time.
Performance and CDN Server Misdirection
Some starting fodder for the performance-minded: My preliminary testing shows that Amazon’s accuracy in choosing an appropriate node can be extremely poor. I’ve noted previously that the nameserver is the location reference, and so the CDN’s content-routing task is to choose a content server close to that location (which will also hopefully be close to the end-user).
I live in suburban DC. I tested using cdn.wolfire.com (a publicly-cited CloudFront customer), and checked some other CloudFront customers as well to make sure it wasn’t a domain-specific aberration. My local friend on Verizon FIOS, using a local DC nameserver, gets Amazon’s node in Northern Virginia (suburban DC), certainly the closest node to us. Using the UltraDNS nameserver in the New York City area, I also get the NoVa node — but from the perspective of that nameserver, Amazon’s Newark node should actually be closer. Using the OpenDNS nameserver in NoVa, Amazon tries to send me to its CDN node in Palo Alto (near San Francisco) — not even close. My ISP, MegaPath, has a local nameserver in DC; using that, Amazon sends me to Los Angeles.
That’s a serious set of errors. By ping time, the correct, NoVa node is a mere 5 ms away from me. The LA node is 71 ms from me. Palo Alto is even worse, at 81 ms. Both of those times are considerably worse than pure cross-country latency across a decent backbone. My ping to the test site’s origin (www.wolfire.com) is only 30 ms, making the CDN latency more than twice as high.
However, I have a certain degree of faith: Over time, Amazon has clearly illustrated that they learn from their mistakes, they fix them, and they improve on their products. I assume they’ll figure out proper redirection in time.
What It Means for the Market
Competitively, it seems like Rackspace’s Cloud Files plus Limelight may turn out to be the stronger offering. The price of Rackspace/Limelight is slightly higher, but apparently there’s no origin retrieval charge, and Limelight has a broader footprint and therefore probably better global performance (although there are many things that go into CDN performance beyond footprint). And while anyone can use CloudFront for the teensiest bit of delivery, the practical reality is that they’re not going to — the pricing scheme will make that irrational.
Some people will undoubtedly excitedly hype CloudFront as a game-changer. It’s not. It’s certainly yet another step towards having ubiquitous edge delivery of all popular static content, and the prices are attractive at low to moderate volumes, but high-volume customers can and will get steeper discounts from established players with bigger footprints and a richer feature set. It’s a logical move on Amazon’s part, and a useful product that’s going to find a large audience, but it’s not going to majorly shake up the CDN industry, other than to accelerate the doom of small undifferentiated providers who were already well on their way to rolling into the fiery pit of market irrelevance and insolvency.
(I can’t write a deeper market analysis on my blog. If you’re a Gartner client and want to know more, make an inquiry and I’d be happy to discuss it.)
Investors keep gnawing persistently at the issue of data center overcapacity and the colo market, but it doesn’t look like too many people are thinking about potential overcapacity in the cloud infrastructure market and what that means in the medium-term, economic-downturn scenario.
Many of the emerging cloud infrastructure service providers (“cloud hosters”) are building substantial chunks of capacity in order to accomodate a bunch of Web 2.0 start-ups with unpredictable and sometimes wildly variable capacity needs. (The now-classic example is Animoto.)
If this sounds naggingly familiar, it should. In the ’90s, colo providers built out tons of capacity in anticipation of the growth of the dot-coms. That growth was real enough until VCs stopped pouring money into companies that had no real revenues, causing the whole house of cards to collapse — the dot-coms folded and took many colocation companies with them. (The story is more complex than that, but that’s a succinct enough summary.)
Amazon can presumably afford to have gargantuan capacity reserves in EC2, because Amazon’s got heavily seasonal traffic that leaves it with gobs of spare capacity post-Christmas, meaning that anything it can do to get revenue from that pile of hardware is gravy. As long as EC2 keeps growing, every holiday season it can pile on more hardware and then release it thereafter to be absorbed into the EC2 pool.
Other cloud providers don’t have that luxury. And renting unmanaged hardware at commodity prices has ugly returns on invested capital, and the risk profile is much higher than it is with data center leasing or colo. Most cloud providers are trying not to overbuy, but running out of capacity also has its risks, especially when it looks like the heavens will be raining failed start-ups as they run out of cash.
Much has been made of the theoretically democratizing effect of electronic marketplaces, like the iPhone store. But it’s worth noting that such marketplaces can be, like the iPhone store, gated communities, not free-for-alls where anyone can hawk their wares.
On Friday, Google was set to launch a voice recognition app on the iPhone. Now we’re being told it will probably be available today. The reason for the delay: Apple hasn’t approved the app for the store, for reasons that currently remain mysterious. Google, obviously, wasn’t expecting it, since they had a blizzard of publicity surrounding the Friday launch in what was apparently every anticipation that it’d be available then. Now, Google is well-known for the chaos of its internal organization, but it doesn’t seem unreasonable to believe that their PR people have it pretty well together, which leaves one wondering what Apple was thinking.
There have been regular App Store approval issues, of course, but the Google app is particularly high-profile.
Vendor-controlled open marketplaces are only as open as the vendor wants to make them. That’s true of the marketplaces evolving around cloud services, too. Don’t lose sight of the fact that vendor-controlled ultimately means that the vendor has the power to do anything that the market will tolerate them doing, which at the moment can be quite considerable.