The week’s observations
My colleague Tom Bittman has written a great summary of the hot topics from the Gartner data center conference this past week.
Some personal observations as I wrap up the week…
The future of infrastructure is the cloud. I use “cloud” in a broad sense; many larger organizations will be building their own “private clouds” (which technically aren’t actually clouds, but the “private cloud” terminology has sunk in and probably won’t be easily budged). I was surprised by how many people at the conference wanted to talk to me about initial use of public clouds, how to structure cloud services within their own organizations, and what they could learn from public cloud and hosting services.
Cloud demos are extremely compelling. I was using demos of several clouds in order to make my points to people asking about cloud computing: Terremark’s Enterprise Cloud, Rackspace’s Mosso, and Amazon’s EC2 plus RightScale. I showed some screen shots off 3Tera’s website as well. I did not warn the providers that I was going to do this, and none of them were at the conference (a pity, since I suspect this would have been lead-generating). It was interesting to see how utterly fascinated people were — particularly with the Terremark offering, which is essentially a private cloud. (People were stopping me in the hallways to say, “I hear you have a really cool cloud demo.”) I was showing the trivially easy point-and-click process of provisioning a server, which, I think, provided a kind of grounding for “here is how the cloud could apply to your business”.
Colocation is really, really hot. My one-on-one schedule was crammed with colocation questions, as were my conversations with attendees in hallways and over meals, yet I was still shocked by how many people showed up to my Friday 8 am talk on colocation — the best-attended talk of the slot, I was told (and one cursed by lots of A/V glitches). Over the last month, we’ve seen demand accelerate and supply projections tighten — neither businesses nor data center providers can build right now.
A crazy conference week, like always, but tremendously interesting.
Amazon SimpleDB, plus a bit on cloud storage
Amazon SimpleDB is now in public beta. This database-as-a-service has been in private beta for some time, but what’s really noteworthy is that with the public beta, Amazon has dropped the price drastically, and the first 25 machine hours, 1 GB of storage, and 1 GB of transfer are free, meaning that it’s essentially free to experiment with.
On another Amazon-related note, my colleagues who cover storage have recently put out a research note titled “A Look at Amazon’s S3 Cloud-Computing Storage Service”. If you’re a Gartner client contemplating use of S3, I’d suggest checking it out.
I want to stress something that’s probably not obvious from that note: You can’t mount S3 storage like a normal filesystem. You access it via its APIs, and that’s all. If you use EC2 and you need cloud storage that looks like a regular filesystem, you’ll want to use Amazon’s Elastic Block Store. If you’re using S3, whether within EC2 or from your own infrastructure, you’re either going to make API calls directly (which will make your apps dependent upon S3), or you’re going to have to go through a filesystem driver like FUSE (commercially, Subcloud).
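To make “API calls directly” concrete: every S3 operation is an individual HTTP request carrying an HMAC-SHA1 signature over a canonical string. Here’s a simplified sketch of that classic signing scheme (the x-amz-* header canonicalization is omitted, and the key, bucket, and object names are placeholders, not real credentials):

```python
import base64
import hmac
from hashlib import sha1

def sign_s3_request(secret_key, verb, resource, date,
                    content_md5="", content_type=""):
    # Classic S3 REST authentication: HMAC-SHA1 over a canonical
    # string, base64-encoded for the Authorization header.
    # (Simplified: x-amz-* header canonicalization is left out.)
    string_to_sign = "\n".join(
        [verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), sha1).digest()
    return base64.b64encode(digest).decode("ascii")

# The signature that would accompany, say:
#   GET /mybucket/myfile HTTP/1.1
#   Date: Thu, 18 Dec 2008 12:00:00 GMT
sig = sign_s3_request("SECRETKEY", "GET", "/mybucket/myfile",
                      "Thu, 18 Dec 2008 12:00:00 GMT")
```

Compare that with open()-ing a file on a mounted volume: this is why your application code (or a FUSE-style shim on its behalf) ends up coupled to the provider’s API.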
Cloud storage, at this stage, is typically reliant upon proprietary APIs. Some providers are starting to offer filesystems, such as Nirvanix’s CloudNAS (now in beta), but we’re at the very earliest stages of that. I suspect that the implementation hurdles created by API-only access, and not the contractual issues, will be what stop enterprises from adopting it in the near term.
On a final storage-related note, Rackspace (Mosso) Cloud Files is still definitively in beta. I was playing with the shell I was writing (adding an FTP-like get and put with progress bars and such), and trying to figure out why my API calls were failing. It turned out that the service was in read-only mode for a while yesterday, and even read calls (via the API) were failing for a bit (returning 500 Internal Server Error codes). On the plus side, the support request I made via Rackspace’s real-time chat — an instant-messaging-like interface — to report the read outage was answered immediately, politely, and knowledgeably; that’s one clear way the Rackspace offering wins over S3. (Amazon charges for support.)
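Transient 500s like these are exactly what client code has to absorb when it talks to a beta cloud service. A generic retry-with-backoff wrapper is the usual defense — this is my own sketch, not part of the Cloud Files library, and the convention that the raised error carries a `status` attribute is an assumption for illustration:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0, retryable=(500, 503)):
    """Invoke call(); on an error carrying a retryable HTTP status,
    back off exponentially and try again. `call` is any zero-argument
    function; non-retryable errors propagate immediately."""
    for i in range(attempts):
        try:
            return call()
        except Exception as e:
            status = getattr(e, "status", None)
            if status not in retryable or i == attempts - 1:
                raise  # out of attempts, or not a transient failure
            time.sleep(base_delay * (2 ** i))
```

A shell utility wrapped this way degrades into a brief pause instead of a stack trace when the service flips into read-only mode for a few seconds.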
New CDN research notes
I have three new research notes out:
Determine Your Video Delivery Requirements. When I talk to clients, I often find that IT is trying to source a video delivery solution without having much of an idea of what the requirements actually are. This note is directed at them; it’s intended to serve as a framework for discussions with the content owners.
Toolkit: Determining Your Content Delivery Network Requirements. This toolkit consists of three Excel worksheets. The first gathers a handful of high-level requirements, in order to figure out what type of vendor you’re probably looking for. The second helps you estimate your volume and convert between the three typical measurements used (Mbps, MPVs, or GB delivered). The third is a pricing estimator and converter.
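As an example of the kind of rate-to-volume arithmetic the second worksheet automates, here’s a small helper of my own (decimal units, 30-day month — assumptions you’d adjust to match your billing terms):

```python
def mbps_to_gb_per_month(mbps, days=30):
    # Sustained megabits/sec -> decimal gigabytes delivered per month.
    # 1 Mbps = 1,000,000 bits/sec; 8 bits per byte; decimal GB = 1e9 bytes.
    bytes_per_sec = mbps * 1_000_000 / 8
    return bytes_per_sec * days * 24 * 3600 / 1_000_000_000

# A steady 10 Mbps works out to 3,240 GB (about 3.24 TB) per month.
volume = mbps_to_gb_per_month(10)
```

Running the same math in reverse (GB delivered back to an average Mbps) is how you sanity-check a CDN quote expressed in one unit against your traffic measured in another.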
Purchasing Content Delivery Network Services. This is a practical guide to buying CDN services, targeted towards mid-sized and enterprise purchasers.
At Gartner’s data center conference
I’m at Gartner’s data center conference this week. My presentation (on best practices for colocation) is on Friday at 8 am, and I’m also participating in the panel for the Data Center Facilities “town hall” at 10 am on Thursday.
The rest of the time, I’ll be available for one-on-ones and the like. If you’re at the conference and want to talk about trends in cloud computing adoption, or anything related to Internet data centers, colocation, hosting, or content delivery networks, please schedule something with me — through the one-on-one process if you can, or directly via email if you’re a vendor with a product demo or the like that you want to show.
An initial Mosso foray
I had an hour to kill today after a client didn’t show for a call… everyone’s taken off for T-day, I guess.
Since Rackspace has a 30-day money-back guarantee on Mosso at the moment, along with a nice Thanksgiving discount making the first month just $20, I decided to sign up for an actual account, on my personal dime. That gives me the ability to offer guilt-free commentary based on real experience, and the freedom to bug customer support, secure in the knowledge that I am actually giving the company my money and am therefore entitled to ask whatever questions I want, without giving Analyst Relations a headache. So here’s a little ramble of liveblogging my Mosso experience.
The first hurdle I ran into is that there’s no easy way to take your Cloud Files account and sign up for Cloud Sites (i.e., the main Mosso service). After a bit of a chat with their live online sales, and a few minutes of waiting while the guy checked around (during which I started writing this blog entry), I was informed I could put in a support ticket and they’d take care of it on the back end. I decided to save them some trouble and just get another account (thus allowing me to do some future playing about with copying things between Cloud Files accounts, in my desire to create parallel cloud utilities to sftp and scp), but it was a bit of an oddity — Sites is a logical upsell from Files, so I presume that functionality is probably coming eventually.
Next, I went to look for a way to change my initial sign-up password. Nothing obvious in the interface, nothing in the knowledge base… I shrugged and provisioned myself a site. On the site’s config, I found the way to change the password — and also discovered, to my horror, that the password shows up in cleartext. That certainly prompted me to change my password immediately.
I did not want to transfer my domain, but the site info page shows what Mosso wants the DNS records to be; I adjusted the DNS records on my end for what I needed, no problem. I also provisioned a second site with a non-www hostname (note that Mosso automatically turns domain.com into www.domain.com), which worked fine and intelligently (a recent change, I guess, because when I tried a demo account last week, it insisted on spewing full DNS info for that, rather than just an A record).
I looked at what was available for PHP, and realized that if I wanted a framework like Zend, I’d have to install it myself, and without SSH access, that looked like it was going to be a festival of non-fun, if not flat-out impossible.
So, I turned on CGI support, which seemed to take two rounds of saving my settings, on both sites I tried it on. But CGI support does not seem to actually work for me — it’s returning 404 errors on my uploaded test scripts. Perhaps this is part of the “you may need to wait two hours” warning message given on the change-config page, but it sure would be nice if it said “pending” if that were the case, or otherwise gave some hint as to what requires a wait and what doesn’t.
I’m going to wait and see, but it’s become clear that I can’t actually do what I want with Mosso, because of the following: If you’re not running a directly supported environment (PHP, Ruby on Rails, ASP.NET), you are stuck with shoving your code, in the form of scripts, into the cgi-bin directory, and that’s that. The Perl and Python support is just generic CGI script support. So there’s no support for mod_python, and therefore you can’t run Django.
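Plain CGI means every request forks a process that writes an HTTP header block and a body to stdout — no persistent framework, no URL routing, nothing. A minimal example of what you’re reduced to in that cgi-bin directory (the function split is mine, for clarity):

```python
#!/usr/bin/env python
# Minimal CGI script: the web server runs this once per request and
# relays whatever it prints -- headers first, then a blank line,
# then the body.
import sys

def respond(body, content_type="text/plain", out=sys.stdout):
    out.write("Content-Type: %s\r\n\r\n" % content_type)
    out.write(body)

if __name__ == "__main__":
    respond("hello from cgi-bin\n")
```

Anything stateful — sessions, ORM connections, template caches — has to be rebuilt on every hit, which is exactly why frameworks like Django want mod_python (or another persistent runtime) rather than this.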
The Mosso “Is it a fit?” page implies too much, I think. The page lists “application frameworks”, and would more properly say “CGI scripts (Perl, Python)” rather than implying that the Perl and Python support takes the form of actual application frameworks, which you’d normally expect to be something like Catalyst for Perl or Django for Python.
It’s making me think about the very, very fuzzy definition for what it means to be an application platform as a service (APaaS) vendor.
CDN: software or infrastructure?
Akamai is a software company. It is not, fundamentally, an infrastructure company. You could blow up their network of servers from orbit, and most of the value of the company would be preserved. It’s a company that runs on its intellectual property, the breadth and depth of its feature set, and, ultimately, its ability to rapidly innovate software features. Like any SaaS company, it needs infrastructure upon which to deliver those features (which are mostly focused around content delivery), but the infrastructure itself is not really its value.
I used to be able to generalize this to the CDN market as a whole. Like most software markets, there was feature-set competition and the generality that what you invented today, your competitor would try to replicate in the next year.
I’ve come to the realization that this is no longer an accurate characterization. It’s true of Akamai and a very small number of competitors, but for everyone else, CDN has become an infrastructure play — about having a network and a lot of servers and just enough software to control it all efficiently.
Software and infrastructure businesses have very different characteristics, and the shift has a cascade of implications.
Mosso’s Cloud Files API
I had some time to kill on a train today, and I amused myself by trying out the API for Cloud Files.
I decided I’d build something mildly useful: a little shell-like wrapper and accompanying tools for dealing with Cloud Files from the command line. So I cranked out a quick “ls”-like utility and a little interactive shell chrome around it, with the idea that my little “cfls” script would be handy for listing Cloud Files embedded in another shell script (like a quick script typed in on the command line), and that I’d be able to do more interesting manipulations from within the interactive “shell”. I decided to do it in Python, since of the languages that the API was available in, that’s the one I’m most comfortable with; I’m not a great Python programmer but I can manage. (It’s been more than a decade since I’ve written code for pay, and it probably shows.)
I was reasonably pleased by the fruits of my efforts; the API was pleasantly effortless to work with for these kinds of minor tasks, and I have a version 0.1. I got the shell core and the cfls utility working, and for the heck of it, I made direct API calls out of the Python interactive mode to upload my source code to Cloud Files. For nearly no time investment, this seems pretty satisfying.
The only annoying quirk that I discovered was that the containers in the result set of get_all_containers() do not get their instance variables fully populated — all that’s populated is the name. (Under the hood, the call constructs a ContainerResults object from list_containers(), and the iterator populates each generated Container with only its name.) So it seems you have to call list_containers() to get all the container names, and then get_container() on each name, if you actually need the instance variables. I also had some odd unexpected exceptions thrown when testing things out in the Python shell — related to time-outs on the connection object, I assume. Still, these are not problems that cause more than a moment’s sigh.
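The two-step workaround is short enough to sketch. The helper function is my own; it’s written against any connection object exposing the library’s list_containers()/get_container() pair, so the test below uses a stand-in rather than a live Cloud Files connection:

```python
def fully_populated_containers(conn):
    """Work around get_all_containers() returning name-only containers:
    list the names, then fetch each container individually so its
    instance variables (object counts, sizes, etc.) get populated."""
    return [conn.get_container(name) for name in conn.list_containers()]
```

The obvious cost is one extra API round-trip per container, which is a moment’s sigh for a handful of containers but worth knowing about before you loop over hundreds of them.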
The Cloud Files Python library is far and away better than the Amazon S3 Python library, which seems much more like demonstration code than a real working library (which is probably why people doing things with AWS and Python tend to use Boto instead). The Cloud Files module for Python is decently if sparsely documented, but its usage is entirely self-evident. It’s simply and intelligently structured, in a logical object hierarchy.
The important point: It’s trivial for any idiot to build apps using Cloud Files.
Basic CDNs: where are my logs?
I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up — you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls — from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer — just a hair under 6 cents per GB. The trade-off is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s — there are no East Coast delivery locations, for instance. It’s also routed via anycast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
Lack of logging is not a particularly big deal if you are just trying to offload static content in the form of images and the like — presumably, in that scenario, you have decent analytics based off hits to the base page, or a tracking counter, or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may matter for troubleshooting as well as for settling billing disputes.
The .tel sunrise, and some irony
Telnic would like to become the Internet’s white pages, via the DNS as well as a Web-based presence; the “sunrise” (first registrations) on their .tel domain begins on December 3rd.
In the meantime, they’re taking trial sign-ups. I signed up. Telnic promptly emailed me a registration confirmation, sending my username and password to me in cleartext.
This company would like me to believe that they can be trusted to keep my confidential personal (or business) information secure…
Rackspace Cloud Files (Mosso) + Limelight CDN
It’s official: Rackspace’s Cloud Files can now be distributed via Limelight’s CDN.
The Cloud Files CDN works exactly like announced: drop your files into Cloud Files, push a button, have them served over the Limelight CDN. It’s 22 cents flat-rate, without origin fees. I’ve discussed this deal before, including in the context of Amazon’s CloudFront, and my previous commentary stands.
What will be interesting to see is the degree to which Limelight preserves infrequently-accessed content in its edge caches, vs. Amazon, as that will have a significant impact on performance, as well as on cost-effectiveness vs. CloudFront. Limelight has a superior footprint, superior peering relationships, and much more sophisticated content routing. For people who are just looking to add a little bit of effortless CDN, their ability to keep infrequently-accessed content fresh on their edge caches will determine just how much of a performance advantage they have — or don’t have — versus CloudFront.
A Little Testing
In the spirit of empirical testing, I’ve created a Cloud Files account, too, and am now delivering the images of the front page of a tiny website from a combination of CloudFront and Cloud Files (which is under Rackspace’s Mosso brand).
It looks like my Cloud Files are being served from the same Limelight footprint as other Limelight customers, at least to the extent that I get the exact same edge resolutions for http://www.dallascowboys.com as I do for cdn.cloudfiles.mosso.com.
The Cloud Files deployment is reasonably elegant, and Mosso provides a GUI and thus is more directly user-friendly. You create a container (folder), drop your files into it, and click a button to make the files public, making them automatically and instantly available via Limelight. (The Amazon process is slower, although still within minutes.)
However, in a move that’s distinctly inferior to the CloudFront implementation, the base URL for the files is http://cdn.cloudfiles.mosso.com/some-hash-value/yourfile, vs. the more elegant CloudFront http://some-hash-value.cloudfront.net/yourfile. You can CNAME the latter to some nicely in-your-domain host (images.yourcompany.com, let’s say), and in fact, if you’ve used another CDN or have simply been smart and delivered static content off its own hostname, you’d already have that set up. The Mosso method means that you’ve got to go through all your content and munge your URLs to point to the CDN.
Aside from being annoying, it makes the elegant fallback less elegant. Having the CNAME means that if for some reason CloudFront goes belly-up, I can trivially repoint the CNAME to my own server (and I can easily automate this task, monitoring CloudFront deliverability and swapping the DNS if it dies). I can’t do that as easily with Mosso (although I can CNAME cdn.cloudfiles.mosso.com to something in my domain, and set up a directory structure that uses the hash, or use a URL rewriting rule, and still get the same effect, so it is merely more awkward, not impossible).
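The monitoring half of that automation is simple to sketch: probe a canary object on the CDN and decide which target the CNAME should point at. All the names here are placeholders, the `probe` callable stands in for whatever HTTP fetch you use, and the actual record update would go through your DNS provider’s interface:

```python
def choose_cname_target(probe, cdn_host, origin_host, required_status=200):
    """Return the host the static-content CNAME should point at.

    `probe` is any callable taking a URL and returning an HTTP status
    code (or raising on a network failure). If the canary object on
    the CDN isn't served cleanly, fall back to the origin server."""
    try:
        status = probe("http://%s/canary.gif" % cdn_host)
    except Exception:
        return origin_host  # CDN unreachable: serve from origin
    return cdn_host if status == required_status else origin_host
```

Run from cron every minute or two, this is all the "monitoring CloudFront deliverability and swapping the DNS" logic really requires — which is exactly why the CNAME-friendly URL scheme matters.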
That might be, in part, a way for Limelight to differentiate this from its normal CDN offering, which has the modern and elegant deployment mechanism of simply pointing your website hostname at Limelight and letting Limelight do the rest. In that scheme, Limelight automatically figures out what to cache and does the right thing. That seems unlikely, though. I hope the current scheme is just temporary until they get a CloudFront-like DNS-based implementation done.
I’ll have more to say once I’ve accumulated some log data, and add SimpleCDN into the mix.