Category Archives: Infrastructure
Ranking of ISPs
Datahounds might be interested in the Renesys ISP rankings for 2008.
Renesys is a company that specializes in collecting data about the Internet, with a focus on the peering ecosystem. Its rankings are essentially a matter of size — how much IP address space ends up transiting each provider?
Among the interesting data points: Level 3 has overtaken Sprint for the #1 spot, Global Crossing has continued its rapid climb to become #3, TeliaSonera has grown steadily, and, broadly, Asia is a huge source of growth.
Anti-virus integration with cloud storage
Anti-virus vendor Authentium is now offering its AV-scanning SDK to cloud providers.
Authentium, unlike most other AV vendors, has traditionally been focused on the gateway; they offer an SDK designed to be embedded in applications and appliances. (Notably, Authentium is the scanning engine used by Google’s Postini service.) So courting cloud providers is logical for them.
Anti-virus integration makes particular sense for cloud storage providers. Users of cloud storage upload millions of files a day. Many businesses that use cloud storage do so for user-generated content. AV-scanning a file as part of an upload could be just another API call — one that could be charged for on a per-operation basis, just like GET, PUT, and other cloud storage operations. That would turn AV scanning into a cloud Web service, making it trivially easy for developers to integrate AV scanning into their applications. It’d be a genuine value-add for using cloud storage — a reason to do so beyond “it’s cheap”.
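To sketch what that might look like — and to be clear, the endpoint, the scan flag, and the response header below are all hypothetical, invented for illustration, not any vendor’s actual API:

```python
# Hypothetical sketch only: AV scanning as a per-operation storage
# service. The endpoint, the scan flag, and the X-Scan-Result header
# are invented for illustration; no vendor actually offers this API.
import httplib

def put_with_scan(host, container, name, data):
    """PUT an object, asking the (imaginary) service to AV-scan it."""
    conn = httplib.HTTPSConnection(host)
    # scan=true is the imagined per-operation flag, billable like a PUT
    conn.request('PUT', '/%s/%s?scan=true' % (container, name), data)
    resp = conn.getresponse()
    verdict = resp.getheader('X-Scan-Result')  # e.g. 'clean', 'infected'
    conn.close()
    return resp.status, verdict

status, verdict = put_with_scan('storage.example.com', 'uploads',
                                'report.pdf', open('report.pdf', 'rb').read())
```

The point is that the scan becomes just one more billable operation in the storage API, rather than a separate product.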
More broadly, security vendors have become interested in offering scanning as a service, although most have desktop installed bases to defend, and thus are looking at it as a supplement to, rather than a replacement for, traditional desktop AV products; see the past news on McAfee’s Project Artemis or Trend Micro’s Smart Protection Network for examples.
Velocix Metro
CDN provider Velocix has announced the launch of a new product, called Velocix Metro. (I was first briefed on Metro almost eight months ago, so the official launch has been in the works for quite a while.)
Velocix Metro is essentially a turnkey managed CDN service, deployed at locations of an Internet service provider’s choice, and potentially embedded deep within that ISP’s network. The ISP gets a revenue share on the Velocix traffic delivered over its network, plus the ability to do its own delivery from the CDN nodes deployed in its network. Velocix’s flagship customer for this service is Verizon.
You might recall that Velocix is a partner CDN to MediaMelon, which I discussed in the context of CDN overlays a few weeks ago. I believe that these kinds of federated networks are going to become increasingly common, because carriers are the natural choice to provide commoditized CDN services (due to their low network costs), and broadband service providers need some way to monetize the gargantuan and growing volumes of rich content being delivered to their end-user eyeballs.
The economics of the peering ecosystem make it very hard for broadband providers to raise the price of bandwidth bought by content providers, and intermodal competition (i.e., DSL/FiOS vs. cable) creates pricing competition that makes it hard to charge end-users more. So broadband providers need to find another out, and offering up their own CDNs, and thus faster access to their eyeballs, is certainly a reasonable approach. (That means that over the long term, providers that deploy their own CDNs are probably going to be less friendly about placing gear from other CDNs deep within their networks, especially if it’s for free.)
We are entering the period of the rise of the local CDN — CDNs with deep but strictly regional penetration. For instance, if you’re a broadcaster in Italy, with Italian-language programming, you probably aren’t trying to deliver to the world and you don’t want to pay the prices necessary to do so; you want deep coverage within Italy and other Italian-speaking countries, and that’s it. An overlay or federated approach makes it possible to tie together CDNs owned by incumbent regional carriers, giving you delivery in just the places you care about. And that, in turn, creates a compelling business case for every large network provider to have a CDN of their own. Velocix, along with other vendors who can provide turnkey solutions to providers who want to build their own CDN networks, ought to benefit from that shift.
IronScale launches
Sacramento-based colocation provider RagingWire has launched a subsidiary, StrataScale, whose first product is a managed cloud hosting service, IronScale. (I’ve mentioned this before, but the launch is now official.) I’ll be posting more on it once I’ve had time to check out a demo, but here’s a quick take:
What’s interesting is that IronScale is not a virtualized service. The current offering is on dedicated hardware — similar to the approach taken by SoftLayer, but this is a managed service. But it has the key cloud trait of elasticity — the ability to scale up and down at will, without commitments.
IronScale has automated fast provisioning (IronScale claims 3 minutes for the whole environment), management through the OS layer (including services like patch management), an integrated environment that includes the usual network suspects (firewall, load balancing, SSL acceleration), and a 100% uptime SLA. You can buy service on a month-to-month basis or an annual contract. This is a virtual data center offering; there’s a Web portal for provisioning plus a Web services API, along with some useful tricks like cloning and snapshots.
It’s worth noting that cloud infrastructure services, in their present incarnation, are basically just an expansion of the hosting market — moving the bar considerably in terms of expected infrastructure flexibility. This is real-time infrastructure, virtualized or not. It’s essentially a challenge to other companies that offer basic managed services — Rackspace, ThePlanet, and so on — but you can also expect it to compete with the VDC hosting offerings that target mid-sized to enterprise organizations.
Amazon SimpleDB, plus a bit on cloud storage
Amazon SimpleDB is now in public beta. This database-as-a-service has been in private beta for some time, but what’s really noteworthy is that with the public beta, Amazon has dropped the price drastically, and the first 25 machine hours, 1 GB of storage, and 1 GB of transfer are free, meaning that it’s essentially free to experiment with.
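If you want to take advantage of that free tier, a minimal experiment through boto (the community Python library for AWS) looks something like this; the domain and attribute names below are mine, and I’m assuming a boto version with SimpleDB support:

```python
# A minimal free-tier experiment against SimpleDB via boto. The domain
# and attribute names are mine; assumes AWS credentials are configured.
import boto

sdb = boto.connect_sdb()                   # SimpleDB connection
domain = sdb.create_domain('test-domain')  # create (or fetch) a domain
domain.put_attributes('item1', {'color': 'blue', 'size': 'large'})
print(domain.get_attributes('item1'))      # {'color': 'blue', 'size': 'large'}
```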
On another Amazon-related note, my colleagues who cover storage have recently put out a research note titled “A Look at Amazon’s S3 Cloud-Computing Storage Service”. If you’re a Gartner client contemplating use of S3, I’d suggest checking it out.
I want to stress something that’s probably not obvious from that note: You can’t mount S3 storage like a normal filesystem. You access it via its APIs, and that’s all. If you use EC2 and you need cloud storage that looks like a regular filesystem, you’ll want to use Amazon’s Elastic Block Store. If you’re using S3, whether within EC2 or from your own infrastructure, you’re either going to make API calls directly (which will make your apps dependent upon S3), or you’re going to have to go through a filesystem driver like FUSE (commercially, Subcloud).
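To make the “API calls directly” point concrete, here’s roughly what S3 access looks like through boto (bucket and key names are placeholders); note that there’s no mount point or open() anywhere, just web service calls:

```python
# S3 is API-only: every read and write is a web service call, not
# filesystem I/O. Bucket and key names below are placeholders.
import boto

s3 = boto.connect_s3()                      # uses configured credentials
bucket = s3.get_bucket('my-bucket')
key = bucket.new_key('path/to/file.txt')
key.set_contents_from_filename('file.txt')  # HTTP PUT under the hood
key.get_contents_to_filename('copy.txt')    # HTTP GET under the hood
```

Every one of those calls ties the application to S3’s API, which is exactly the dependency issue.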
Cloud storage, at this stage, is typically reliant upon proprietary APIs. Some providers are starting to offer filesystems, such as Nirvanix’s CloudNAS (now in beta), but we’re at the very earliest stages of that. I suspect that the implementation hurdles created by API-only access, and not the contractual issues, will be what stop enterprises from adopting it in the near term.
On a final storage-related note, Rackspace (Mosso) Cloud Files remains definitively in beta. I was playing with the shell I was writing (adding an FTP-like get and put with progress bars and such), and trying to figure out why my API calls were failing. It turned out that the service was in read-only mode for a while yesterday, and even read calls (via the API) were failing for a bit (returning 500 Internal Server Error codes). On the plus side, the support request I made via Rackspace’s real-time chat (an instant-messaging-like interface) to report the read outage was answered immediately, politely, and knowledgeably — one clear way the Rackspace offering wins over S3. (Amazon charges for support.)
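In the meantime, a beta-grade service argues for beta-grade defensiveness; a retry wrapper like this sketch of mine (not part of the library) would paper over brief outages:

```python
# My own defensive sketch, not part of python-cloudfiles: retry API
# calls a few times when the service throws a transient error (such as
# yesterday's 500s), instead of failing outright.
import time
import cloudfiles
from cloudfiles.errors import ResponseError

def with_retries(func, attempts=3, delay=5):
    """Call func(), retrying on ResponseError with a short pause."""
    for i in range(attempts):
        try:
            return func()
        except ResponseError:
            if i == attempts - 1:
                raise
            time.sleep(delay)

conn = cloudfiles.get_connection('username', 'api_key')
names = with_retries(lambda: conn.list_containers())
```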
An initial Mosso foray
I had an hour to kill today after a client didn’t show for a call… everyone’s taken off for T-day, I guess.
Since Rackspace has a 30-day money-back guarantee on Mosso at the moment, along with a nice Thanksgiving discount making the first month just $20, I decided to sign up for an actual account, on my personal dime. It allows me to offer guilt-free commentary based on real experience, and the freedom to bug customer support, with the recognition that I am actually giving the company my money and am therefore entitled to ask whatever questions I want, without giving Analyst Relations a headache. So here’s a little ramble of liveblogging my Mosso experience.
The first hurdle I ran into is that there’s no easy way to take your Cloud Files account and sign up for Cloud Sites (i.e., the main Mosso service). After a bit of a chat with their live online sales and a few minutes of waiting while the guy checked around (during which I started writing this blog entry), I was informed I could put in a support ticket and they’d take care of it on the back end. I decided to save them some trouble and just get another account (thus allowing me to do some future playing about with copying things between Cloud Files accounts, in my desire to create cloud parallels to sftp and scp), but it was a bit of an oddity — Sites is a logical upsell from Files, so I presume that functionality is probably coming eventually.
Next, I went to look for a way to change my initial sign-up password. Nothing obvious in the interface, nothing on the knowledge base… I shrugged and provisioned myself a site. On the site’s config, I found the way to change the password — and also discovered, to my horror, that the password shows up in cleartext. That certainly prompted me to change my password immediately.
I did not want to transfer my domain, but the site info page shows what Mosso wants the DNS records to be; I adjusted the DNS records on my end for what I needed, no problem. I also provisioned a second site with a non-www hostname (note that Mosso automatically turns domain.com into www.domain.com), which worked fine and intelligently (a recent change, I guess, because when I tried a demo account last week, it insisted on spewing full DNS info, rather than just an A record, for that).
I looked at what was available for PHP, and realized that if I wanted a framework like Zend, I’d have to install it myself, and without SSH access, that looked like it was going to be a festival of non-fun, if not flat-out impossible.
So, I turned on CGI support, which seemed to take two rounds of saving my settings on both sites I tried it on. But CGI support does not seem to actually work for me — it’s returning 404 errors on my uploaded test scripts. Perhaps this is covered by the “you may need to wait two hours” warning message given on the change-config page, but it sure would be nice if it said “pending” if that were the case, or otherwise gave some hint as to what requires a wait and what doesn’t.
I’m going to wait and see, but it’s become clear that I can’t actually do what I want with Mosso, because of the following: if you’re not running a directly supported environment (PHP, Ruby on Rails, ASP.NET), you are stuck with shoving your code, in the form of scripts, into the cgi-bin directory, and that’s that. The Perl and Python support is just generic CGI script support. So there’s no support for mod_python, and therefore you can’t run Django.
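To be concrete, “generic CGI script support” means the most you get is something like this trivial script of mine dropped into cgi-bin:

```python
#!/usr/bin/env python
# A trivial CGI script of mine -- the full extent of the Python
# "framework" support: emit a header, a blank line, then the body,
# per the CGI spec. Drop it into cgi-bin and that's that.
import os

print("Content-Type: text/plain")
print("")
print("Hello from cgi-bin; no mod_python, no Django.")
print("Query string: " + os.environ.get('QUERY_STRING', ''))
```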
The Mosso “Is it a fit?” page implies too much, I think. The page lists “application frameworks”, and would more properly say “CGI scripts (Perl, Python)”, rather than implying that the Perl and Python support comes in the form of actual application frameworks, which you’d normally expect to be something like Catalyst for Perl or Django for Python.
It’s making me think about the very, very fuzzy definition for what it means to be an application platform as a service (APaaS) vendor.
CDN: software or infrastructure?
Akamai is a software company. It is not, fundamentally, an infrastructure company. You could blow up their network of servers from orbit, and most of the value of the company would be preserved. It’s a company that runs on its intellectual property, the breadth and depth of its feature set, and, ultimately, its ability to rapidly innovate software features. Like any SaaS company, it needs infrastructure upon which to deliver those features (which are mostly focused around content delivery), but the infrastructure itself is not really its value.
I used to be able to generalize this to the CDN market as a whole. Like most software markets, it featured competition on feature sets, and the general rule that whatever you invented today, your competitors would try to replicate within the next year.
I’ve come to the realization that this is no longer an accurate characterization. It’s true of Akamai and a very small number of competitors, but for everyone else, CDN has become an infrastructure play — about having a network and a lot of servers and just enough software to control it all efficiently.
Software and infrastructure businesses have very different characteristics, and the shift has a cascade of implications.
Mosso’s Cloud Files API
I had some time to kill on a train today, and I amused myself by trying out the API for Cloud Files.
I decided I’d build something mildly useful: a little shell-like wrapper and accompanying tools for dealing with Cloud Files from the command line. So I cranked out a quick “ls”-like utility and a little interactive shell chrome around it, with the idea that my little “cfls” script would be handy for listing Cloud Files from within another shell script (like a quick script typed on the command line), and that I’d be able to do more interesting manipulations from within the interactive “shell”. I decided to do it in Python, since, of the languages the API is available in, that’s the one I’m most comfortable with; I’m not a great Python programmer, but I can manage. (It’s been more than a decade since I’ve written code for pay, and it probably shows.)
I was reasonably pleased by the fruits of my efforts; the API was pleasantly effortless to work with for these kinds of minor tasks, and I have a version 0.1. I got the shell core and the cfls utility working, and for the heck of it, I made direct API calls out of the Python interactive mode to upload my source code to Cloud Files. For nearly no time investment, this seems pretty satisfying.
The only annoying quirk I discovered was that the containers in the result set of get_all_containers() do not get their instance variables fully populated — all that’s populated is the name. (Calling it constructs a ContainerResults object from list_containers(), and the iterator populates each generated Container with only its name.) So it seems you have to call list_containers() to get all the container names, and then get_container() on each name, if you actually need the instance variables. I also had some odd unexpected exceptions thrown when testing things out in the Python shell — related to time-outs on the connection object, I assume. Still, these are not problems that cause more than a moment’s sigh.
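In other words, the workaround for anything like cfls is the two-pass approach below (a sketch of mine; I’m assuming the object_count and size_used attributes that a fully populated Container carries):

```python
# The two-pass workaround: list_containers() for the names, then
# get_container() per name for fully populated instance variables
# (relying on the object_count and size_used attributes here).
import cloudfiles

conn = cloudfiles.get_connection('username', 'api_key')
for name in conn.list_containers():
    c = conn.get_container(name)   # one extra API call per container
    print("%-30s %8d objects %12d bytes" %
          (c.name, c.object_count, c.size_used))
```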
The Cloud Files Python library is far and away better than the Amazon S3 Python library, which seems much more like demonstration code than a real working library (which is probably why people doing things with AWS and Python tend to use Boto instead). The Cloud Files module for Python is decently if sparsely documented, but its usage is entirely self-evident. It’s simply and intelligently structured, in a logical object hierarchy.
The important point: It’s trivial for any idiot to build apps using Cloud Files.
Basic CDNs: where are my logs?
I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up — you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls — from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer — just a hair under 6 cents per GB. The trade-off is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s — there are no East Coast delivery locations, for instance. It’s also routed via anycast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
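For reference, turning on those S3 server logs is a two-step affair in boto (bucket names below are placeholders, and I’m assuming a boto version that exposes the logging calls):

```python
# Enable S3 server access logging via boto; bucket names are
# placeholders. The log bucket must first be made a logging target.
import boto

s3 = boto.connect_s3()
logs = s3.create_bucket('my-log-bucket')
logs.set_as_logging_target()      # grants the log-delivery group ACL
bucket = s3.get_bucket('my-content-bucket')
bucket.enable_logging(logs, target_prefix='s3-access/')
```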
Lack of logging is not a particularly big deal if you are just trying to offload static content in the form of images and the like — presumably, in that scenario, you have decent analytics based off hits to the base page, or a tracking counter, or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as for settling billing disputes.
Rackspace Cloud Files (Mosso) + Limelight CDN
It’s official: Rackspace’s Cloud Files can now be distributed via Limelight’s CDN.
The Cloud Files CDN works exactly as announced: drop your files into Cloud Files, push a button, and have them served over the Limelight CDN. It’s 22 cents per GB, flat-rate, without origin fees. I’ve discussed this deal before, including in the context of Amazon’s CloudFront, and my previous commentary stands.
What will be interesting to see is the degree to which Limelight preserves infrequently accessed content in its edge caches, compared with Amazon, as that will have a significant impact on both performance and cost-effectiveness versus CloudFront. Limelight has a superior footprint, superior peering relationships, and much more sophisticated content routing. For people who are just looking to add a little bit of effortless CDN, its ability to keep infrequently accessed content fresh in its edge caches will determine just how much of a performance advantage it has — or doesn’t have — versus CloudFront.
A Little Testing
In the spirit of empirical testing, I’ve created a Cloud Files account, too, and am now delivering the images of the front page of a tiny website from a combination of CloudFront and Cloud Files (which is under Rackspace’s Mosso brand).
It looks like my Cloud Files are being served from the same Limelight footprint as other Limelight customers, at least to the extent that I get the exact same edge resolutions for www.dallascowboys.com as I do for cdn.cloudfiles.mosso.com.
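The comparison itself is a couple of lines of stock Python (results will obviously vary by resolver and vantage point):

```python
# Compare edge resolutions: do the two hostnames resolve to the same
# edge IPs from my vantage point? (Results vary by resolver location.)
import socket

for host in ('www.dallascowboys.com', 'cdn.cloudfiles.mosso.com'):
    name, aliases, addrs = socket.gethostbyname_ex(host)
    print("%-28s -> %s" % (host, ', '.join(sorted(addrs))))
```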
The Cloud Files deployment is reasonably elegant, and Mosso provides a GUI, which makes it more directly user-friendly. You create a container (folder), drop your files into it, and click a button to make the files public, making them automatically and instantly available via Limelight. (The Amazon process is slower, although still within minutes.)
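For the API-inclined, the same steps look roughly like this through the Python library (credentials and names are placeholders, and I’m assuming the make_public and public_uri calls as the library documents them):

```python
# The GUI steps, via the Python API: create a container, upload a
# file, flip it public over the Limelight CDN. Credentials and names
# are placeholders; make_public()/public_uri() are assumed to behave
# as the library documents for CDN-enabled accounts.
import cloudfiles

conn = cloudfiles.get_connection('username', 'api_key')
container = conn.create_container('images')
obj = container.create_object('logo.png')
obj.load_from_filename('logo.png')   # upload
container.make_public()              # the one-button step
print(container.public_uri())        # http://cdn.cloudfiles.mosso.com/<hash>
```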
However, in a move that’s distinctly inferior to the CloudFront implementation, the base URL for the files is http://cdn.cloudfiles.mosso.com/some-hash-value/yourfile, vs. the more elegant CloudFront http://some-hash-value.cloudfront.net/yourfile. You can CNAME the latter to some nicely in-your-domain host (images.yourcompany.com, let’s say), and in fact, if you’ve used another CDN or have simply been smart and delivered static content off its own hostname, you’d already have that set up. The Mosso method means that you’ve got to go through all your content and munge your URLs to point to the CDN.
Aside from being annoying, it makes the elegant fallback less elegant. Having the CNAME means that if for some reason CloudFront goes belly-up, I can trivially repoint the CNAME to my own server (and I can easily automate this task, monitoring CloudFront deliverability and swapping the DNS if it dies). I can’t do that as easily with Mosso (although I can CNAME cdn.cloudfiles.mosso.com to something in my domain, and set up a directory structure that uses the hash, or use a URL rewriting rule, and still get the same effect — so it is merely more awkward, not impossible).
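In outline, the automation I have in mind looks something like this; it’s purely a sketch of mine, and update_dns() is a hypothetical stand-in for whatever update mechanism your DNS provider actually offers:

```python
# A failover sketch of mine: probe a known object on the CDN hostname,
# and repoint the CNAME at the origin if it stops responding.
# update_dns() is a hypothetical stand-in for whatever update
# mechanism your DNS provider actually offers.
import urllib2

CDN_TEST_URL = 'http://images.example.com/probe.gif'  # placeholder

def cdn_alive(url, timeout=10):
    try:
        return urllib2.urlopen(url, timeout=timeout).getcode() == 200
    except Exception:
        return False

def update_dns(hostname, target):
    raise NotImplementedError('depends on your DNS provider')

if not cdn_alive(CDN_TEST_URL):
    update_dns('images.example.com', 'origin.example.com')
```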
That might be, in part, a way for Limelight to differentiate this from its normal CDN offering, which has a more modern and elegant deployment mechanism: simply point your website hostname at Limelight, and Limelight will do the rest. In that scheme, Limelight automatically figures out what to cache and does the right thing. That seems unlikely, though. I hope the current scheme is just temporary until they get a CloudFront-like DNS-based implementation done.
I’ll have more to say once I’ve accumulated some log data, and add SimpleCDN into the mix.