Blog Archives

Amazon’s Kindle

I am a big fan of Amazon’s Kindle. My husband gave me one as a gift when they first came out, and I went from taking it on the road to read the occasional thing as supplement to the paperbacks and magazines I was going through, to making it my sole form of reading material while out on travel, to wanting to read just about everything on it, period. The ability to change the font size, essentially allowing me to read every book as if it were in large print, is a big reason why — it’s easier to read bigger print when you’re in something that’s moving, as it creates less eyestrain.

I consume an enormous number of books (around a book a day if I’m traveling, and around half that if I’m not). Books are one of the most significant expenses in my household; my husband and I are both voracious consumers of fiction and non-fiction, and we mostly read different books. Kindle helps me spend a lot less on books, sort of — I pay less for the individual books, but because of the convenience, I also read even more than I normally would. And whereas I often used to wait for the paperback, now I buy books as soon as they come out in Kindle form. Plus, while business books are often grotesquely expensive for relatively limited value, especially when they’re in hardback, at Kindle prices, I don’t mind buying a book for the one cool idea in it, instead of standing around in the bookstore, flipping pages. Finally, rather than buying a ton of books that accumulate in piles and sometimes eventually disappear onto the shelves before I actually read them, I read every item I download onto my Kindle.

New York Times reviewer David Pogue understands the Kindle. But opinion columnist Roy Blount totally fails to get it, using the NYT megaphone to whine that the text-to-speech function potentially steals money from authors who would otherwise be able to sell audiobooks.

Seth Godin loves his Kindle. And he has a bunch of great suggestions for taking the Kindle service to the next level. Among other things, he points out that authors need to embrace these new models as a source for lots of new forms of revenue generation, rather than obtusely trying to cling to the way things are.

You can fear the future, or you can think different and embrace it. Devices like the Kindle open up a wealth of opportunities to authors who are willing to seize them.

Cloud failures

A few days ago, an unexpected side-effect of some new code caused a major Gmail outage. Last year, a small bug triggered a series of cascading failures that resulted in a major Amazon outage. These are not the first cloud failures, nor will they be the last.

Cloud failures are as complex as the underlying software that powers the cloud itself. No longer do you have isolated systems; you have complex, interwoven ecosystems, delicately orchestrated by a swarm of software programs. In presenting simplicity to the user, the cloud provider takes on the burden of dealing with that complexity itself.

People sometimes say that these clouds aren’t built to enterprise standards. In one sense, they aren’t — most aren’t intended to meet enterprise requirements in terms of feature-set. In another sense, though, they are engineered to far exceed anything that the enterprise would ever think of attempting themselves. Massive-scale clouds are designed to never, ever, fail in a user-visible way. The fact that they do fail nonetheless should not be a surprise, given the potential for human error encoded in software. It is, in fact, surprising that they don’t visibly fail more often. Every day, within these clouds, a whole host of small errors that would be outages if they occurred within the enterprise — server hardware failures, storage failures, network failures, even some software failures — are handled invisibly by the back-end. Most of the time, the self-healing works the way it’s supposed to. Sometimes it doesn’t. The irony in both the Gmail outage and the S3 outage is that both appear to have been caused by the very software components that were actively trying to create resiliency.
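
To make that concrete with some invented back-of-the-envelope numbers (my assumptions, not anything published by Google or Amazon): when a system depends on a long chain of components, small per-component failure rates compound brutally, and replication is what keeps the failures invisible.

```python
def series_availability(per_component, n):
    """Availability of n components that must ALL be up (no redundancy)."""
    return per_component ** n

def replicated_series_availability(per_component, replicas, n):
    """Each of the n components is k-way replicated; a component is down
    only if every one of its replicas is down simultaneously."""
    per_replicated = 1 - (1 - per_component) ** replicas
    return per_replicated ** n

# 1,000 components at 99.9% each: the chain is up only ~37% of the time.
# Triple-replicate each component, and the chain exceeds 99.999%.
```

Real clouds are not simple series systems, and failures are rarely independent (as the cascading-failure outages show), but the arithmetic illustrates why invisible self-healing is a design requirement rather than a luxury.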

To run infrastructure on a massive scale, you are utterly dependent upon automation. Automation, in turn, depends on software, and no matter how intensively you QA your software, you will have bugs. It is extremely hard to test complex multi-factor failures. There is nothing that indicates that either Google or Amazon are careless about their software development processes or their safeguards against failure. They undoubtedly hate failure as much as, and possibly more than, their customers do. Every failure means sleepless nights, painful internal post-mortems, lost revenue, angry partners, and embarrassing press. I believe that these companies do, in fact, diligently seek to seamlessly handle every error condition they can, and that they generally possess sufficient quantity and quality of engineering talent to do it well.

But the nature of the cloud — the one homogeneous fabric — magnifies problems. Still, that risk isn’t unique to the cloud. Let’s not forget VMware’s license bug from last year: people who normally booted up their VMs at the beginning of the day were pretty much screwed, and it took VMware the better part of a day to produce a patch — their originally announced timeframe was 36 hours. I’m not picking on VMware — you could find yourself with a similar problem with any widely deployed software vulnerable to a bug that causes it all to fail at once.

Enterprise-quality software produced the SQL Slammer worm, after all. In the cloud, we ain’t seen nothing yet…

Origin story, Amazon EC2

Benjamin Black’s blog has an interesting story: his perspective of how Amazon EC2 came to be.

An interview with Chris Pinkham and the site of Amazon’s Cape Town Development Centre are interesting reads, too. The latter is focused on EC2 development.

Volume pricing for Amazon’s CloudFront

New volume pricing for Amazon’s CloudFront CDN takes effect today, February 1st. For US and Europe “edge” delivery, the price goes as low as $0.05/GB at the 1000+ TB level. For Hong Kong, it’s $0.09/GB at that level. For Japan, $0.095/GB. The pricing isn’t quite comparable to a traditional CDN because of the origin bandwidth fees and the per-request fee, but it’s still a useful benchmark.

For those who are mentally comparing this to the cost of bandwidth, those per-GB costs translate into $16/Mbps for US/Europe, and $29/Mbps for Asia. In a day and age when Cogent is splashing “Home of the $4 Megabit” across its home page, it might look like there’s still quite a bit of delta between bandwidth pricing and CDN pricing, but especially once you get out of the US, bandwidth costs escalate pretty dramatically beyond Cogent’s low-water mark.
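
For anyone who wants to check that conversion: it assumes 1 Mbps sustained for a 30-day month, decimal gigabytes, and ignores origin and per-request fees (a rough sketch, not Amazon’s billing math).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600            # 2,592,000 seconds
# GB transferred by 1 Mbps sustained for a month (decimal units)
GB_PER_MBPS_MONTH = 1e6 * SECONDS_PER_MONTH / 8 / 1e9   # ~324 GB

def per_mbps_month(price_per_gb):
    """Equivalent monthly cost of 1 Mbps sustained, at a given per-GB price."""
    return price_per_gb * GB_PER_MBPS_MONTH

# $0.05/GB works out to roughly $16/Mbps; $0.09/GB to roughly $29/Mbps.
```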

Nonetheless, Amazon’s volume pricing play ought to put an end to anyone’s hope that the elimination of some of the financially weaker CDN players is going to do anything significant to alleviate pricing pressure where it’s most severe — the entirely commoditized portion of the market. In fact, this explicit, transparent pricing is probably going to provide a nice bargaining chip. Even if a major media conglomerate isn’t going to use Amazon to deliver its video, that won’t stop its purchasing people from using these published prices to hammer CDNs during negotiations.

COBOL comes to the cloud

In this year of super-tight IT budgets and focus on stretching what you’ve got rather than replacing it with something new, Micro Focus is bringing COBOL to the cloud.

Most vendor “support for EC2” announcements are nothing more than hype. Amazon’s EC2 is a Xen-virtualized environment. It supports the operating systems that run in that environment; most customers use Linux. Applications run no differently there than they do in your own internal data center. There’s no magical conveyance of cloud traits. Same old app, externally hosted in an environment with some restrictions.

But Micro Focus (which is focused around COBOL-based products) is actually launching its own cloud service, built on top of partner clouds — EC2, as well as Microsoft’s Azure (previously announced).

Micro Focus has also said it has tweaked its runtime for cloud deployment. They give the example of storing VSAM files as blobs in SQL. This is undoubtedly due to Azure not offering direct access to the filesystem. (For EC2, you can get persistent normal file storage with EBS, but there are restrictions.) I assume that similar tweaks were made wherever the runtime needs to do direct file I/O. Note that this still doesn’t magically convey cloud traits, though.
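
As a generic illustration of that pattern (my sketch of the blob-in-a-database idea, using SQLite as a stand-in; this is not Micro Focus’s actual runtime code), storing file contents as blobs looks something like:

```python
import sqlite3

def create_store(conn):
    # one table mapping file names to raw contents, in place of a filesystem
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vsam_files (name TEXT PRIMARY KEY, data BLOB)")

def put_file(conn, name, data):
    # write the record bytes as a blob instead of doing direct file I/O
    conn.execute("INSERT OR REPLACE INTO vsam_files VALUES (?, ?)", (name, data))

def get_file(conn, name):
    row = conn.execute(
        "SELECT data FROM vsam_files WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]
```

The same two operations, backed by whatever SQL store the platform provides, let a runtime keep working on a cloud that offers no direct filesystem access.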

It’s interesting to see that Micro Focus has built its own management console around EC2, providing easy deployment of apps based on their technology, and is apparently making a commitment to providing this kind of hosted environment. Amidst all of the burgeoning interest in next-generation technologies, it’s useful to remember that most enterprises have a heavy burden of legacy technologies.

(Disclaimer: My husband was founder and CTO of LegacyJ, a Micro Focus competitor, whose products allow COBOL, including CICS apps, to be deployed within standard J2EE environments — which would include clouds. He doesn’t work there any longer, but I figured I should note the personal interest.)

Touring Amazon’s management console

The newly-released beta of Amazon’s management console is reasonably friendly, but it is not going to let your grandma run her own data center.

I took a bit of a tour today. I’m running Firefox 3 on a Windows laptop, but everything else I’m doing out of a Unix shell — I have Linux and Mac OS X servers at home. I already had AWS stuff set up prior to trying this out; I’ve previously used RightScale to get a Web interface to AWS.

The EC2 dashboard starts with a big friendly “Launch instances” button. Click it, and it takes you to a three-tab window for picking an AMI (your server image). There’s a tab for Amazon’s images, one for your own, and one for the community’s (which includes a search function). After playing around with the search a bit (and wishing that every community image came with an actual blurb of what it is), and not finding a Django image that I wanted to use, I decided to install Amazon’s Ruby on Rails stack.

On further experience, the “Select” buttons on this set of tabs seem to have weird issues: sometimes they’re grayed out and unclickable, and sometimes you’ll click one and it will go gray without showing the little “Loading, please wait” box that appears before moving on to the next tab — leaving you stuck until you cancel the window and try again.

Once you select an image, you’re prompted to select how many instances you want to launch, your instance type, key pair (necessary to SSH into your server), and a security group (firewall config). More twiddly bits, like the availability zone, are hidden in advanced options. Pick your options, click “Launch”, and you’re good to go.

From the launch window, the firewall options default to having a handful of relevant ports (like SSH, webserver, MySQL) open to the world. You can’t get more granular with the rules here; you’ve got to use the Security Group config panel to add a custom rule. I wish the defaults were slightly stricter, like limiting the MySQL port to Amazon’s back-end.

Next, I went to create an EBS volume for user data. This, too, is simple, although initially I did something stupid, failing to notice that my instance had launched in us-east-1b. (Your EBS volume must reside in the same availability zone as your instance, in order for the instance to mount it.)

That’s when I found the next interface quirk — the second time I went to create an EBS volume, the interface continued to insist for fifteen minutes that it was still creating the volume. Normally there’s a very nice Ajax bit that automatically updates the interface when it’s done, but this time, even clicking around the whole management console and trying to come back wouldn’t get it to update the status and thus allow me to attach it to my instance. I had to close out the Firefox tab, and relaunch the console.

Then I remembered that I’d created my default key pair via RightScale, and I couldn’t recall where I’d stashed the PEM credentials. So that led me to a round of creating a new key pair via the management console (very easy), and having to terminate and launch a new instance using the new key pair (subject to the previously-mentioned interface quirks).

The same tendency of the interface to get into an indeterminate state also seems to affect other things, like the console “Output” button for instances — you get a blank screen rather than the console dump.

That all dealt with, I log into my server via SSH, don’t see the EBS volume mounted, and remember that I need to actually make a filesystem and explicitly mount it. All creating an EBS volume does is allocate you an abstraction on Amazon’s SAN, essentially. This leads me to trying to find documentation for EBS, which leads to the reminder that access to docs on AWS is terrible. The search function on the site doesn’t index articles, and there are far too many articles to just click through the list looking for what you want. A Google search is really the only reasonable way to find things.
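
For reference, the missing steps look something like the following on a Linux instance, assuming the volume was attached as /dev/sdf (the device name and mount point are whatever you chose at attach time):

```shell
mkfs -t ext3 /dev/sdf      # one-time: put a filesystem on the raw block device
mkdir -p /mnt/data         # create a mount point
mount /dev/sdf /mnt/data   # mount it like any other disk
```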

All that aside, once I do that, I have an entirely functional server. I terminate the instance, check out my account, see that this little experiment has cost me 33 cents, and feel reasonably satisfied with the world.

News round-up

A handful of quick news-ish takes:

Amazon has released the beta of its EC2 management console. This brings point-and-click friendliness to Amazon’s cloud infrastructure service. A quick glance through the interface makes it clear that effort was made to make it easy to use, beginning with big colorful buttons. My expectation is that a lot of the users who might otherwise have gone to RightScale et al. to get the easy-to-use GUI will now just stick with Amazon’s own console. Most of those users would have just been using that free service, but there’s probably a percentage that would otherwise have been upsold who will stick with what Amazon has.

Verizon is courting CDN customers with the “Partner Port Program”. It sounds like this is a “buy transit from us over a direct peer” service — essentially becoming explicit about settlement-based content peering with content owners and CDNs. I imagine Verizon is seeing plenty of content dumped onto its network by low-cost transit providers like Level 3 and Cogent; by publicly offering lower prices and encouraging content providers to seek paid peering with it, it can grab some revenue and improve performance for its broadband users.

Scott Cleland blogged about the “open Internet” panel at CES. To sum up, he seems to think that the conversation is now being dominated by the commercially-minded proponents. That would certainly seem to be in line with Verizon’s move, which essentially implies that they’re resigning themselves to the current peering ecosystem and are going to compete directly for traffic rather than whining that the system is unfair (always disingenuous, given ILEC and MSO complicity in creating the current circumstances of that ecosystem).

I view arrangements that are reasonable from a financial and engineering standpoint, and that do not discriminate based on the nature of the actual content, as the most positive interpretation of network neutrality. And so I’ll conclude by noting that I heard an interesting briefing today from Anagran, a hardware vendor offering flow-based traffic management (i.e., it doesn’t care what you’re doing, it’s just managing congestion). It’s being positioned as an alternative or supplement to Sandvine and the like, offering a way to try to keep P2P traffic manageable without having to do deep-packet inspection (and thus explicit discrimination).
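
Anagran’s actual technology is proprietary, but the general idea can be sketched with a classic per-flow token bucket (a generic illustration, not their implementation):

```python
class FlowLimiter:
    """Per-flow token bucket: caps a flow's sustained rate while allowing bursts."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.burst = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0

    def allow(self, now, packet_bytes):
        # refill tokens for the elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True    # forward the packet
        return False       # queue or drop it
```

Because the decision depends only on packet sizes and timing, no payload inspection is needed, which is exactly the DPI-avoidance the positioning describes.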

Amazon SimpleDB, plus a bit on cloud storage

Amazon SimpleDB is now in public beta. This database-as-a-service has been in private beta for some time, but what’s really noteworthy is that with the public beta, Amazon has dropped the price drastically, and the first 25 machine hours, 1 GB of storage, and 1 GB of transfer are free, meaning that it’s essentially free to experiment with.

On another Amazon-related note, my colleagues who cover storage have recently put out a research note titled “A Look at Amazon’s S3 Cloud-Computing Storage Service”. If you’re a Gartner client contemplating use of S3, I’d suggest checking it out.

I want to stress something that’s probably not obvious from that note: You can’t mount S3 storage like a normal filesystem. You access it via its APIs, and that’s all. If you use EC2 and you need cloud storage that looks like a regular filesystem, you’ll want to use Amazon’s Elastic Block Store. If you’re using S3, whether within EC2 or from your own infrastructure, you’re either going to make API calls directly (which will make your apps dependent upon S3), or you’re going to have to go through a filesystem driver like Fuse (commercially, Subcloud).
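
One common way to limit that dependence is to hide blob access behind a minimal interface, so the S3-specific API calls live in one place. This is a generic design sketch; the class names are mine, not any Amazon API:

```python
class BlobStore:
    """Minimal storage interface an application codes against."""

    def put(self, key, data):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError


class MemoryStore(BlobStore):
    """In-memory backend for tests and local development."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]
```

An S3-backed class implementing the same two methods (via the REST API or a client library) could then be swapped in without touching application code.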

Cloud storage, at this stage, is typically reliant upon proprietary APIs. Some providers are starting to offer filesystems, such as Nirvanix’s CloudNAS (now in beta), but we’re at the very earliest stages of that. I suspect that the implementation hurdles created by API-only access, and not the contractual issues, will be what stops enterprises from adopting it in the near term.

On a final storage-related note, Rackspace (Mosso) Cloud Files remains in a definitively beta stage. I was playing with the shell I was writing (adding an FTP-like get and put with progress bars and such), and trying to figure out why my API calls were failing. It turned out that the service was in read-only mode for a while yesterday, and even read calls (via the API) were failing for a bit (returning 500 Internal Server Error codes). On the plus side, the support request I made via Rackspace’s real-time chat (an instant-messaging-like interface) to report the read outage was answered immediately, politely, and knowledgeably — one clear way that the Rackspace offering wins over S3. (Amazon charges for support.)
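
If you’re coding against a beta service that intermittently returns 500s, it’s worth wrapping calls in a retry with exponential backoff. A minimal sketch (the function and parameter names are mine):

```python
import time

def call_with_retries(request, attempts=4, base_delay=0.5):
    """Call request() until it returns a non-5xx status, backing off between tries.

    request is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(attempts):
        status, body = request()
        if status < 500:                 # success or a client error: don't retry
            return status, body
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return status, body                  # give up, surface the last error
```

This keeps a transient read-only window from looking like a hard failure to the rest of your code.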

Basic CDNs: where are my logs?

I’m now trialing three different basic CDNs, having each of them deliver exactly one image off the same front page of a website: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN.

SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up — you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls — from what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer — just a hair under 6 cents per GB. The catch is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s — there are no East Coast delivery locations, for instance. It’s also routed via anycast.

Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.

With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
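
When CloudFront logs do arrive, computing that origin-fallback percentage should be straightforward. Here’s a sketch against an entirely hypothetical log format (one cache-result token at the end of each line), since the real format isn’t published yet:

```python
def origin_fallback_rate(log_lines):
    """Fraction of requests served from origin (MISS) rather than edge cache (HIT).

    Assumes a hypothetical format where each line ends in 'HIT' or 'MISS'.
    """
    hits = misses = 0
    for line in log_lines:
        result = line.split()[-1]
        if result == "HIT":
            hits += 1
        elif result == "MISS":
            misses += 1
    total = hits + misses
    return misses / total if total else 0.0
```

The same few lines of counting are all that separates the enigmatic usage reports from an actually useful cache-efficiency number.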

Lack of logging is not particularly a big deal if you are just trying to offload static content in the form of images and the like — presumably in that scenario you have decent analytics based off hits to the base page or a tracking counter or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as settling billing disputes.

Quick CloudFront set-up

I’ve got a tiny low-traffic site that I’ve just set up to use Amazon’s CloudFront CDN. I chose to do it the trivial, painless way for now — through RightScale’s dashboard. It just took a couple of minutes, most of which was waiting for systems to recognize the changes. I frankly expect that my content will never be fresh in cache, but doing this will give me something to play with, without actually costing me some unpredictable amount of money. I’ve been meaning to do some similar tinkering with SimpleCDN, too, and eventually with the Rackspace/Limelight cloud CDN (which thus far lacks a snappy short name).

I still have to finish my cloud server testing, too, which I started a few months ago and which keeps being interrupted by other work and personal demands… I always feel a bit guilty keeping demo accounts for long periods of time.
