Blog Archives
Software and thick vs. thin-slice computing
I’ve been thinking about the way that the economics of cloud computing infrastructure will impact the way people write applications.
Most of the cloud infrastructure providers out there offer virtual servers as a slice of some larger, physical server; Amazon EC2, GoGrid, Joyent, Terremark Enterprise Cloud, etc. all follow this model. This is in contrast to the abstracted cloud platforms provided by Google App Engine or Mosso, which provide arbitrary, unsliced amounts of compute.
The virtual server providers typically provide thin slices — often single cores with 1 to 2 GB of RAM. EC2’s largest available slices are 4 virtual cores plus 15 GB, or 8 virtual cores plus 7 GB, for about $720/month. Joyent’s largest slice is 8 cores with 32 GB, for about $3300/month (including some data transfer). But on the scale of today’s servers, these aren’t very thick slices of compute, and the prices don’t scale linearly — thin slices are much cheaper than thick slices for the same total aggregate amount of compute.
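To make those numbers easier to compare, here is a rough sketch that normalizes the prices quoted above to dollars per core and per GB of RAM. It assumes the "about $720/month" figure applies to each of the two EC2 sizes, and it ignores the fact that a virtual core is not the same thing as a physical core and that the Joyent price includes some data transfer. The names `slices` and `unit_costs` are just illustrative. The post doesn't quote thin-slice prices, so you would need to add your own rows to see the non-linearity directly.

```python
# Approximate monthly prices as quoted in the post.
# Assumption: the ~$720/month figure applies to both EC2 configurations.
slices = [
    {"name": "EC2, 4 virtual cores / 15 GB", "cores": 4, "ram_gb": 15, "usd_month": 720},
    {"name": "EC2, 8 virtual cores / 7 GB", "cores": 8, "ram_gb": 7, "usd_month": 720},
    {"name": "Joyent, 8 cores / 32 GB", "cores": 8, "ram_gb": 32, "usd_month": 3300},
]


def unit_costs(s):
    """Return (dollars per core, dollars per GB of RAM) for one slice."""
    return s["usd_month"] / s["cores"], s["usd_month"] / s["ram_gb"]


for s in slices:
    per_core, per_gb = unit_costs(s)
    print(f"{s['name']}: ${per_core:.2f}/core, ${per_gb:.2f}/GB per month")
```

Adding a row for a single-core, 1 to 2 GB slice at whatever your provider currently charges gives you the thin-versus-thick comparison in the same units.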
The abstracted platforms are oriented around thin-slice compute, as well, at least from the perspective of desired application behavior. You can see this in the limitations imposed by Google App Engine; they don’t want you to work with large blobs of data nor do they want you consuming significant chunks of compute.
Now, in that context, contemplate this Intel article: “Kx – Software Which Uses Every Available Core”. In brief, Kx is a real-time database company; they process extremely large datasets, in-memory, parallelized across multiple cores. Their primary customers are financial services companies, who use it to do quantitative analysis on market data. It’s the kind of software whose efficiency increases with the thickness of the available slice of compute.
In the article, Intel laments the lack of software that truly takes advantage of multi-core architectures. But cloud economics are going to push people away from thick-sliced compute, that is, away from apps that are most efficient when given more cores and more RAM. Cloud economics push people towards thin slices, and therefore towards applications whose performance does not suffer notably when the app gets shuffled from core to core (which hurts cache performance) or when it is limited to a low number of cores. So chances are that Intel is not going to get its wish.
Oracle in the cloud… sort of
Today’s keynote at Oracle OpenWorld mentioned that Oracle is coming to Amazon’s EC2 cloud.
The bottom line is that you can now get some Oracle products, including the Oracle 11g database software, bundled as AMIs (Amazon machine images) for EC2 — i.e., ready-to-deploy — and you can license these products to run in the cloud. Any sysadmin who has ever personally gone through the pain of trying to install an Oracle database from scratch knows how frustrating it can be; I’m curious how much the task has or hasn’t been simplified by the ready-to-run AMIs.
On the plus side, this is going to address the needs of those companies who simply want to move apps into the cloud, without changing much if anything about their architecture and middleware. And it might make a convenient development and testing platform.
But simply putting a database on cloud infrastructure doesn’t make it a cloud database. Without that crucial distinction, what are the compelling economics or business value-add? It’s cool, but I’m having difficulty thinking of circumstances under which I would tell a client, yes, you should host your production Oracle database on EC2, rather than getting a flexible utility hosting contract with someone like Terremark, AT&T, or Savvis.
The Cloud skills shift
Something that I’ve been thinking about: The shift to global-class computing, and massively scalable infrastructure, represents a fundamental shift in the skill sets that will be valued in IT Operations.
Those of you who, like me, have worked at service providers in hyper-growth mode are already familiar with what occurs when you need to grow at red-shift speeds: You automate everything humanly possible, and you try to standardize the heck out of things. Usually you end up trying to make sure that your infrastructure is horizontally scalable, and that your hardware is as interchangeable as possible, allowing any single server to fail and the system as a whole to go chugging along, while you eventually go yank that server out and replace it with another just-as-generic box that you’ve auto-provisioned.
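For anyone who hasn't lived this, here is a toy sketch of the idea, assuming a pool of identical boxes and a shelf of pre-provisioned spares. The `Pool` class, its `handle_failure` method, and the host names are all hypothetical; a real setup would sit behind a load balancer and a provisioning system rather than a Python set.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """A pool of interchangeable servers plus generic, ready-to-go spares."""
    active: set = field(default_factory=set)
    spares: list = field(default_factory=list)

    def handle_failure(self, host):
        """Drop a failed host and promote the next spare, if one is left.

        Returns the name of the replacement, or None if no spares remain.
        The remaining active hosts keep serving either way.
        """
        self.active.discard(host)
        if not self.spares:
            return None
        replacement = self.spares.pop(0)
        self.active.add(replacement)
        return replacement


# Hypothetical host names, purely for illustration.
pool = Pool(active={"web01", "web02", "web03"}, spares=["spare01", "spare02"])
replacement = pool.handle_failure("web02")
print(f"web02 failed; now serving from {sorted(pool.active)}")
```

The point is that nothing in the code cares which box died or which one takes its place, and that is only possible because the hardware and configuration are standardized.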
The shift to the cloud model, whether public or private, basically pushes the idea that every IT organization does that, either in-house or through the services of a provider. It puts the premium on software development and scripting skills: these are the guys who automate things and who write the glue for integrating your toolsets. You’ll have a handful of guys who are your serious architects, the guys who tune and optimize your hardware and storage, design your configurations, and so on. (That might be a single guru, or you might go to consultants instead.) You’ll have a few folks who know the operational ins-and-outs of troubleshooting your applications. Everyone else becomes a hardware monkey: entry-level folks who don’t need much more of a skillset than it takes to assemble a PC from parts.
This is writ large in the Google model of Operations, but it’s been true for the last decade in every dot-com of significant size, too. Your hardware operations guys are rack-and-stack types. Everyone else blends systems administration with scripting abilities, and because your toolsets have to scale and be highly maintainable, this is scripting that has the air of serious development, not a one-time thing that can be banged out unreadably.
The routine drudgery of IT Operations is going to get automated away, bit by bit. Right now, many enterprises still operate at such a small scale, and with so little standardization, that it’s not necessarily more efficient to automate a task than to simply do it manually. In the cloud model, the balance tips to the automation side, and the basic value of “I can wrangle boxes” declines precipitously.
My advice to sysadmins: If you are not fluent in a scripting language, and/or not capable of writing structured, readable, maintainable, non-hackish code, now is the time to learn.