There are two primary ecosystems developing in the world: VMware and Amazon. Other possibilities, like Microsoft and OpenStack, are completely secondary to those two. You can think of VMware as “cloud-out” and Amazon as “cloud-in” approaches.
In the VMware world, you move your data center (with its legacy applications) into the modern era with virtualization, and then you build a private cloud on top of that virtualized infrastructure; to get additional capacity, business agility, and so forth, you add external cloud IaaS, and hopefully do so with a VMware-virtualized provider (and, they hope, specifically a vCloud provider who has adopted the stack all the way up to vCloud Director).
In the Amazon world, you build and launch new applications directly onto cloud IaaS. Then, as you get to scale and a significant amount of steady-state capacity, you pull workloads back into your own data center, where you have Amazon-API-compatible infrastructure. Because you have a common API and set of tools across both, where to place your workloads is largely a matter of economics (assuming that you’re not using AWS capabilities beyond EC2, S3, and EBS). You can develop and test internally or externally, though if you intend to run production on AWS, you have to take its availability and performance characteristics into account when you do your application architecture. You might also adopt this strategy for disaster recovery.
While CloudStack has been an important CMP option for service providers — notably competing against the vCloud stack, OnApp, Hexagrid, and OpenStack — in the end, these providers are almost a decoration to the Amazon ecosystem. They’re mostly successful competing in places that Amazon doesn’t play — in countries where Amazon doesn’t have a data center, in the managed services / hosting space, in the hypervisor-neutral space (Amazon-style clouds built on top of VMware’s hypervisor, more specifically), and in a higher-performance, higher-availability market.
Where CloudStack has been more interesting is in its use to be a “cloud-in” platform for organizations who are using AWS in a significant fashion, and who want their own private cloud that’s compatible with it. Eucalyptus fills this niche as well, although Eucalyptus customers tend to be smaller and Eucalyptus tends to compete in the general private-cloud-builder CMP space targeted at enterprises — against the vCloud stack, Abiquo, HP CloudSystem, BMC Cloud Lifecycle Manager, CA’s 3Tera AppLogic, and so on. CloudStack tends to be used by bigger organizations; while it’s in the general CMP competitive space, enterprises that evaluate it are more likely to be also evaluating, say, Nimbula and OpenStack.
CloudStack has firmly aligned itself with the Amazon ecosystem. But OpenStack is an interesting case of an organization caught in the middle. Its service provider supporters are fundamentally interested in competing against AWS (far more so than with the VMware-based cloud providers, at least in terms of whatever service they’re building on top of OpenStack). Many of its vendor contributors are afraid of a VMware-centric world (especially as VMware moves from virtualizing compute to also virtualizing storage and networks), but just as importantly they’re afraid of a world in which AWS becomes the primary way that businesses buy infrastructure. It is to their advantage to have at least one additional successful widely-adopted CMP in the market, and at least one service provider successfully competing strongly against AWS. Yet AWS has established itself as a de facto standard for cloud APIs and for the way that a service “should” be designed. (This is why OpenStack has an aptly named “Nova Feature Parity Team” playing catch-up to AWS, after all, and why debates about the API continue in the OpenStack community.)
But make no mistake about it. This is not about scrappy free open-source upstarts trying to upset an established vendor ecosystem. This is a war between vendors. As Simon Wardley put it, beware of geeks bearing gifts. CloudStack is Citrix’s effort to take on VMware and enlist the rest of the vendor community in doing so. OpenStack is an effort on the part of multiple vendors — notably Rackspace and HP — to pool their engineering efforts in order to take on Amazon. There’s no altruism here, and it’s not coincidental that the committers to the projects have an explicit and direct commercial interest — they are people working full-time for vendors, contributing as employees of those vendors, and by and large not individuals contributing for fun.
So it really comes down to this: Who can innovate more quickly, and choose the right ways to innovate that will drive customer adoption?
Ladies and gentlemen, place your bets.
I wrote, the other day, about Citrix buying Cloud.com, and I realized I forgot to make an important point about OpenStack versus the various commercial vendors vying for the cloud-building market; it’s worthy of a post on its own.
OpenStack is designed by the community, which is to say that it’s largely designed by committee, with some leadership that represents, at least in theory, the interests of the community and has some kind of coherent plan in mind. It is implemented by the community, which means that people who want to contribute simply do so. If you want something in OpenStack, you can write it and hope that your patches are included, but there’s no guarantee. If the community decides something should be included in OpenStack, they need some committers to agree to actually write it, and hope that they implement it well and do it in some kind of reasonable timeframe.
This is not the way that one normally deals with software vendors, of course. If you’re a potentially large customer and you’d like to use Product X but it doesn’t contain Feature Y that’s really important to you, you can normally say to the vendor, “I will buy X if you had Y within Z timeframe,” and you can even write that into your contract (usually witholding payment and/or preventing the vendor from recognizing the revenue until they do it).
But if you’re a potentially large customer that would happily adopt OpenStack if it just had Feature Y, you have miminal recourse. You probably don’t actually want to write Feature Y yourself, and even if you did, you would have no guarantee that you wouldn’t be maintaining a fork of the code; ditto if you paid some commercial entity (like one of the various ventures that do OpenStack consulting). You could try getting Feature Y through the community process, but that doesn’t really operate on the timeframe of business, nor have any guarantees that it’ll be successful, and also requires you to engage with the community in a way that you may have no interest in doing. And even if you do get it into the general design, you have no control over implementation timeframe. So that’s not really doable for a business that would like to work with a schedule.
There are a growing number of OpenStack startups that aim to offer commercial distributions with proprietary features on top of the community OpenStack core, including Nebula and Piston (by Chris Kemp and Joshua McKenty, respectively, and funded by Kleiner Perkins and Hummer Winblad, respectively, two VCs who usually don’t make dumb bets). Commercial entities, of course, can deal with this “I need to respond to customer needs more promptly than the open source community can manage” requirement.
There are many, many entitities, globally, telling us that they want to offer a commercial OpenStack distribution. Most of these are not significant forks per se (although some plan to fork entirely), but rather plans to pick a particular version of the open source codebase and work from there, in order to try to achieve code stability as well as add whatever proprietary features are their secret source. Over time, that can easily accrete into a fork, especially because the proprietary stuff can very easily clash with whatever becomes part of OpenStack’s own core, given how early OpenStack is in its evolution.
Importantly, OpenStack flavors are probably not going to be like Linux distributions. Linux distributions differ mostly in which package manager they use, what packages are installed by default, and the desktop environment config out of the box — almost cosmetic differences, although there can be non-cosmetic ones (such as when things like virtualization technologies were supported). Successful OpenStack commercial ventures need to provide significant value-add and complete solutions, which, especially in the near term when OpenStack is still a fledgling immature project, will result in a fragmentation of what features can be expected out of a cloud running OpenStack, and possibly significant differences in the implementation of critical underlying functionality.
I predict most service providers will pick commercial software, whether in the form of VMware, Cloud.com, or some commercial distribution of OpenStack. Ditto most businesses making use of cloud stack software to do something significant. But the commercial landscape of OpenStack may turn out to be confusing and crowded.
Rackspace is open-sourcing its cloud software — Cloud Files and Cloud Servers — and merging its codebase and roadmap with NASA’s Nebula project (not to be confused with OpenNebula), in order to form a broader community project called OpenStack. This will be hypervisor-neutral, and initially supports Xen (which Rackspace uses) and KVM (which NASA uses), and there’s a fairly broad set of vendors who have committed to contributing to the stack or integrating with it.
While my colleagues and I intend to write a full-fledged research note on this, I feel like shooting from the hip on my blog, since the research note will take a while to get done.
I’ve said before that hosters have traditionally been integrators, not developers of technology, yet the cloud, with its strong emphasis on automation, and its status as an emerging technology without true turnkey solutions at this stage, has forced hosters into becoming developers.
I think the decision to open-source its cloud stack reinforces Rackspace’s market positioning as a services company, and not a software company — whereas many of its cloud competitors have defined themselves as software companies (Amazon, GoGrid, and Joyent, notably).
At the same time, open sourcing is not necessarily a way to software success. Rackspace has a whole host of new challenges that it will have to meet. First, it must ensure that the roadmap of the new project aligns sufficiently with its own needs, since it has decided that it will use the project’s public codebase for its own service. Second, it now has to manage and just as importantly, lead, an open-source community, getting useful commits from outside contributors and managing the commit process. (Rackspace and NASA have formed a board for governance of the project, on which they have multiple seats but are in the minority.) Third, as with all such things, there are potential code-quality issues, the impact of which become significantly magnified when running operations at massive scale.
In general, though, this move is indicative of the struggle that the hosting industry is going through right now. VMware’s price point is too high, it’ll become even higher for those who want to adopt “Redwood” (vCloud), and the initial vCloud release is not a true turnkey service provider solution. This is forcing everyone into looking at alternatives, which will potentially threaten VMware’s ability to dominate the future of cloud IaaS. The compelling value proposition of single pane of glass management for hybrid clouds is the key argument for having VMware both in the enterprise and in outsourced clouds; if the service providers don’t enthusiastically embrace this technology (something which is increasingly threatening), the single pane of glass management will go to a vendor other than VMware, probably someone hypervisor-neutral. Citrix, with its recent moves to be much more service provider friendly, is in a good position to benefit from this. So are hypervisor-neutral cloud management software vendors, like Cloud.com.
Back in 2002, Yahoo acquired Inktomi, a struggling software vendor whose fortunes had turned unpleasantly with the dot-com crash. While at the time of the acquisition, Inktomi had refocused its efforts upon search, its original flagship product — the one that really drove its early revenue growth — was something called Traffic Server.
Traffic Server was a Web proxy server — essentially, software for running big caches. It delivered significantly greater scalability, stability, and maintainability than did the most commonly-used alternative, the open-source Squid. It was a great piece of software; at one point in time, I was one of Inktomi’s largest customers (possibly the actual largest customer), with several hundred Traffic Servers deployed in production globally, so I speak from experience, here. (This was as ISP caches, as opposed to the way that Yahoo uses it, which is a front-end, “reverse proxy” cache.)
Now, as ghosts of the dot-com era resurface, Yahoo is open-sourcing Traffic Server. This is a boon not only to Web sites that need high scalability, but also to organizations who need inexpensive, high-performance proxies for their networks, as well as low-end CDNs whose technology is still Squid-based. There are now enterprise competitors in this space (such as Blue Coat Systems), but open-source remains a lure for many seeking low-cost alternatives. Moreover, service providers and content providers have different needs from the enterprise.
This open-sourcing is only to Yahoo’s benefit. It’s not a core piece of technology, there are plenty of technology alternatives available already, and by opening up the source code to the community, they’re reasonably likely to attract active development at a pace beyond what they could invest in internally.
As anticipated, Java support on Google App Engine has been announced. To date, GAE has supported only the Python programming language. In keeping with the “phenomenal cosmic power, itty bitty living space” sandboxing that’s become common to cloud execution environments, GAE/Java has all the restrictions of GAE/Python. However, the already containerized nature of Java applications means that the restrictions probably won’t feel as significant to developers. Many Python libraries and frameworks are not “pure Python”; they include C extensions for speed. Java libraries and frameworks are, by contrast, usually pure Java; the biggest issues for porting Java into the GAE environment are likely to be the restrictions on system calls and the lack of threads. Generically, GAE/Java offers servlets. The other things that developers are likely to miss are support for JMS and JMX (Java’s messaging and monitoring, respectively).
Overall, the Java introduction is a definite plus for GAE, and is presumably also an important internal proof point for them — a demonstration that GAE can scale and work with other languages. Also, because there are lots of languages that now target the Java virtual machine (i.e., they’ve got compilers/interpreters that produce byte code for the Java VM) — Clojure and Scala, for instance — as well as ports of other languages, like JRuby, we’ll likely see additional languages available on GAE ahead of Google’s own support for those environments.
Google also followed through on an earlier announcement, adding support for scheduld tasks (“cron”). Basically, at a scheduled time, GAE cron will invoke a URL that you specify. This is useful, but probably not everything people were hoping it would be. It’s still subject to GAE’s normal restrictions; this doesn’t let you invoke a long-running background process. It requires a shift in thinking — for instance, instead of doing the once-daily data cleanup run at 4 am, you ought to be doing cleanup throughout the day, every couple of minutes, a bit of your data set at a time.
All of that is going to be chewed over thoroughly by the press and blogosphere, and I’ve contributed my two cents to a soon-to-be-published Gartner take on the announcement and GAE itself, so now I’ll point out something that I don’t think has been widely noticed: the unladen-swallow project plan.
unladen-swallow is apparently an initiative within Google’s compiler optimization team, with a goal of achieving a 5x speed-up in CPython (i.e., the normal, mainstream, implementation of Python), starting from the 2.6 base (the current version, which is a transition point between the 2.5 used by App Engine, and the much-different Python 3.0). The developers intend to achieve this speed-up in part by moving from the existing custom VM to one built on top of LLVM. (I’ve mentioned Google’s interest in LLVM in the past.) I think this particular approach answers some of the mystery surrounding Google and Python 3.0 — this seems to indicate longer-term commitment to the existing 2.x base, while still being transition-friendly. As is typical with Google’s work with open-source code, they plan to release these changes back to the community.
All of which goes back to a point of mine earlier this week: Although programming language communities strongly resemble fandoms, languages are increasingly fungible. We’re a long way from platform maturity, too.
SourceForge puzzles me. I think it’s the combination of what is obviously eager effort to improve the site, and the fumbling to get the basics right.
On the plus side, SourceForge recently made a very welcome addition — adding “hosted apps”, including WordPress and MediaWiki — as an option for all projects, for free. And the announcement of support for additional repository types, notably git, is also a nice move.
But SourceForge is plagued by sluggish response (which is especially stark when compared to the consistent zippiness of Google Code) — across its website, source code repositories, etc. — as well as occasional outages. And the continual redesign of the site, especially in its current bright-orange incarnation, hasn’t seemed like a positive to me. With every redesign, I’ve felt like SourceForge was becoming harder and harder to use. As an example, one redesign ago, the Project Admin menu got so long it was basically unusable on smaller screens (like laptops). To SourceForge’s credit, the next iteration promptly fixed it; unfortunately, the chosen fix was by burying vitally important functionality like the file release system under the “Feature Settings” page (found under Project Admin). That led me on a wild hunt through most of the UI before I finally stumbled upon it the functionality I was looking for by accident.
SourceForge offers a tremendous amount of functionality for free, which is what’s allowing it to stay dominant against the proliferating number of alternative services out there. But not only does SourceForge need to innovate, it needs to make sure that it gets the basics right. It has to add functionality while still being fast and simple to use, and over the years, SourceForge seems to have grown tendrils of new features while the main octopod body has grown sessile and mottled with confusion.
People occasionally ask me why busy, highly-skilled, highly-compensated programmers freely donate their time to open-source projects. In the past, I’ve nattered about the satisfaction of sharing with the community, the pleasure of programming as a hobby even if you do it for your day job, the “just make it work” attitude that often prevails among techies, altruism, idealism, the musings of people like Linus Torvalds, or research like the Lakhni and Wolf MIT/BCG study of developer motivation. (Speaking for myself, I code to solve problems, and I am naturally inclined to share what I do with others, and derive pleasure from having it be useful to others. The times I’ve written code for a living, I’ve always been lucky to have employers who were willing to let me open-source anything which wasn’t company-specific.)
But a chapter in Dan Ariely’s book Predictably Irrational got me thinking about a simpler way to explain it: Programmers contribute to free software projects for reasons that are similar to the reasons why lawyers do pro bono work.
The book posits that exchanges follow either social norms or market norms. If it’s a market exchange, we think in terms of money. If it’s a social exchange, we think in terms of human benefits. It’s the difference between a gift and a payment. Mentioning money (“a gift worth $10″) immediate transforms something into a market exchange. The book cites the example of lawyers being asked to do pro bono work — offered $30/hour to help needy clients, they refused, but asked to do it for free, there were plenty of takers. The $30/hour was viewed through the mental lens of a market exchange, mentally compared to their usual fees and deemed not worthwhile. Doing it for free, on the other hand, was viewed as a social exchange, evaluated on an entirely separate basis than the dollar value.
Contributing to free software follows the norms of the social exchange. The normative difference is also interesting in light of Richard Stallman’s assertion of the non-equivalence of “free software” and “open source”, and some of the philosophical debates that simmer in the background of the open-source movement; Stallman’s “free software” philosophy is intricately tied into the social community of software development.
The book also notes that issues occur when one tries to mix social norms and market norms. For instance, if you ask a friend to help you move, but he’s volunteering his time alongside paid commercial movers, that’s generally going to be seen as socially unacceptable. Commercial open-source projects conflate these two things all the time — which may go far to explaining why few commercialy-started projects gain much of a committer base beyond the core organizations and developers who care and are paid to do so (either directly, or indirectly via an end-user organization that makes heavy use of that software).
(Edit: I just discovered that Ariely has actually done an interview on open source, in quite some depth.)