Category Archives: Infrastructure

DDoS season

We are, it seems, in the midst of a wave of distributed denial of service attacks. The victims include:

  • Neustar’s UltraDNS. (Problems with specific regional DNS clusters, with little customer-visible impact.)
  • Register.com. (Severe impact on Web hosting and email customers.)
  • GoGrid. (Severe impact on cloud hosting customers.)
  • ThePlanet. (Attack on their DNS servers, with severe impact on customers.)

The attack on ThePlanet is unusual in that it received minimal attention in the press, despite the company being one of the largest Web hosting providers, and having Cisco Guard (DDoS mitigation) appliances in place. Also, the status updates were eventually issued via Twitter, rather than through a more conventional channel of customer communication. Here’s the full text, aggregated off Twitter:

Between 2:30am and 5:00am CDT on April 8, The Planet’s name servers were flooded again with a large brute force (DDoS) attack. Unlike the previous attack, this attack did not appear to be DNS-specific; instead, targeted resources indirectly supporting DNS services. Because the nature of this attack was different from the previous event, mirroring the response to the previous attack was ineffective. Once our investigation determined the nature of the attack, we applied filters throughout our DNS support system to alleviate the effects. The Planet’s network and DNS performance have been restored, and the attack originator has ceased actions. Any lingering issues may be indicative of a different problem that may have been exacerbated by the attack and should be resolved quickly. We are working on several projects to help mitigate similar attacks in the future. Once those plans are in order, we will update the DNS Status announcement thread in our community forums. We understand that other providers are experiencing similar events. We will reach out to them, pool our information and then work together to find consistencies between attacks. Our goal is to establish best practices as an industry to better respond to these recent events.

Jose Nazario of Arbor Networks claims these attacks are not Conficker at work, which makes this wave of attacks even more interesting.

The takeaway from this: Customers understand if you get DDoS’d. They don’t put up with a lack of communication. It’s enormously difficult to communicate with customers in the midst of a crisis, especially one that takes down customer-facing infrastructure in a customer-impacting way, but it’s also incredibly critical. Clearly, not everyone in the company is out trying to troubleshoot the problem, so you can usefully put them to work reaching out to your customers, if you have the policies and procedures in place to do so successfully.

Something to think about today, no matter who you are and who you work for: What policies do you have in place for customer communications when a crisis hits your company? (Book recommendation: Eric Dezenhall’s Damage Control, which is a hard-edged, realistic look at communication in a crisis, including coping with competitors who are deliberately fanning the negative-PR flames.)

Google App Engine and other tidbits

As anticipated, Java support on Google App Engine has been announced. To date, GAE has supported only the Python programming language. In keeping with the “phenomenal cosmic power, itty bitty living space” sandboxing that’s become common to cloud execution environments, GAE/Java has all the restrictions of GAE/Python. However, the already containerized nature of Java applications means that the restrictions probably won’t feel as significant to developers. Many Python libraries and frameworks are not “pure Python”; they include C extensions for speed. Java libraries and frameworks are, by contrast, usually pure Java; the biggest issues for porting Java into the GAE environment are likely to be the restrictions on system calls and the lack of threads. At bottom, what GAE/Java offers is servlets. The other things that developers are likely to miss are support for JMS and JMX (Java’s messaging and monitoring, respectively).

Overall, the Java introduction is a definite plus for GAE, and is presumably also an important internal proof point for them — a demonstration that GAE can scale and work with other languages. Also, because there are lots of languages that now target the Java virtual machine (i.e., they’ve got compilers/interpreters that produce byte code for the Java VM) — Clojure and Scala, for instance — as well as ports of other languages, like JRuby, we’ll likely see additional languages available on GAE ahead of Google’s own support for those environments.

Google also followed through on an earlier announcement, adding support for scheduled tasks (“cron”). Basically, at a scheduled time, GAE cron will invoke a URL that you specify. This is useful, but probably not everything people were hoping it would be. It’s still subject to GAE’s normal restrictions; it doesn’t let you invoke a long-running background process. It requires a shift in thinking — for instance, instead of doing the once-daily data cleanup run at 4 am, you ought to be doing cleanup throughout the day, every couple of minutes, a bit of your data set at a time.
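That shift can be sketched in plain Python. This is a hypothetical chunked-cleanup routine, not an App Engine API: the function name, batch size, and in-memory record list are all illustrative. The idea is that a cron-triggered URL would call something like this every couple of minutes, each invocation doing only a small slice of the work.

```python
BATCH_SIZE = 50  # kept small so each invocation finishes well within the request deadline

def cleanup_batch(records, now, max_age=86400):
    """Delete up to BATCH_SIZE records older than max_age seconds.

    Meant to be invoked repeatedly, e.g. by a cron-triggered URL every
    couple of minutes, instead of as one big once-daily sweep.
    """
    stale = [r for r in records if now - r["created"] > max_age]
    doomed = stale[:BATCH_SIZE]
    for r in doomed:
        records.remove(r)
    return len(doomed)  # nonzero means there may be more work left
```

In a real GAE app, the URL handler wrapping this would be registered in the application’s cron configuration, and the records would live in the datastore rather than in memory.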

All of that is going to be chewed over thoroughly by the press and blogosphere, and I’ve contributed my two cents to a soon-to-be-published Gartner take on the announcement and GAE itself, so now I’ll point out something that I don’t think has been widely noticed: the unladen-swallow project plan.

unladen-swallow is apparently an initiative within Google’s compiler optimization team, with a goal of achieving a 5x speed-up in CPython (i.e., the normal, mainstream implementation of Python), starting from the 2.6 base (the current version, which is a transition point between the 2.5 used by App Engine and the much-different Python 3.0). The developers intend to achieve this speed-up in part by moving from the existing custom VM to one built on top of LLVM. (I’ve mentioned Google’s interest in LLVM in the past.) I think this particular approach answers some of the mystery surrounding Google and Python 3.0 — this seems to indicate longer-term commitment to the existing 2.x base, while still being transition-friendly. As is typical with Google’s work with open-source code, they plan to release these changes back to the community.

All of which goes back to a point of mine earlier this week: Although programming language communities strongly resemble fandoms, languages are increasingly fungible. We’re a long way from platform maturity, too.

Scala, Ruby, cost, and development trends

A recent interview with some Twitter developers on Twitter’s use of Scala has touched off a fair amount of controversy in the Ruby community, and prompted Todd Hoff of the High Scalability blog to muse on an interesting statement: at some point, the cost of servers outweighs the cost of programmers.

We all know that the scripting languages frequently favored in Web development today — Ruby, Python, and PHP — do not perform as well as Java, and Java in turn can be outperformed by well-written native C/C++ code. However, these popular dynamic languages typically deliver better programmer productivity. The argument has been that it’s more cost-effective to optimize for developer productivity than to minimize infrastructure spend. There is a point, though, when that equation can be flipped on its head — when the cost of the servers, due to the performance sacrifices, gets too high. (I would add that you can’t look at simple hardware spend alone, either. You’ve got an infrastructure TCO to look at. It’s not just about more people to maintain more servers, either — that equation is not linear, as a sysadmin can manage more systems if they’re all identical and there are good automation tools. But systems that are struggling due to performance issues soak up operations time with daily firefighting.)

Twitter’s developers are not advocating that people abandon what they know and love, but they’re forging a new path for themselves, with an open-source language developed in academia. Scala compiles to JVM bytecode (there has also been an experimental .NET backend), allowing it to interoperate bidirectionally with Java code, and potentially with CLR code as well. This is important for driving adoption, because programmers generally like to work with languages that have a solid base of libraries (i.e., someone else has conveniently done the work of producing code for commonly-needed capabilities), and because it lets Scala leverage the existing tools community for Java and, potentially, .NET. Scala’s equivalent of Rails, i.e., a convenient framework, is Lift.

Scala doesn’t have much adoption now, but it’s worth noting that the rapid pace of Web 2.0 innovation is capable of driving extremely fast uptake of things that turn out to solve real-world problems. (For comparison: not long ago, practically no one had heard of Hadoop, either, but it’s built up quite a bit of buzz now.) That’s important for anyone contemplating the long-term future of specific platforms, particularly APaaS offerings that are tied to specific programming languages. The favored platforms can and do change in a tidal fashion — just look at the Google trend graph for Ruby on Rails to see how aggressively interest can increase over a single year (2005 to 2006).

As a coda to all of this, Twitter’s Alex Payne has a smart blog post, noting that social media fills the vacuum between peer-reviewed journals and water-cooler conversations, yet deploring the fact that in these media, emotion can rule over what is measurable. The takeaway — whether you’re an IT manager, a marketing manager at a vendor, or an investor — from my perspective, is this: there’s an emotional context to programming language choice. These are not merely technical communities; these are fandoms, and they form part of a developer’s self-identity.

AWS in Eclipse, and Azure announcements

Amazon’s announcement for today, with timing presumably associated with EclipseCon, is an AWS toolkit for the Eclipse IDE.

Eclipse, an open-source project that originated at IBM and is now governed by the Eclipse Foundation (IBM also sells commercial products built on it), is one of the two most popular IDEs (the other being Microsoft Visual Studio). Originally designed for Java applications, it has since been extended to support many other languages and environments.

Integrating with Eclipse is a useful step for Amazon, and hopefully other cloud providers will follow suit. It’s also a competitive response to the integration that Microsoft has done between Visual Studio and its Azure platform.

Speaking of Azure, as part of a set of announcements, Microsoft has said that it’s supporting non-.NET languages on Azure via FastCGI. FastCGI is a webserver extension that basically compiles and loads your scripts once, instead of every time they’re accessed, resulting in a reduction of computational overhead. You can run most languages under it, including Java, but it doesn’t really give you the full feature set that you get from tight integration with the webserver through a language-specific extension. (Note that because .NET’s languages encompass anything that supports the CLR, users already had some reasonable access to non-C# languages on Azure — implementations like Ruby.NET, IronRuby, IronPython, etc.)
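The “load once” distinction that makes FastCGI cheaper than classic CGI can be sketched in a few lines of Python. This is purely illustrative — a real deployment would sit behind an actual FastCGI bridge — but it shows the process model: expensive setup runs once per process, and only the cheap per-request work repeats.

```python
# Under classic CGI, this module-level setup would re-run on every request;
# under FastCGI the interpreter process stays resident, so it runs exactly once.
CONFIG = {"app": "demo"}  # stands in for config parsing / framework boot
_hits = 0                 # per-process state survives across requests

def handle_request(path):
    """Per-request work only; the expensive setup above is already amortized."""
    global _hits
    _hits += 1
    return "%s served request #%d for %s" % (CONFIG["app"], _hits, path)
```

The request counter surviving between calls is the tell: a CGI process would die after each request and start again from zero.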

Also, in an interesting Q&A on a ZDNet blog post, Microsoft said that there will be no private Azure-based clouds, i.e., enterprises won’t be able to take the Azure software and host it in their own data centers. What’s not clear is whether software written for Azure will be portable into the enterprise environment. Portability of this sort is a feature that Microsoft, with its complete control over the entire stack, is uniquely well-positioned to deliver.

Gartner BCM summit pitches

I’ve just finished writing one of my presentations for Gartner’s Business Continuity Management Summit. My pitch is focused on colocation, and on the future of cloud infrastructure, for disaster recovery purposes. (My other pitch at the conference is on network resiliency.)

When I set out to write this, I’d expected that some providers who had indicated they’d have formal cloud DR services coming out shortly would be able to brief me on what they were planning to offer. Unfortunately, that turned out not to be the case. So the pitch has ended up more focused on do-it-yourself cloud DR.

Lightweight DR services have appeared and disappeared from the market at an interesting rate ever since Inflow (many years and many acquisitions ago) began offering a service focused on smaller mid-market customers that typically couldn’t afford full-service DR solutions. It’s a natural complement to colocation (in fact, a substantial percentage of the people who use colo do it for a secondary site), and now, a natural complement to the cloud.

Research du jour

My newest research notes are all collaborative efforts.

Forecast: Sizing the Cloud; Understanding the Opportunities in Cloud Services. This is Gartner’s official take on cloud segmentation and forecasting through 2013. It was a large-team effort; my contribution was primarily on the compute services portion.

Invest Insight: Content Delivery Network Arbitrage Increases Market Competition. This is a note specifically for Gartner Invest clients, written in conjunction with my colleague Frank Marsala (a former sell-side analyst who heads up our telecom sector for investors). It’s primarily about Conviva and also touches on Cotendo, but its key point is not to look at particular companies; it’s to look at technology-enabled long-term trends.

Cool Vendors in Cloud Computing Management and Professional Services, 2009. This is part of our annual “cool vendors” series highlighting small vendors whom we think are doing something notable. It’s a group effort, and we pick the vendors via committee. (And no, there is no way to buy your way into the report.) This year’s picks (never a secret, since vendors usually do press releases) are Appirio, CohesiveFT, Hyperic, RightScale, and Ylastic.

Sun, IBM, and the cloud

The morning’s hot rumor: IBM and Sun are in acquisition talks. The punditry is in full swing in the press. My mailbox here at work is filling rapidly with research-community discussion of the implications, too. (As if Cisco’s Unified Computing System wasn’t creating enough controversy for the week.)

Don’t let that buzz drown out Sun’s cloud announcement, though. An insider has useful detailed comments, along with links to the API itself. It’s Q-Layer inside, a RESTful API on top, and clearly in the early stages of development. I’ll likely post some further commentary once I get some time to read through all the documentation and think it through.

Linkage du jour

Tossing a few links out there…

In the weekend’s biggest cloud news, Microsoft’s Azure was down for 22 hours. It’s now back up, with no root cause known.

Geva Perry has posted a useful Zoho Sheet calculator for figuring out whether an Amazon EC2 reserved instance will save you money over an unreserved instance.

Craig Balding has posted a down-to-earth dissection of PCI compliance in the cloud, and the practical reality that cloud infrastructure providers tend to deal with PCI compliance by encouraging you to push the actual payment stuff off to third parties.

Google App Engine updates

For those of you who haven’t been following Google’s updates to App Engine, I want to call your attention to a number of recent announcements. At the six-month point of the beta, I asked when App Engine would be enterprise-ready; now, as we come to almost the year mark, these announcements show the progress and roadmap to addressing many of the issues I mentioned in my previous post.

Paid usage. Google is now letting applications grow beyond the free limits. You set quotas for various resources, and pay for what you use. I still have concerns about the quota model, but being able to bill for these services is an important step for Google. Google intends to be price-competitive with Amazon, but there’s an important difference — there’s still some free service. Google anticipates that the free quotas are enough to serve about five million page views a month. 5 MPVs is a lot; it pretty much means that if you’re willing to write to the platform, you can easily host your hobby project on it for free. For that matter, many enterprises don’t get 5 MPVs worth of hits on an individual Web app or site each month — it’s just that the platform restrictions are a barrier to mainstream adoption.

Less aggressive limits and fewer restrictions. Google has removed or reduced some limits and restrictions that were significant frustrations for developers.

Promised new features. Google has announced that it’s going to provide APIs for some vital bits of functionality that it doesn’t currently allow, like the ability to run scheduled jobs and background processes.

Release of Python 3.0. While there’s no word on how Google plans to manage the 3.0 transition for App Engine, it’s interesting to see how many Python contributors have been absorbed into Google.

Speaking personally, I like App Engine. Python is my strongest scripting language skill, so I prefer to write in it whenever possible. I also like Django, though I appreciate that Google’s framework is easier to get started with than Django (it’s very easy to crank out basic stuff). Like a lot of people, I’ve had trouble adjusting to the non-relational database, but that’s mostly a matter of programming practice. It is, however, clear that the platform is still in its early stages. (I once spent several hours of a weekend tearing my hair out at something that didn’t work, only to eventually find that it was a known bug in the engine.) But Google continues to work at improving it, and it’s worth keeping an eye on to see what it will eventually become. Just don’t expect it to be enterprise-ready this year.

Amazon announces reserved instances

Amazon’s announcement du jour is “reserved instances” for EC2.

Basically, with a reserved instance, you pay an up-front non-refundable fee for a one-year term or a three-year term. That buys you a discount on the usage fee for that instance, during that period of time. Reserved instances are only available for Unix flavors (i.e., no Windows) and, at present, only in the US availability zones.

Let’s do some math to see what the cost savings turn out to be.

An Amazon small instance (1 virtual core equivalent to a 1.0-1.2 GHz 2007 Opteron or Xeon) is normally $0.10 per hour. Assuming 720 hours in a month, that’s $72 a month, or $864 per year, if you run that instance full-time.

Under the reserved instance pricing scheme, you pay $325 for a one-year term, then $0.03 per hour. That would be $21 per month, or $259 per year. Add in the reserve fee and you’re at $584 for the year, averaging out to $49 per month — a pretty nice cost savings.

On a three-year basis, unreserved would cost you $2,592; reserved, full-time, is a $500 one-time fee, and with usage, a grand total of $1277. Big savings over the base price, averaging out to $35 per month.

This is important because at the unreserved prices, on a three-year cash basis, it’s cheaper to just buy your own servers. At the reserved price, does that equation change?

Well, let’s see. Today, in a Dell PowerEdge R900 (a reasonably popular server for virtualized infrastructure), I can get a four-socket server populated with quad-cores for around $15,000. That’s sixteen Xeon cores clocking at more than 2 GHz. Call it $1000 per modern core; split up over a 3-year period, that’s about $28 per month. Cheaper than the reserved price, and much less than the unreserved price.

Now, this is a crude, hardware-only, three-year cash calculation, of course, and not a TCO calculation. But it shows that if you plan to run your servers full-time on Amazon, it’s not as cheap as “it’s just three cents an hour!” makes it sound.
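For the curious, the arithmetic above can be reproduced in a few lines of Python, using the prices quoted in this post and the same simplifying assumption of a 720-hour month:

```python
HOURS_PER_MONTH = 720  # the same simplifying assumption used above

def on_demand(hourly_rate, months):
    """Total cost of running an instance full-time at the on-demand rate."""
    return hourly_rate * HOURS_PER_MONTH * months

def reserved(term_fee, hourly_rate, months):
    """Up-front reservation fee plus discounted full-time usage."""
    return term_fee + hourly_rate * HOURS_PER_MONTH * months

# One-year small instance, run full-time
year_on_demand = on_demand(0.10, 12)      # $864
year_reserved = reserved(325, 0.03, 12)   # $325 + $259.20 = $584.20, ~$49/month

# Three-year comparison
three_on_demand = on_demand(0.10, 36)     # $2,592
three_reserved = reserved(500, 0.03, 36)  # $500 + $777.60 = $1,277.60, ~$35/month

# Rough owned-hardware figure from the Dell example
owned_per_core = 15000.0 / 16             # ~$937 per core; the post rounds to $1,000
owned_monthly = 1000.0 / 36               # ~$28/month per core over three years
```

The slight differences from the round numbers in the post ($584.20 vs. $584, $1,277.60 vs. $1,277) are just rounding in the prose; the conclusion is unchanged.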
