Basic CDNs: where are my logs?
I’m now trialing three different basic CDNs: Amazon’s CloudFront, Rackspace’s Cloud Files CDN (a Limelight Networks partnership), and SimpleCDN. Each of them delivers exactly one image from the same front page of a website.
SimpleCDN has turned out to be the overwhelming winner in terms of initial set-up: you can simply point it at a site and it will mirror your content, just like you’ll find in more full-featured CDNs. The downside is that it has basically no additional controls. From what I can tell, you can’t even delete a mirror once you set it up, and there’s no way to explicitly mark content with a Cache-Control header or the like. It also wins on the sheer price of data transfer, at just a hair under 6 cents per GB. The trade-off is that, at that price, it doesn’t come close to touching Limelight’s footprint, or, for that matter, Amazon’s; there are no East Coast delivery locations, for instance. It’s also routed via AnyCast.
Cloud Files and SimpleCDN are both missing meaningful logging capabilities. Cloud Files can give me daily aggregate totals of disk space used, bandwidth in and out, and number of paid and free operations. (Just numbers. No graph.) SimpleCDN can give me aggregated traffic graphs (outbound bandwidth, hits per second, and hits per 15 minutes) for the past 2 days and the past 2 weeks, plus aggregate totals for the last 30 minutes, day, and week.
With Amazon, you get detailed usage reports, modulo the fact that they are indecipherably opaque. You can set up S3 to give you detailed server logs; I’m processing mine through S3stat, which is a service that will Webalizer your S3 logs for you. Amazon is promising such logs for CloudFront in the future. At the moment, I’m stuck with the enigmatic usage reports. Nothing I can find anywhere will tell me what the difference between a tier 1 and tier 2 request is, for instance. What I’m interested in finding out is what percentage of my requests end up falling back to the origin, but it looks like that is a mystery that will have to wait for CloudFront logging.
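To make the origin-fallback question concrete, here is a minimal sketch of the calculation I would run if per-request logs were available. The log format is purely hypothetical (timestamp, path, and a cache result of HIT or MISS, separated by spaces); it is not CloudFront’s format, which doesn’t exist yet, and the function name is my own invention.

```python
from collections import Counter


def origin_fallback_percentage(log_lines):
    """Return the percentage of requests that fell back to the origin.

    Assumes a hypothetical log format: "<timestamp> <path> <HIT|MISS>".
    Returns None if no usable lines are found.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        result = fields[2].upper()
        if result in ("HIT", "MISS"):
            counts[result] += 1
    total = counts["HIT"] + counts["MISS"]
    if total == 0:
        return None
    # A MISS means the edge had to fetch the object from the origin.
    return 100.0 * counts["MISS"] / total


# Made-up sample data, for illustration only.
sample_log = [
    "2008-12-01T10:00:00 /images/logo.png MISS",
    "2008-12-01T10:00:05 /images/logo.png HIT",
    "2008-12-01T10:00:09 /images/logo.png HIT",
    "2008-12-01T10:00:12 /images/logo.png HIT",
]

fallback = origin_fallback_percentage(sample_log)
print(f"Origin fallback: {fallback:.1f}%")
```

With aggregate-only reporting, neither the numerator nor the denominator of that ratio is visible, which is exactly the problem.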
Lack of logging is not particularly a big deal if you are just trying to offload static content in the form of images and the like — presumably in that scenario you have decent analytics based off hits to the base page or a tracking counter or something like that. However, if you’re trying to track something like software downloads, it is certainly a much more significant problem. And importantly, logs let you verify exactly what is going on, which may be significant for troubleshooting as well as settling billing disputes.
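As a rough sketch of what per-request logs make possible, the snippet below counts completed downloads per file and totals the bytes served, so the result can be checked against a bill. Everything here is an assumption for illustration: the three-field log format (path, HTTP status, bytes sent), the function name, the per-GB rate passed in, and the choice to treat a GB as 10^9 bytes. Check your provider’s actual billing terms before relying on anything like this.

```python
from collections import defaultdict

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def summarize_transfers(log_lines, price_per_gb):
    """Summarize a hypothetical log of "<path> <status> <bytes>" lines.

    Returns (downloads_per_path, total_bytes, estimated_cost).
    Only status 200 responses count as completed downloads, but every
    response's bytes count toward the transfer total.
    """
    downloads = defaultdict(int)
    total_bytes = 0
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip malformed lines
        path, status, size = fields
        total_bytes += int(size)
        if status == "200":
            downloads[path] += 1
    estimated_cost = total_bytes / BYTES_PER_GB * price_per_gb
    return dict(downloads), total_bytes, estimated_cost


# Made-up sample data, for illustration only.
sample_log = [
    "/downloads/app-1.0.zip 200 500000000",
    "/downloads/app-1.0.zip 200 500000000",
    "/downloads/app-1.0.zip 206 250000000",  # partial (range) response
]

downloads, total_bytes, cost = summarize_transfers(sample_log, price_per_gb=0.06)
print(downloads, total_bytes, f"${cost:.3f}")
```

None of this is possible with daily totals alone: you can see how much was transferred, but not which files accounted for it or how many downloads actually completed.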