Do Re Mi

Via Sam Ruby: Patrick Mueller's Exploration of Twitter. Sam's been doing the One Note Samba on ETags for a long time now, but clearly it requires some more repetition. I've been thinking lately that it's really gratifying to see discussion of Twitter's scaling problems. I remember a discussion with a co-worker a couple of years ago, in which he essentially rejected the notion that any O/R mapping framework, and Rails specifically, could ever scale to anything significant. Twitter is taking > 10,000 requests/sec, in spite of being their own worst enemy. I think that puts that question to rest (erm...). Imagine what they could do if they actually implemented ETags in a useful way, or if they had a layer of caching in place to take some of the load off of the database.
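To make "ETags in a useful way" concrete, here is a minimal sketch in Python. It assumes the server can cheaply look up a version number for a resource (a hypothetical counter or last-modified stamp) and builds the ETag from that, so a matching request is answered with a 304 before any expensive rendering or database work happens. The names `make_etag`, `conditional_get`, and `render` are made up for illustration; this is not how Twitter or any particular framework does it.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def make_etag(resource_id: str, version: int) -> str:
    """Build a quoted ETag from a cheap version marker, not from the body."""
    return f'"{resource_id}-{version}"'


def conditional_get(
    request_headers: Dict[str, str],
    resource_id: str,
    version: int,
    render: Callable[[], bytes],
) -> Response:
    """Answer a GET, skipping the expensive render() when the client is current.

    Simplification: compares a single ETag only; a real server would also
    handle "*" and comma-separated lists in If-None-Match.
    """
    etag = make_etag(resource_id, version)
    if request_headers.get("If-None-Match") == etag:
        # Client copy is still good: no body, no rendering, no database hit.
        return 304, {"ETag": etag}, b""
    body = render()
    return 200, {"ETag": etag}, body
```

The point of the design is that the validator comes from something cheaper than the response itself. An ETag computed by hashing a fully rendered page saves bandwidth, but the server has still done all the work.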

Another good point by Patrick: "There's absolutely no reason that Twitter shouldn't be using their own API in an AJAXy style application." Which brings me around to my point: when designing a service, you need to make sure that you've designed something that you can cache. My last major API design effort taught me this lesson. I conflated data that wasn't necessarily related, or rather, I merged data sets that tended to change out of sync with each other. This made figuring out staleness a bit ugly. It was a solvable problem, but it could have been easier to solve. There's a problem with premature optimization, to be sure. But on the other hand, there's no excuse for not planning at all. Patrick mentions that the dynamic page generated by Twitter includes long, duplicated, inline scripts. Things like this are easy wins. The mantra of OO development should be the mantra of all development: separate the things that change from the things that stay the same.
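A small sketch of why merged data sets make staleness ugly. Assume two hypothetical resources, a `profile` that rarely changes and a `timeline` that changes constantly, each with its own version counter. If they are served as one document, the validator has to cover both, so the whole thing goes stale every time the timeline moves. All names here are invented for illustration.

```python
import hashlib
from typing import Dict


def etag_for(name: str, version: int) -> str:
    """Validator for a resource that is served on its own."""
    return f'"{name}-v{version}"'


def merged_etag(parts: Dict[str, int]) -> str:
    """Validator for a document that merges several parts.

    It must change whenever any part changes, so it is derived
    from every part's version.
    """
    joined = ",".join(f"{name}:{version}" for name, version in sorted(parts.items()))
    return '"' + hashlib.sha1(joined.encode("utf-8")).hexdigest() + '"'


before = {"profile": 3, "timeline": 41}
after = {"profile": 3, "timeline": 42}  # only the timeline changed

# Served separately, the profile stays cacheable across the update.
profile_still_fresh = etag_for("profile", before["profile"]) == etag_for("profile", after["profile"])

# Served merged, the one combined document is invalidated by the update.
merged_still_fresh = merged_etag(before) == merged_etag(after)
```

Here `profile_still_fresh` is true and `merged_still_fresh` is false: the same update costs nothing for the separated profile but forces a full refetch of the merged document.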

Incidentally, you could go further with this. Mark Nottingham recently expanded on the cache hints, to go with his Cache Tutorial. I think that developers tend to be overly fearful of ending up with stale data, and they shy away from client side cache hints. Take your favicon.ico, for example. It certainly wouldn't kill you to have that cached client-side for 24 hours, or a week, or whatever - it costs almost nothing to send the header to the client. Yet, on my server, my favicon.ico comes in at just over 1K in size, and in a simple test I ran, it took a request of 392 bytes and a response of 129 bytes, 521 bytes in all, to tell my browser that a file that hasn't changed in years has in fact not changed again. That's about half the size of the file itself, spent on confirming that nothing happened. The favicon is a trivial example, but the point is that you likely have other files on your servers that you can certainly afford to have the client cache - in fact, it may turn out that you can't afford not to have the client cache them.
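For what it's worth, the client-side hint is only a couple of response headers. The sketch below builds them in Python; `static_cache_headers` is a hypothetical helper, and the lifetimes are just the 24 hours or a week mentioned above, not a recommendation for any particular site.

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

ONE_DAY = 24 * 60 * 60
ONE_WEEK = 7 * ONE_DAY


def static_cache_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Headers telling clients they may reuse a file without asking again.

    Cache-Control: max-age is the HTTP/1.1 hint; Expires is the older
    equivalent, included for clients that only understand that.
    """
    if now is None:
        now = time.time()
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# Example: let browsers keep favicon.ico for a week.
favicon_headers = static_cache_headers(ONE_WEEK)
```

With a week-long lifetime, the 392-byte request and 129-byte response from the test above simply don't happen again until the week is up.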

— Gordon Weakliem at permanent link