2006/09/28

On Magic

We discovered an interesting IE6 feature when we pushed out caching changes earlier this week for Journals.  For posterity, here are the technical details.  Our R8 code started using "conditional GET" caching, meaning that we supported both If-Modified-Since: and If-None-Match: HTTP headers.  The way this works is that, if a client has a version of a page in its cache, it can send one or both of these headers to our servers.  Like this:
If-Modified-Since: Tue, 26 Sep 2006 21:47:18 GMT
If-None-Match: "1159307238000-ow:c=2303"
If-None-Match, which passes an "entity tag" or ETag, is better to use and was designed to replace the If-Modified-Since header.   (If-Modified-Since has granularity only down to a second, and can't  be used to indicate non-time-based changes.)  In our case we actually have two versions of our pages which can be served up, one for viewers and another one for owners.  We really only want to cache the viewers' page.

When our server sees a request like the one above, it first does a quick check (in this case it'll ignore the If-Modified-Since and use the ETag) to see if the client already has the latest version; if it does, it returns a 304 Not Modified result.  The big win is that this can be done very very quickly and efficiently, while building a 200KB web page takes lots of work.  If the client doesn't have the right version, though, the server returns a 200 and sends new headers, like these:

Last-Modified: Tue, 26 Sep 2006 21:47:18 GMT
Etag: "1159307238000-c=2303"

If you're obsessive with details you might notice that the modification date is the same as before, but the ETag has changed (the -ow:c has changed to a -c).  When the second request was made, it sent cookies that told the server that the user was the owner of the blog.  So the page is different and therefore the ETag is different, but the last date modified is the same.  We're expecting browsers and caches to detect the change and refresh the page.

This all works fine... except for IE6 (and the AOL client, which uses IE6 under the hood).  IE6 seems to see the Last-Modified: timestamp above and simply stop, ignoring the Etag: header and the fact that we're returning a 200 response with new content.  I've sat and watched the data flow in and out of my Internet connection and verified that IE just drops the 60K or so of content on the floor, as well as the new ETag, and re-uses its old version.  The only way to prevent it is to force a reload using ctrl-Reload, or clearing your Temporary Internet Files.

What this means is that if you change "who you are" by logging in or out, and nothing else changes, you will get a stale, cached version of your own blog's page.  Which is certainly not good.

As of this morning, we're running with caching turned off on journals.aol.com but with a bug fix on beta.journals.aol.com.  The bug fix is simple:  Don't send Last-Modified: headers.  So we only send back the Etag:
Etag: "1159307238000-c=2303"
Which forces IE6 to pay attention to it, fixing the problem with IE6.  IE7, by the way, works either way; go Microsoft!

This all means that we're not going to try to enable caching for non-Etag-aware clients and caches.  Since non-Etag-aware seems to pretty much equate to old or buggy, and not having caching is just a minor performance hit, this seems to be a pretty reasonable approach in theory.  The question now is, will practice accord with theory?  We really need people to hammer on http://beta.journals.aol.com/yourscreenname over the next few days and give us feedback.  See Stephanie's post: Like Magic, We're Back Where We Began... and please leave us feedback!