Caching for AOL Journals

We're continuing to work on improving the scalability of the AOL Journals servers. Our biggest problem recently has been traffic spikes to particular blog pages or entries caused by links on the AOL Welcome page. Most of this peak traffic comes from AOL clients on AOL connections, which means it flows through AOL's caches, known as Traffic Servers. Unfortunately, a small compatibility issue prevented the Traffic Servers from caching pages served with only an ETag validator and no explicit freshness lifetime. That was resolved last week in a staged Traffic Server rollout. Everything we've tested looks good so far: the Traffic Servers are correctly caching the pages that can be cached, and not the ones that can't. We're continuing to monitor things, and we'll see what happens during the next real traffic spike.
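To make that concrete, here's an illustrative sketch of the two response shapes involved; the header values are made up, not the actual Journals headers. A page cacheable by validation carries an ETag but no freshness lifetime, so a cache must revalidate it on every request, while an uncacheable page opts out explicitly.

```python
# Illustrative only: hypothetical header sets, not the real Journals
# responses. A validation-cacheable page has an ETag but no Expires or
# max-age, so a cache has to revalidate it on every request.
cacheable_by_validation = {
    "ETag": '"entry-123-v7"',        # hypothetical validator value
    "Content-Type": "text/html",
    # no Expires / Cache-Control: max-age, so: revalidate every time
}

# An uncacheable page (say, a logged-in owner's edit view) says so
# explicitly, and a correctly behaving cache will never store it.
uncacheable = {
    "Cache-Control": "private, no-store",
    "Content-Type": "text/html",
}
```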

The good thing about this type of caching is that our servers still see every request, in the form of validation requests, so we can track traffic fairly closely. It also let us add caching without major changes to the pages themselves.
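Here's a minimal sketch of what the origin side of that validation looks like, assuming a toy in-memory store; this is illustrative, not the Journals code. The cache echoes our ETag back in an If-None-Match header, and when the entry hasn't changed we answer 304 Not Modified with no body, so the cache serves its stored copy.

```python
# Toy origin server showing ETag validation; the store and values are
# hypothetical stand-ins for the real entry database.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical store: path -> (current ETag, rendered page).
ENTRIES = {"/journals/example/entry1": ('"v42"', "<html>entry 1</html>")}

class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        entry = ENTRIES.get(self.path)
        if entry is None:
            self.send_error(404)
            return
        etag, body = entry
        if self.headers.get("If-None-Match") == etag:
            # Validation hit: the cache's copy is still good.
            self.send_response(304)
            self.send_header("ETag", etag)
            self.end_headers()          # no body on a 304
            return
        # Miss or first fetch: do the full (expensive) render and send it.
        self.send_response(200)
        self.send_header("ETag", etag)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), OriginHandler).serve_forever()
```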

The downside, of course, is that our servers can still get hammered if the volume of validation requests itself climbs high enough. The same is true even for time-based caching once you get enough traffic: you still have to scale up your origin server farm to accommodate peak traffic, because caches don't really guarantee they won't pass traffic spikes straight through.
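A back-of-the-envelope comparison makes the trade-off clearer; every number here is made up for illustration.

```python
# Hypothetical numbers, for illustration only.
peak_rps = 5000            # imagined spike to one hot entry page
num_caches = 20            # imagined Traffic Server instances

# Validation-based caching: every client request still reaches the origin
# as a conditional GET, so the origin sees the full request rate (the
# responses are just cheap 304s instead of rendered pages).
origin_rps_validation = peak_rps                    # 5000 rps

# Time-based caching: in steady state, each cache refetches an object
# only once per TTL...
ttl_seconds = 60
origin_rps_time_based = num_caches / ttl_seconds    # ~0.33 rps per object

# ...but that's only the steady state. On a cold start or a synchronized
# TTL expiry, caches can pass a burst of misses straight through, which
# is why the origin farm still has to be sized for peaks.
print(origin_rps_validation, origin_rps_time_based)
```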

We're continuing to add more caching to the system, carefully, while monitoring the results. We're currently evaluating a Squid reverse caching proxy: it's open source, it's widely used as an HTTP accelerator, and it would give us a lot of flexibility in our caching strategy. We're also modifying our databases to both distribute load and take better advantage of in-memory caching, of course. But we can scale much higher, and more cheaply, by pushing caching out to the front end and avoiding any work at all in the 80% case where it isn't really needed.
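For completeness, here's the cache's half of the conversation: roughly what a reverse proxy like Squid does when it revalidates a stored page. This is a hedged sketch written against the toy server above; the host, port, path, and ETag are invented.

```python
# Cache-side revalidation: send a conditional GET and reuse the stored
# copy on a 304, so the origin skips rendering entirely.
import http.client

cached_etag = '"v42"'                 # validator saved with the cached copy
cached_body = b"<html>entry 1</html>"

conn = http.client.HTTPConnection("localhost", 8000)
conn.request("GET", "/journals/example/entry1",
             headers={"If-None-Match": cached_etag})
resp = conn.getresponse()
if resp.status == 304:
    resp.read()                       # drain the (empty) response body
    body = cached_body                # serve from cache; origin did no work
else:
    body = resp.read()                # page changed: refresh the cache
    cached_etag = resp.getheader("ETag")
print(body)
```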
