2004/08/01

Network Protocols and Vectorization

Doing things in parallel is one of the older performance tricks.  Vector SIMD machines -- like the Cray supercomputers -- attack problems that benefit from doing the same thing to lotsof different pieces of data simultaneously.  It's just a performancetrick, but it drove the design and even the physical shape of thosemachines because the problems they're trying to tackle -- airflowsimulation, weather prediction, nuclear explosion simulation, etc. --are both important and difficult to scale up.  (More recently, we'reseeing massively parallel machines built out of individual commodityPCs; conceptually the same, but limited mostly by networklatency/bandwidth.)

So what does this have to do with network protocols?  Just as the problems of doing things like a matrix-vector multiply very, very fast drove the designs of supercomputers, the problems of moving data from one place to another very quickly, on demanddrive the designs of today's network services.  The designs of networkAPIs (whether REST, SOAP, XML-RPC, or whatever) need to take thesedemands into account.

In particular, transferring lots of small pieces of data in serialfashion over a network can be a big problem.  Lots of protocols thatare perfectly fine when run locally or over a LAN fail miserably whenexpected to deal with 100-200ms latencies on a WAN or the Internet. HTTP does a decent job of balancing out performance/latency issues forretrieving human readable pages -- a page comes down as a medium-sizedchunk of data, followed by, if necessary, associated resources such asscripts, style sheets, and binary images, which can all be retrieved inparallel/behind the scenes.  Note, that this is achieved only throughlots of work on the client side and deep knowledge of the interactionsbetween HTML, HTTP, and the final UI.  The tradeoff is complexity ofprotocol and implementation.

How does this apply to network protocols in general?  One idea is tocarefully scrutinize protocol requests that transfer a single smallpiece of data.  Often a single small piece of data isn't very useful onits own.  Are there common use cases where a system will do this in aloop, perhaps serially, to get enough data to process or present to auser?  If so, perhaps it would be a good idea to think of "vectorizing"that part of the protocol.  Instead of returning a single piece ofdata, for example, return a variable-length collection of those piecesof data.  The semantics of the request may change only slightly -- from"I return an X" to "I return a set of X".  Ideally, the length shouldbe dynamic and the client should be able to ask for "no more than N" oneach request.

For example, imagine a protocol that requires a client to firstretrieve a set of handles (say, mailboxes for a user) then query eachone in turn to get some data (say, the number of unread messages).  Ifthis is something that happens often -- for example, automaticallyevery two minutes -- there are going to be a lot of packets hittingservers.  If multiple mailboxes are on one server, it would be fairlytrivial to vectorize the second call and effectively combine the twoqueries into one -- call it "get mailbox state(s)".  This would let aclient retrieve the state for all mailboxes on a given server, withbetter latency and far less bandwidth than the first option.  Of coursethere's no free lunch; if a client is dealing with multiple servers, itnow has to group the mailboxes for each server for purposes ofretrieving state.  But conceptually, it's not too huge of a leap.

There are other trade-offs.  If the "extra" data is large -- like abinary image -- it might well be better to download it separately,perhaps in parallel with other things.  If it's cacheable, but the maindata isn't, it may again be better to separate it out so you can takeadvantage of things like HTTP caching. 

To summarize, one might want to vectorize part of a network protocol if:
  • Performance is important, and network latency is high and/or variable;
  • The data to be vectorized are always or often needed together in common use cases;
  • It doesn't over-complexify the protocol;
  • There's no other way to achieve similar performance in other ways (parallel requests, caching, etc.)
Of course, this applies to the Atom API. There's a fair amount of vectorization in the Atom API from the start,since it's designed to deal with feeds as collections of entries.  Ithink there's a strong use case for being able to deal with collectionsof feeds as part of the Atom API as well, for all the reasons givenabove.  Said collections of feeds might be feeds I publish (so I wantto know about things like recent comments...) or perhaps feeds I'mtracking (so I want to be able to quickly determine which feeds havesomething interesting, before downloading all of the most recentdata).  It would be interesting to model this information as a synthetic feed, since of course that's already nicely vectorized.  But there are plenty of other ways to achieve the same result.

Suspended by the Baby Boss at Twitter

Well!  I'm now suspended from Twitter for stating that Elon's jet was in London recently.  (It was flying in the air to Qatar at the...