
I'd Rather Batch than Fight

We need HTTP batching. Or at least some people do, in some situations. I think that a small optimization here can avoid re-inventing a lot of wheels.

To back up a bit, the problem I want to solve is simply minimizing round trips for REST-based services such as AtomPub. Not atomic transactions, not additional semantics; just avoiding the latency incurred by multiple round trips. The motivation is basically the same as for HTTP pipelining.

The ideal solution would work for all types of requests (not just Atom, or JSON, or ...). Batching is orthogonal to all of these, and with mashups it's quite possible for a client to be using any or all of them. Batching should work well within browsers but also be usable with any HTTP library. And it should be auto-discoverable, so that a client can fall back to individual requests if a server doesn't support batching.

My canonical motivating example is a smart client that is doing an update and then a retrieval of two different but interdependent resources. The client knows that the update will almost certainly require a refresh of the second resource, and it needs to ensure that the operations are performed in that order because of the dependency. It turns out to be very difficult to model this as a single REST operation, and it's really two logical operations anyway.

Proposal: Take James' multipart-MIME batch proposal, but use POST instead of PATCH. (I agree with Bill de hÓra that PATCH is a stretch; if we had a BATCH method that'd be fine, but we don't right now.) Make the semantics exactly as if the requests had been sent, in order, from the same client on the same keep-alive connection. Do not provide any atomicity guarantees. Provide an easy way for a client to determine whether a server supports batching. If it does not, the client has a trivial for-loop fallback that won't change the semantics of the requests at all.
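To make the client side concrete, here is a minimal sketch in Python. It is not James' actual wire format: the `Request` type, the per-part `Content-Type: application/http` header, and sending the batch as a POST with `Content-Type: multipart/mixed; boundary=...` are all assumptions for illustration. How a client learns that a server supports batching is left open above, so here it is just a boolean the caller supplies, and `send_one`/`send_batch` are hypothetical callables wrapping whatever HTTP library is in use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Request:
    """One HTTP request to be tunnelled inside a batch (hypothetical type)."""
    method: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def encode_batch(requests: list[Request], boundary: str) -> bytes:
    """Serialize requests, in order, as the parts of one multipart body."""
    delim = b"--" + boundary.encode("ascii")
    out = bytearray()
    for req in requests:
        # The boundary is just a byte sequence, so it must not occur in any payload.
        if delim in req.body:
            raise ValueError(f"boundary {boundary!r} occurs in a request body")
        out += delim + b"\r\n"
        # Assumption: each part is labelled as an embedded HTTP request message.
        out += b"Content-Type: application/http\r\n\r\n"
        out += f"{req.method} {req.path} HTTP/1.1\r\n".encode("ascii")
        headers = dict(req.headers)
        if req.body:
            headers.setdefault("Content-Length", str(len(req.body)))
        for name, value in headers.items():
            out += f"{name}: {value}\r\n".encode("latin-1")
        # Blank line, raw payload bytes (binary is fine), then the CRLF that
        # belongs to the next delimiter.
        out += b"\r\n" + req.body + b"\r\n"
    out += delim + b"--\r\n"  # closing delimiter
    return bytes(out)


def perform(
    requests: list[Request],
    server_supports_batching: bool,
    send_one: Callable[[Request], Any],
    send_batch: Callable[[bytes, str], list[Any]],
    boundary: str = "batch-34343434",
) -> list[Any]:
    """Send requests as one batch if possible, else one at a time, in order."""
    if server_supports_batching:
        # send_batch is expected to POST the body with
        # Content-Type: multipart/mixed; boundary=<boundary> (an assumption).
        return send_batch(encode_batch(requests, boundary), boundary)
    # Fallback: the same requests in the same order, one round trip each.
    return [send_one(req) for req in requests]
```

For the motivating example, the batch would be something like `[Request("PUT", "/entries/1", {...}, b"..."), Request("GET", "/feed")]` (hypothetical paths). Because the batch is defined to mean exactly "these requests, in order, on one connection", the non-batching path really is just a loop over the same list.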

Q: Why drag in MIME? Why not just use XML/JSON/...?
A: Because you don't want to mandate a particular parser, or even a text-based format. Image uploads, for example, should work fine with this scheme. It's generic. Note that we are in fact tunnelling HTTP requests here. I'd much rather be up front and open about this than try to hide the fact by smuggling it inside some other syntax.

Q: But MIME is ugly and not supported by my language!
A: We're not trying to interoperate with mail or Usenet here, just slice up a body with a separator that is a sequence of bytes (--batch-34343434) and apply some simple mapping rules for things like method and URL path.
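On the server side, "slicing up a body" can be about this simple. The Python sketch below assumes the same part layout as a hypothetical client would produce (a `Content-Type: application/http` part header, then a request line, headers, and raw payload) and a hypothetical `Request` type. It deliberately skips much of real MIME (checking that delimiters start a line, header folding, nested multiparts) and assumes the boundary never occurs inside a payload. Payloads stay as bytes, so a binary image upload passes through untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Request:
    """One tunnelled HTTP request recovered from a batch (hypothetical type)."""
    method: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def parse_batch(body: bytes, boundary: str) -> list[Request]:
    """Slice a batch body on its boundary bytes and map each part to a Request."""
    delim = b"--" + boundary.encode("ascii")
    # Anything after the closing delimiter ("--boundary--") is epilogue; drop it.
    body = body.split(delim + b"--", 1)[0]
    requests = []
    # Element 0 is the preamble before the first delimiter; ignore it.
    for part in body.split(delim)[1:]:
        # Remove the CRLF ending the delimiter line and the CRLF that
        # precedes the next delimiter; the payload itself is left untouched.
        if part.startswith(b"\r\n"):
            part = part[2:]
        if part.endswith(b"\r\n"):
            part = part[:-2]
        # The part's own MIME headers end at the first blank line...
        _mime_headers, _, message = part.partition(b"\r\n\r\n")
        # ...and the tunnelled request's head ends at the next one.
        head, _, payload = message.partition(b"\r\n\r\n")
        lines = head.decode("latin-1").split("\r\n")
        # Mapping rule: the request line gives the method and URL path.
        method, path, _version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        requests.append(Request(method, path, headers, payload))
    return requests
```

A server would then presumably run the returned requests through its ordinary handlers in list order, which is what gives the "same as a keep-alive connection" semantics without any new transactional machinery.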

Q: Do we really need this?
A: If you think you do, you can implement it. If you don't, you can skip it. You'll interoperate either way. Let the market decide what's best.
