Skip to main content

We Built this Firehose on Open Specs

photo CC joevans
Last week, we launched the Buzz Firehose feed.  You can see it in action at the sample Buzz Mood app (code).  One thing which we didn't highlight too much (though DeWitt did) is that this is completely built on non-proprietary protocols and formats: Atom, PubSubHubbub, and Activity Streams are able to deliver real time push updates easily and quickly.

We did do some special things on the supply side for efficiency.  If you GET the firehose feed you'll see that it never contains any updates, just feed metadata and the hub link.  Conceptually, it contains the last 0 seconds of updates.  This is primarily because it doesn't make sense to poll this feed, so it exists only to provide a subscription mechanism.  Internally, for efficiency we're pushing new updates to the hub with full data rather than pinging the hub and waiting for it to poll the feed.  But, we're using the standard Atom format to push the data. The hub is doing coalescing of updates so when a bunch come in over a small period of time, as with the firehose, it will send them in batch every few hundred msecs to each subscriber.  So the POSTs you get for the firehose will almost always contain a feed with multiple entries in it -- whatever has come in since the last time you've received an update.  There are also some tweaks to the hub retry configuration; if your server doesn't keep up with the firehose the hub will back off fairly quickly and won't attempt many retries.

Popular posts from this blog

Personal Web Discovery (aka Webfinger)

There's a particular discovery problem for open and distributed protocols such as OpenID, OAuth, Portable Contacts, Activity Streams, and OpenSocial.  It seems like a trivial problem, but it's one of the stumbling blocks that slows mass adoption.  We need to fix it.  So first, I'm going to name it:

The Personal Web Discovery Problem:  Given a person, how do I find out what services that person uses?
This does sound trivial, doesn't it?  And it is easy as long as you're service-centric; if you're building on top of social network X, there is no discovery problem, or at least only a trivial one that can be solved with proprietary APIs.  But what if you want to build on top of X,Y, and Z?  Well, you write code to make the user log in to each one so you can call those proprietary APIs... which means the user has to tell you their identity (and probably password) on each one... and the user has already clicked the Back button because this is complicated and annoying.

The problem with creation date metadata in PDF documents

Last night Rachel Maddow talked about an apparently fake NSA document "leaked" to her organization.  There's a lot of info there, I suggest you listen to the whole thing:

http://www.msnbc.com/rachel-maddow/watch/maddow-to-news-orgs-heads-up-for-hoaxes-985491523709

There's a lot to unpack there but it looks like somebody tried to fool MSNBC into running with a fake accusation based on faked NSA documents, apparently based on cloning the document the Intercept published back on 6/5/2017, which to all appearances was itself a real NSA document in PDF form.

I think the main thrust of this story is chilling and really important to get straight -- some person or persons unknown is sending forged PDFs to news organization(s), apparently trying to get them to run stories based on forged documents.  And I completely agree with Maddow that she was right to send up a "signal flare" to all the news organizations to look out for forgeries.  Really, really, really import…
Twister is interesting.  It's a decentralized "microblogging" system based on putting together existing protocols:  Bitcoin, distributed hash tables, and Bittorrent.  The most interesting part for me is using Bitcoin for user registration and spam control.  Federated systems handle this with federated trust, which is at least conceptually simple.  The Twister/Bitcoin mechanism looks intriguing though I don't know enough about Bitcoin to really comment.  Need to read further.