2007/08/16

Do you trust your friends with your URLs?

"Facebook's data feed a data leak?" over at Lawgarithms:

Please correct me if I’m wrong about this; I want to be wrong about this. Or I want to learn that Facebook has already considered and dealt with the issue and it’s just not readily apparent to me. But I’m thinking that Facebook’s feeds for Status Updates, Notes, and Posted Items must in many instances be at odds with privacy settings that attempt to limit users’ Facebook activities to “friends only” (or are even more restrictive).

Denise is both right and wrong.  The basic issue is that once you give out a feed URL (which is not guessable) to a friend, they can then give it out to their friends and their friends... ad infinitum.  These people can then get your ongoing updates, without you explicitly adding them.

Of course, this requires your friends to breach the trust you placed in them to guard your bits.  Notice that even without feeds, your friends can easily copy and paste your bits and send them on manually.  It's a simple matter to automate this if a friend really wants to broadcast your private data to whoever they want.  So as soon as you open up your data, you are vulnerable to this.  To prevent it you'd need working DRM; not a good path to go down.

It would be possible to access control the feeds; there's even a nascent standard (OAuth) for doing this in a secure and standards compliant way.  But even this doesn't prevent your friends from copying your bits.

A much simpler approach is to hand out a different URL for each friend.  They're still obfuscated of course.  You can then block a friend (and anyone they've shared the URL with) from seeing future updates at any time.  This is about the best that can be done.  Update:  This is apparently exactly what Facebook has done.  Denise is still concerned that friends could accidentally or purposefully re-share the data, since the feed format makes it easy to do so.
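
As a rough illustration of the per-friend URL idea, here is a minimal sketch in Python.  It is not Facebook's implementation; the `FeedTokens` class, its method names, and the `example.com` URL layout are all made up for the example.  Each friend gets their own unguessable URL, and revoking a friend invalidates that URL for everyone it was passed on to.

```python
import secrets


class FeedTokens:
    """Hypothetical registry mapping unguessable feed tokens to friends."""

    def __init__(self):
        self._friend_by_token = {}

    def issue(self, friend):
        # One random, unguessable token per friend (assumed URL layout).
        token = secrets.token_urlsafe(16)
        self._friend_by_token[token] = friend
        return "https://example.com/feeds/" + token

    def revoke(self, friend):
        # Drop every token issued to this friend; anyone holding a copy
        # of the URL loses access to future updates as well.
        stale = [t for t, f in self._friend_by_token.items() if f == friend]
        for token in stale:
            del self._friend_by_token[token]

    def can_read(self, url):
        token = url.rsplit("/", 1)[-1]
        return token in self._friend_by_token
```

Note that revocation only stops future reads; anything already fetched has already been copied.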

Facebook's messaging could definitely be improved.  Suggestions?

2007/08/07

RESTful partial updates: PATCH+Ranges

Over the past couple of months, there's been a lot of discussion about the problem of partial updates in REST-over-HTTP[1][2][3][4][5].  The problem is harder than it appears at first glance.  The canonical scenario is that you've just retrieved a complicated resource, like an address book entry, and you decide you want to update just one small part, like a phone number.  The canonical way to do this is to update your representation of the resource and then PUT the whole thing back, including all of the parts you didn't change.  If you want to avoid the lost update problem, you send back the ETag you got from the GET with your PUT inside an If-Match: header, so that you know that you're not overwriting somebody else's change.

This works, but it doesn't scale well to large resources or high update rates, where "large" and "high" are relative to your budget for bandwidth and tolerance for latency.  It also means that you can't simply and safely say "change field X, overwriting whatever is there, but leave everything else as-is".
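
The full-PUT-with-If-Match exchange described above can be sketched with a tiny in-memory stand-in for a server.  The `Resource` class and the hash-based ETag are assumptions made for illustration; the 412 (Precondition Failed) status is what HTTP specifies when an If-Match check fails.

```python
import hashlib


class Resource:
    """Hypothetical in-memory resource with ETag-based conditional PUT."""

    def __init__(self, body):
        self.body = body

    @property
    def etag(self):
        # Assumption: derive the ETag from a hash of the current body.
        digest = hashlib.sha256(self.body.encode("utf-8")).hexdigest()
        return '"' + digest[:16] + '"'

    def get(self):
        return 200, self.etag, self.body

    def put(self, body, if_match=None):
        # A stale If-Match means somebody else changed the resource
        # after our GET, so refuse rather than lose their update.
        if if_match is not None and if_match != self.etag:
            return 412, self.etag
        self.body = body
        return 200, self.etag
```

Two clients that GET the same version and both PUT with If-Match will see the first succeed and the second get a 412, at which point the second has to re-GET and reapply its change.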

I've seen the same thought process recapitulated a few times now on how to solve this problem in a RESTful way.  The first thing that springs to mind is to ask if PUT can be used to send just the part you want to change.  This can be made to work but has some major problems that make it a poor general choice.
  • A PUT to a resource generally means "replace", not "update", so it's semantically surprising.
  • In theory it could break write-through caches.  (This is probably equivalent to endangering unicorns.)
  • It doesn't work for deleting optional fields or updating flexible lists such as Atom categories.

The next idea is generally to simply use POST to update the resource.  This does work in many cases, but conflicts with the use of POST to add a resource to a collection.  That is, if you POST to a collection, are you trying to add an element to the collection, or perform some other update to the collection's metadata?  It's possible to disambiguate using MIME types but it feels fragile.  It also doesn't capture the fact that the operation is retryable; POST in general is not retryable.

A good solution to the partial update problem would be efficient, address the canonical scenario above, be applicable to a wide range of cases, not conflict with HTTP, extend basic HTTP as little as possible, deal with optimistic concurrency control, and deal with the lost update problem.  The method should be discoverable (clients should be able to tell if a server supports the method before trying it).  It would also be nice if the solution would let us treat data symmetrically, both getting and putting sub-parts of resources as needed and using the same syntax.

There are three contenders for a general solution pattern:

Expose Parts as Resources.  This approach represents a resource's sub-elements as sub-resources with their own URIs, so a client can PUT to just the part it wants to change.  This is in spirit what Web3S does.  However, it pushes the complexity elsewhere:  into discovering the URIs of sub-elements, and into how ETags work across two resources that are internally related.  Web3S appears to handle only hierarchical sub-resources, not slicing or arbitrary selections.

Accept Ranges on PUTs.  Ranged PUT leverages and extends the existing HTTP Content-Range: header to allow a client to specify a sub-part of a resource, not necessarily just byte ranges but even things like XPath expressions.  Ranges are well understood in the case of GET but were rejected as problematic for PUT a while back by the HTTP working group.  The biggest concern was that it adds a problematic must-understand requirement.  If a server or intermediary accepts a PUT but doesn't understand that it's just for a sub-range of the target resource, it could destroy data.  But, this does allow for symmetry in reading and writing.  As an aside, the HTTP spec appears to contradict itself about whether range headers are extensible or are restricted to just byte ranges.  This method works fine with ETags; additional methods for discovery need to be specified but could be done easily.

Use PATCH.  PATCH is a method that's been talked about for a while but is the subject of some controversy.  James Snell has revived Lisa Dusseault's draft PATCH RFC[6] and updated it, and he's looking for comments on the new version.  I think this is a pretty good approach with a few caveats.  The PATCH method may not be supported by intermediaries, but if it fails it does fail safely.  It requires a new verb, which is slightly painful.  It allows for a variety of patching methods via MIME types.  It's unfortunately asymmetric in that it does not address the retrieval of sub-resources.  It works fine with ETags.  It's discoverable via HTTP headers (OPTIONS and Allow: PATCH).

The biggest issue with PATCH is the new verb.  It's possible that intermediaries may fail to support it, or actively block it.  This is not too bad, since PATCH is just an optimization -- if you can't use it, you can fall back to PUT.  Or use https, which effectively tunnels through most intermediaries.
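
A client-side sketch of the discover-then-fall-back logic, assuming the client has already sent OPTIONS and has the value of the Allow: header in hand.  The function names are hypothetical; only the Allow: header itself comes from HTTP.

```python
def allowed_methods(allow_header):
    """Split an Allow: header value such as 'GET, PUT, PATCH' into a set."""
    return {m.strip() for m in allow_header.split(",") if m.strip()}


def choose_update_method(allow_header):
    """Prefer PATCH when the server advertises it, otherwise fall back to PUT."""
    methods = allowed_methods(allow_header)
    if "PATCH" in methods:
        return "PATCH"
    if "PUT" in methods:
        return "PUT"
    return None  # the resource is not writable by either method
```

This only covers what the origin server advertises; an intermediary that blocks PATCH would still show up as a failed request, at which point the client retries with a full PUT.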

On balance, I like PATCH.  The controversy over the alternatives seems to justify the new verb.  It solves the problem and I'd be happy with it.  I would like there to be a couple of default delta formats defined with the RFC.

The only thing missing is symmetrical retrieval/update.  But, there's an interesting coda:  PATCH is defined so that Content-Range is must-understand on PATCH[6]:
The server MUST NOT ignore any Content-* (e.g.  Content-Range) 
headers that it does not understand or implement and MUST return
a 501 (Not Implemented) response in such cases.
So let's say a server wanted to be symmetric; it could advertise support for XPath-based ranges on both GET and PATCH.  A client would use PATCH with a range to send back exactly the same data structure it retrieved earlier with GET.  An example:
GET /abook.xml
Range: xpath=/contacts/contact[name="Joe"]/work_phone
which retrieves the XML:
<contacts><contact><work_phone>650-555-1212</work_phone>
</contact></contacts>

Updating the phone number is very symmetrical with PATCH+Ranges:
PATCH /abook.xml
Content-Range: xpath=/contacts/contact[name="Joe"]/work_phone
<contacts><contact><work_phone>408-555-1591</work_phone>
</contact></contacts>
The nice thing about this is that no new MIME types need to be invented; the Content-Range header alerts the server that the stuff you're sending is just a fragment; intermediaries will either understand this or fail cleanly; and the retrievals and updates are symmetrical.
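
To make the symmetry concrete, here is a minimal server-side sketch in Python using the standard library's ElementTree.  It is only an illustration under several assumptions: the `get_range` and `patch_range` names and the sample address book are invented, it updates a single element's text rather than merging a full fragment, and ElementTree supports only a small XPath subset evaluated relative to the root element, hence the hypothetical `_relative` helper.

```python
import xml.etree.ElementTree as ET

ABOOK = """<contacts>
  <contact><name>Joe</name><work_phone>650-555-1212</work_phone></contact>
  <contact><name>Ann</name><work_phone>415-555-0100</work_phone></contact>
</contacts>"""


def _relative(root, xpath):
    # ElementTree rejects absolute paths on an element, so strip '/<root>/'.
    prefix = "/" + root.tag + "/"
    if not xpath.startswith(prefix):
        raise ValueError("unsupported range: " + xpath)
    return xpath[len(prefix):]


def get_range(doc, xpath):
    """GET with 'Range: xpath=...': return the selected element's text."""
    root = ET.fromstring(doc)
    node = root.find(_relative(root, xpath))
    return None if node is None else node.text


def patch_range(doc, xpath, new_text):
    """PATCH with 'Content-Range: xpath=...': replace just the selected text."""
    root = ET.fromstring(doc)
    node = root.find(_relative(root, xpath))
    if node is None:
        raise LookupError("range selects nothing: " + xpath)
    node.text = new_text
    return ET.tostring(root, encoding="unicode")
```

The same XPath string is used for reading and for writing, which is the point of the exercise: `patch_range(ABOOK, xpath, "408-555-1591")` changes Joe's work phone and leaves every other contact untouched.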

[1]http://www.snellspace.com/wp/?p=683
[2]http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx
[3]http://www.dehora.net/journal/2007/06/app_on_the_web_has_failed_miserably_utterly_and_completely.html
[4]http://tech.groups.yahoo.com/group/rest-discuss/message/8412
[5]http://tech.groups.yahoo.com/group/rest-discuss/message/9118
[6]http://www.ietf.org/internet-drafts/draft-dusseault-http-patch-08.txt

Some thoughts on "Some Thoughts on Open Social Networks"

Dare Obasanjo:
"Content Hosted on the Site Not Viewable By the General Public and not Indexed by Search Engines:  As a user of Facebook, I consider this a feature not a bug."

Dare goes on to make some great points about situations where he's needed to put some access controls in place for some content.  I could equally make some points about situations where exposing certain content as globally as possible has opened up new opportunities and been a very positive thing for me.  After which, I think we'd both agree that it's important to be able to put users in control.
Dare: "Inability to Export My Content from the Social Network: This is something that geeks complain about ... danah boyd has pointed out in her research that many young users of social networking sites consider their profiles to be ephemeral ... For working professionals, things are a little different since they may have created content that has value outside the service (e.g. work-related blog postings related to their field of endeavor) so allowing data export in that context actually does serve a legitimate user need."

It isn't just a data export problem, it's a reputation preservation problem too.  Basically, as soon as you want to keep your reputation (identity), you want to be able to keep your history.  It's not a problem for most younger users since they're experimenting with identities anyway.  Funny thing, though:  Younger users tend to get older.  At some point in the not so distant future that legitimate user need is going to be a majority user need.
Dare: "It is clear that a well-thought out API strategy that drives people to your site while not restricting your users combined with a great user experience on your website is a winning combination. Unfortunately, it's easier said than done."

+1.  Total agreement.
Dare: "Being able to Interact with People from Different Social Networks from Your Preferred Social Network: I'm on Facebook and my fiancée is on MySpace. Wouldn't it be great if we could friend each other and send private messages without both being on the same service?  It is likely that there is a lot of unvoiced demand for this functionality but it likely won't happen anytime soon for business reasons..."

Will there be a viable business model in meeting the demand that Dare identifies, one which is strong enough to disrupt business models dependent on a walled garden?  IM is certainly a cautionary tale, but there are some key differences between IM silos and social networking sites.  One is that social networking sites are of the Web in a way that IM is not -- specifically they thrive in a cross-dependent ecosystem of widgets, apps, snippets, feeds, and links.  It's possible that "cooptition" will be more prevalent than pure competition.  And it's quite possible for a social network to do violently antisocial things and drive people away as Friendster did, or simply have a hot competitor steal people away as Facebook is doing.  Facebook's very success argues against the idea that there will be a stable detente among competing social network systems.

Relationship requires identity

Nishant Kaushik:
Let's face it, relationship silos are really just extensions of identity silos.  The problem of having to create and re-create my relationships as I go from site to site mirrors my problem of having to create and re-create my identity as I go from site to site. The Facebook Platform might have one of the better Identity Provider APIs, but all the applications built on it still have to stay within Facebook itself.
Yup.  Which is the primary reason that I've been interested in identity -- it's a fundamental building block for social interactions of all kinds.  And think of what could happen if you could use the Internet as your social network as easily as you can use Facebook today.  As Scott Gilbertson at Wired discovered, it's not hard to replicate most of the functionality; it's the people who are "on" Facebook that make it compelling.

2007/08/02

cat Google Spreadsheets | Venus > my.feed

Sam Ruby (prompted by Alf Eaton) combines Google Spreadsheets and Venus to let people manage Venus subscription lists (or whatever) using Spreadsheets.  The lingua franca is of course CSV-over-HTTP.  Like Unix pipes running over Internet, um, pipes.

Note that this requires the data to be publicly readable on the Spreadsheets side, which is fine for this use.  A lot more uses would be enabled with a lingua franca for deputizing services to talk securely to each other.