Personal Web Discovery (aka Webfinger)

There's a particular discovery problem for open and distributed protocols such as OpenID, OAuth, Portable Contacts, Activity Streams, and OpenSocial.  It seems like a trivial problem, but it's one of the stumbling blocks that slows mass adoption.  We need to fix it.  So first, I'm going to name it:

The Personal Web Discovery Problem:  Given a person, how do I find out what services that person uses?

This does sound trivial, doesn't it?  And it is easy as long as you're service-centric; if you're building on top of social network X, there is no discovery problem, or at least only a trivial one that can be solved with proprietary APIs.  But what if you want to build on top of X,Y, and Z?  Well, you write code to make the user log in to each one so you can call those proprietary APIs... which means the user has to tell you their identity (and probably password) on each one... and the user has already clicked the Back button because this is complicated and annoying.

This is also the cause of the "NASCAR Effect" that is plaguing OpenID UIs today -- you are faced with a Hobson's choice of making the user figure out what their OpenID is on their favorite provider, or figuring it out for them by making them click on a simple button... on an ever-growing array of buttons to cover all of your top identity providers and your business partners.  So the UI is more complicated than simple username/password.  This is not a recipe for success.

Next, there's the sharing problem -- if I want to share my calendar with someone, how does my software know what calendaring service my friend uses?  Again, if we're both on the same calendar service, then we're fine; otherwise we're in the situation that email was in decades ago, where you had to figure out the bang-path hop to hop address to reach your intended recipient.  (Note that in this case, the service being discovered is for a user who isn't even present.)

Finally, what is a person on the web?  At the moment we can represent a person as a URL (OpenID) or as an email address (most everybody).  A huge adoption issue for OpenID is the lack of a standard for using an email address as an OpenID.  The lack of such a standard is due to email address privacy concerns, and lack of discovery services for email addresses.  The horse has mostly left the barn on email address privacy already, as everyone uses email addresses for logins, and we just need to be careful about not publishing them publicly.  Discovery is now a solved problem, but the news isn't widely distributed yet.

Last week, over bacon and coffee at Social Web Foo Camp, Blaine, Breno, and I realized that all of the pieces are in place to solve these problems, and that they just need to be hooked up the right way, and threw together a last minute session Sunday morning to talk about it.  Here's my take-away:

Personal Web Discovery Puzzle Piece #1: URLs are people, and so are email addresses.

We allow email addresses anywhere an end user would use an OpenID -- from an end user's point of view, they can use an existing email address as an OpenID.  While we're at it, we allow any sufficiently well formed and discoverable string to function as an OpenID, for example Jabber IDs.  This means that a user can use any login ID as an OpenID, and also that if I know someone's email address from their business card, I share things like my calendar with them (without sending email).  Of course this requires discovery via email addresses to make OpenID work; fortunately that's the second puzzle piece.

Personal Web Discovery Puzzle Piece #2: The new discovery spec is here!

draft-hammer-discovery-03 is hot off the virtual presses this month; section 4.4, The Host Metadata Document, describes the basic piece needed for discovery, but in that spec it's difficult to see how this fits in with puzzle piece #1.  Here's how:  If I provide email addresses at example.com, while redirecting HTTP requests from example.com to www.example.com, I publish a text file at http://www.example.com/host-meta, which contains a line like this one:
Link-Pattern: <http://meta.example.org/?q={%uri}>; 
This means "take the thing you're asking about in URI form -- e.g., mailto:joe@example.com -- stick it in the query parameter to the meta.example.org service, and do a GET on that to retrieve a bunch of metadata about joe@example.com".  The metadata format XRD is itself a simplification of the existing metadata used by OpenID and OAuth today, and it's basically typed links based on URLs.  It maps joe@example.com to the appropriate OpenID provider to be used -- and that itself can be editable, so Joe can choose to use any provider he or she wishes.

So with a bit of swizzling, clients can map from joe@example.com to see if it's usable as an OpenID and if so, where to send the user to log in.  This eliminates the NASCAR effect.  It also means that clients such as web browsers can check to see if the user has a usable OpenID already (it probably has the users' email address from form fill already) and can present a very simple chrome-based "Log in as joe@example.com" on any web site that allows OpenID.  As a nice side effect, we also make the whole system much more phishing-resistant.

But authentication is just one service.  What if I want to provide a way for people to get my public activitity stream, for example?  That's almost trivial; just map joe@example.com to the default activity stream, and _that_ stream is a public Activity Stream feed.  I can also link to my blog and its feeds, my photo stream, my calendar, my address book, etc.  It's a user-centric web of services, tied together by a single identifier and discovery.

What about privacy?

All of the basic discovery use cases don't require any real authentication or security beyond that provided by HTTP(S).  The services pointed at can of course require authentication -- if I publish a calendar endpoint, that doesn't mean I let just anyone see it; or I may make my free/busy times public but my details may be ACL'd.  The process of discovering that a resource is ACL'd and how to go about authenticating so as to get access is just OAuth (or rather, a usage of the draft-hammer-discovery spec that uses types and endpoints specific to OAuth).  So it's discovery all the way down, and it's possible to mix in as much or as little privacy protection as is needed in each case.  The nice thing is that everybody is already standardizing on OAuth.

Sounds nice, but how does this metadata get created?  Out of thin air?

So we have standards ready to go, and could start writing client libraries today.  But where will all of this metadata come from?  What will motivate identity providers to publish this data, and how can we ensure that they allow users to configure it and not lock them in to the providers' own services?

There are several answers.  First, this spec provides more value to an email address -- so email providers have an incentive to provide it.  It's fairly trivial for them to do at least the basics; publish a static file off their main (or www) site, and provide a basic mapping service to point at whatever they have or know already that's public.  So the cost is low, and the potential benefits are high -- and once one email provider does this, it provides more incentive for the others to follow.

Second, some of the metadata is already present; every Yahoo! and Google user already has an OpenID service but none of them know it yet.  So there is value in just hooking up what's automatically provided.  However, this does lead to the danger of lock-in -- it's fine to default to your own service, but you shouldn't be limited to that service and you should also be able to override the defaults, ideally without needing to go and configure boring settings pages.  Profile pages are a valuable source of discovery data here if profile providers allow linking to services elsewhere.

Going Meta

There is another way to bootstrap.  Once you have a personal web discovery metadata service, and a way to edit per-user data, you can also create a personal web update service.  So then if you're at Flickr, and Flickr knows your email address, Flickr can find out, via discovery, if it can update your personal web data; and if so, offer to add itself as a photo stream service.  This would be done via OAuth of course, with your permission.  So services themselves could take care of the grungy work of adding links to your personal web.

Next Steps

Next steps are to get this documented properly, in the form of a HOWTO and running example code and some solid client libraries.  These are worth a million words of spec.

NB: You'll notice in general that there's no brilliant new idea here; this is just putting pieces that already exist together.  In fact, much of this is a re-invention of Liberty WSF discovery, but less SOAP-y and more deployable.