Personal Web Discovery (aka Webfinger)

There's a particular discovery problem for open and distributed protocols such as OpenID, OAuth, Portable Contacts, Activity Streams, and OpenSocial.  It seems like a trivial problem, but it's one of the stumbling blocks that slows mass adoption.  We need to fix it.  So first, I'm going to name it:

The Personal Web Discovery Problem:  Given a person, how do I find out what services that person uses?

This does sound trivial, doesn't it?  And it is easy as long as you're service-centric; if you're building on top of social network X, there is no discovery problem, or at least only a trivial one that can be solved with proprietary APIs.  But what if you want to build on top of X, Y, and Z?  Well, you write code to make the user log in to each one so you can call those proprietary APIs... which means the user has to tell you their identity (and probably password) on each one... and the user has already clicked the Back button because this is complicated and annoying.

This is also the cause of the "NASCAR Effect" that is plaguing OpenID UIs today -- you are faced with a Hobson's choice of making the user figure out what their OpenID is on their favorite provider, or figuring it out for them by making them click on a simple button... on an ever-growing array of buttons to cover all of your top identity providers and your business partners.  So the UI is more complicated than simple username/password.  This is not a recipe for success.

Next, there's the sharing problem -- if I want to share my calendar with someone, how does my software know what calendaring service my friend uses?  Again, if we're both on the same calendar service, then we're fine; otherwise we're in the situation that email was in decades ago, where you had to work out the hop-by-hop bang-path address to reach your intended recipient.  (Note that in this case, the service being discovered is for a user who isn't even present.)

Finally, what is a person on the web?  At the moment we can represent a person as a URL (OpenID) or as an email address (most everybody).  A huge adoption issue for OpenID is the lack of a standard for using an email address as an OpenID.  The lack of such a standard is due to email address privacy concerns, and lack of discovery services for email addresses.  The horse has mostly left the barn on email address privacy already, as everyone uses email addresses for logins, and we just need to be careful about not publishing them publicly.  Discovery is now a solved problem, but the news isn't widely distributed yet.

Last week, over bacon and coffee at Social Web Foo Camp, Blaine, Breno, and I realized that all of the pieces are in place to solve these problems, and that they just need to be hooked up the right way.  We threw together a last-minute session Sunday morning to talk about it.  Here's my take-away:

Personal Web Discovery Puzzle Piece #1: URLs are people, and so are email addresses.

We allow email addresses anywhere an end user would use an OpenID -- from an end user's point of view, they can use an existing email address as an OpenID.  While we're at it, we allow any sufficiently well formed and discoverable string to function as an OpenID, for example Jabber IDs.  This means that a user can use any login ID as an OpenID, and also that if I know someone's email address from their business card, I can share things like my calendar with them (without sending email).  Of course this requires discovery via email addresses to make OpenID work; fortunately that's the second puzzle piece.

Personal Web Discovery Puzzle Piece #2: The new discovery spec is here!

draft-hammer-discovery-03 is hot off the virtual presses this month; section 4.4, The Host Metadata Document, describes the basic piece needed for discovery, but in that spec it's difficult to see how this fits in with puzzle piece #1.  Here's how:  If I provide email addresses at example.com, while redirecting HTTP requests from example.com to www.example.com, I publish a text file at http://www.example.com/host-meta, which contains a line like this one:
Link-Pattern: <http://meta.example.org/?q={%uri}>; 
This means "take the thing you're asking about in URI form -- e.g., mailto:joe@example.com -- stick it in the query parameter to the meta.example.org service, and do a GET on that to retrieve a bunch of metadata about joe@example.com".  The metadata format XRD is itself a simplification of the existing metadata used by OpenID and OAuth today, and it's basically typed links based on URLs.  It maps joe@example.com to the appropriate OpenID provider to be used -- and that itself can be editable, so Joe can choose to use any provider he or she wishes.
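The lookup described above can be sketched in a few lines of Python.  This is a minimal illustration, not a reference implementation of the draft: the function names are made up for this example, the host-meta file is assumed to contain a line exactly like the one shown, and {%uri} is assumed to mean "the percent-encoded URI".  No network requests are made; a real client would fetch the host-meta URL and follow the redirect to www.example.com.

```python
from urllib.parse import quote


def host_meta_url(email: str) -> str:
    """Return the host-meta URL for the domain part of an email address.

    Hypothetical helper: a real client would GET this URL and follow
    any HTTP redirect (e.g. from example.com to www.example.com).
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return f"http://{domain}/host-meta"


def parse_link_pattern(line: str) -> str:
    """Extract the URI template from a 'Link-Pattern: <template>; ...' line."""
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "link-pattern":
        raise ValueError(f"not a Link-Pattern line: {line!r}")
    start = value.index("<")          # raises ValueError if missing
    end = value.index(">", start)
    return value[start + 1:end]


def expand(template: str, uri: str) -> str:
    """Substitute the percent-encoded URI for the {%uri} variable.

    Assumption: {%uri} stands for the fully percent-encoded identifier.
    """
    return template.replace("{%uri}", quote(uri, safe=""))


def metadata_url(email: str, link_pattern_line: str) -> str:
    """Map an email address to the URL of its metadata document."""
    template = parse_link_pattern(link_pattern_line)
    return expand(template, "mailto:" + email)
```

Calling metadata_url("joe@example.com", "Link-Pattern: <http://meta.example.org/?q={%uri}>; ") yields http://meta.example.org/?q=mailto%3Ajoe%40example.com, which is the URL a client would then GET to retrieve Joe's metadata.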

So with a bit of swizzling, clients can map from joe@example.com to see if it's usable as an OpenID and if so, where to send the user to log in.  This eliminates the NASCAR effect.  It also means that clients such as web browsers can check to see if the user has a usable OpenID already (they probably have the user's email address from form fill already) and can present a very simple chrome-based "Log in as joe@example.com" on any web site that allows OpenID.  As a nice side effect, we also make the whole system much more phishing-resistant.

But authentication is just one service.  What if I want to provide a way for people to get my public activity stream, for example?  That's almost trivial; just map joe@example.com to the default activity stream, and _that_ stream is a public Activity Stream feed.  I can also link to my blog and its feeds, my photo stream, my calendar, my address book, etc.  It's a user-centric web of services, tied together by a single identifier and discovery.
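Once the metadata has been fetched and parsed, finding a particular service is just a lookup over typed links.  Here is a rough Python sketch, assuming the metadata has already been reduced to (type, URL) pairs; the Link class, the link type identifiers, and every URL below are invented for illustration and are not taken from the XRD or discovery specs.

```python
from typing import Iterable, NamedTuple, Optional


class Link(NamedTuple):
    """One typed link from a person's metadata (hypothetical shape)."""
    rel: str   # link type identifier
    href: str  # where the service lives


# Made-up metadata for joe@example.com; none of these type
# identifiers or URLs come from a real specification.
EXAMPLE_LINKS = [
    Link("http://example.org/types/openid-provider", "https://openid.example.net/joe"),
    Link("http://example.org/types/activity-stream", "http://streams.example.com/joe/public"),
    Link("http://example.org/types/calendar", "https://cal.example.org/joe"),
]


def find_service(links: Iterable[Link], rel: str) -> Optional[str]:
    """Return the URL of the first link of the given type, or None."""
    for link in links:
        if link.rel == rel:
            return link.href
    return None
```

A client that wants Joe's public activity stream would ask find_service for the activity stream type and fetch the resulting URL; a None result simply means Joe doesn't advertise that service.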

What about privacy?

None of the basic discovery use cases requires any real authentication or security beyond that provided by HTTP(S).  The services pointed at can of course require authentication -- if I publish a calendar endpoint, that doesn't mean I let just anyone see it; or I may make my free/busy times public but my details may be ACL'd.  The process of discovering that a resource is ACL'd and how to go about authenticating so as to get access is just OAuth (or rather, a usage of the draft-hammer-discovery spec that uses types and endpoints specific to OAuth).  So it's discovery all the way down, and it's possible to mix in as much or as little privacy protection as is needed in each case.  The nice thing is that everybody is already standardizing on OAuth.

Sounds nice, but how does this metadata get created?  Out of thin air?

So we have standards ready to go, and could start writing client libraries today.  But where will all of this metadata come from?  What will motivate identity providers to publish this data, and how can we ensure that they allow users to configure it and not lock them in to the providers' own services?

There are several answers.  First, this spec provides more value to an email address -- so email providers have an incentive to provide it.  It's fairly trivial for them to do at least the basics; publish a static file off their main (or www) site, and provide a basic mapping service to point at whatever they have or know already that's public.  So the cost is low, and the potential benefits are high -- and once one email provider does this, it provides more incentive for the others to follow.

Second, some of the metadata is already present; every Yahoo! and Google user already has an OpenID service but none of them know it yet.  So there is value in just hooking up what's automatically provided.  However, this does lead to the danger of lock-in -- it's fine to default to your own service, but you shouldn't be limited to that service and you should also be able to override the defaults, ideally without needing to go and configure boring settings pages.  Profile pages are a valuable source of discovery data here if profile providers allow linking to services elsewhere.

Going Meta

There is another way to bootstrap.  Once you have a personal web discovery metadata service, and a way to edit per-user data, you can also create a personal web update service.  So then if you're at Flickr, and Flickr knows your email address, Flickr can find out, via discovery, if it can update your personal web data; and if so, offer to add itself as a photo stream service.  This would be done via OAuth of course, with your permission.  So services themselves could take care of the grungy work of adding links to your personal web.

Next Steps

Next steps are to get this documented properly, in the form of a HOWTO and running example code and some solid client libraries.  These are worth a million words of spec.

NB: You'll notice in general that there's no brilliant new idea here; this is just putting pieces that already exist together.  In fact, much of this is a re-invention of Liberty WSF discovery, but less SOAP-y and more deployable.


  1. "NASCAR effect" is a nice analogy for this.

  2. Interesting, maybe the way to go. Couple of issues.
    1) Onus is on the email provider to provide discovery. The larger providers will adopt. But it will be a huge task to get the thousands of small providers to adopt this.
    2) Also there is a trust issue when it comes to the smaller "fly by night" providers.

    The above two problems can be solved if the RP can query any email address at one of the major email providers :)

    There is one more, easier way to handle this if the email provider is also the OpenID provider. Just return the user's metadata along with the authentication response! Actually this option can be implemented immediately. All the provider needs to do is allow the user to create his metadata links to all his services. And OAuth can take over from there on.

  3. Isn't this covered by eaut.org?

  4. @aswath: The email address mapping is a refinement of EAUT; there are some deployment issues with EAUT that the newer specs try to avoid.

  5. @santosh - If a small provider doesn't play, an RP can still use plain old email to do verification and invitation; it's just a worse experience for that small # of users. Also, you have exactly the same trust issue with account recovery for people using small providers today; nothing new there.

    I'm not going to touch the idea of treating large email providers as a federated OP without user permission with a 100 foot pole.

  6. Great post! When it comes to ACLs and sharing/collaboration I believe we need one other piece. That is some form of an identity token. For example, a friend that I allow on my ACL for my family photo albums shouldn't have to create an identity federation with my photo service just so the photo service can check the ACL. The same would hold true for creating an entry in my calendar.


  7. I meant treating large email providers as a federated OP with user permission, i.e. the user delegates this function to the larger provider. Also, would it be possible for smaller providers to delegate discovery and authentication to a larger provider under this scheme of things?

  8. @santosh absolutely, smaller providers could delegate profile discovery to other providers. You could also have an approach where two-level discovery is performed; first, to a profile-discovery page that the user could configure so that they could host their profile anywhere.

  9. John - out of curiosity, what sort of deployment issues does EAUT have? I've only ever read up on it, but I was thinking of having a play with it at some point.

  10. I am appalled at how many indirections WebFinger requires for what ought've been a single DNS query.

    Instead of bringing us layers upon layers of indirection, shouldn't you be providing us with tools to fix DNS so it provides a usable naming service for the 21st century?

    The DNS: too ghastly for anyone to dare touch?

  11. Your link to Portable Contacts in the first sentence actually leads to OpenID.

  12. still me

    BTW this unification (same user@provider for ALL protocols) would also solve the OpenID problem "okay, I signed my comment with a URL, now how can I get notifications about replies (track-backs of a kind)?"

    I hate "To do / Monitor forum replies" folder in my bookmarks ;-)

