
One site-meta to rule them all

Eran & Mark's site-meta proposal is... interesting. I have a gut feel that it's a bit like democracy: The worst method for whole-site discovery, except for all the others. The killer app for this, IMHO, is the ability to solve things like "lookup metadata for user x on domain y" in a general way. An immediate practical application is translating things like email addresses and Jabber IDs, all of the form (user@domain), to something you can perform discovery on.
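
As a concrete sketch of that translation (entirely hypothetical: the /site-meta path and the user query parameter are placeholders I made up, not anything the proposal specifies), a client might do something like this in Python:

from urllib.parse import quote

def discovery_url(identifier):
    """Map a user@domain identifier to a hypothetical per-domain
    discovery URL. The split-on-@ step is the real point; the URL
    shape is invented for illustration."""
    user, _, domain = identifier.partition("@")
    if not user or not domain:
        raise ValueError("expected an identifier of the form user@domain")
    return "http://%s/site-meta?user=%s" % (domain, quote(user))

print(discovery_url("alice@example.com"))
# -> http://example.com/site-meta?user=alice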

Other than the embarrassment of hacking another hard-coded magic name alongside favicon.ico and robots.txt, I really only have one issue with the proposal: It requires a directory lookup via an XML document. I have nothing against XML, but it seems like overkill for this purpose.

An alternative would be to use a very, very simple text-based format that is NOT very extensible. Fortunately, there's already a proposal for this type of format from Mark, for a Link header:

Link: <http://example.org/ch>; rel="previous";
      title="previous chapter"
Just for simplicity, we can take the same format and embed it in an application/site-meta document. The sample site-meta XML would then transform into something like this (one metadata entry per line):
/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
We lose the ability to use namespaces and inline (embedded) metadata in this site-meta document. Or alternatively, we gain the ability to ignore namespaces and we don't need to download inline metadata we don't care about.
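
To make the simplicity concrete, here's a rough sketch of a parser for the format above, based on my own reading of it (a URI followed by attr="value" pairs, one entry per line) rather than any published grammar. Note that a malformed line breaks only itself, not the whole document:

import re

ENTRY = re.compile(r'^(\S+)((?:\s+\w+="[^"]*")+)\s*$')
ATTR = re.compile(r'(\w+)="([^"]*)"')

def parse_site_meta(text):
    """Return a list of (uri, attributes) pairs, skipping lines that
    don't match the entry syntax."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # tolerate a bad line instead of failing the document
        entries.append((m.group(1), dict(ATTR.findall(m.group(2)))))
    return entries

print(parse_site_meta('/robots.txt rel="robots"\n/p3p.xml rel="privacy"'))
# -> [('/robots.txt', {'rel': 'robots'}), ('/p3p.xml', {'rel': 'privacy'})]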

One closing thought: A persistent problem with XML, unfortunately, is cryptographic signatures, primarily due to the complexity of signing and verifying something whose canonical representation is really an Infoset rather than a text stream. This hypothetical format, or any of a number of formats close to it in configuration space, could solve that problem easily:
/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
data:application/x-pkcs7-signature;base64,iVBORA...rkJggg== rel="signature"
...with some simple rules to determine which octets get signed.
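
For instance, one such rule (again just a sketch of mine, assuming a detached PKCS#7 signature carried in the data: URI) could be: the signature covers every octet of the document that precedes the signature line:

import base64

MARKER = b'data:application/x-pkcs7-signature;base64,'

def split_signed(document):
    """Split a signed site-meta document into (signed_octets, signature),
    under the assumed rule that everything before the signature line is
    exactly what gets signed."""
    start = document.find(MARKER)
    if start < 0:
        raise ValueError("no signature entry found")
    line = document[start:].split(b'\n', 1)[0]
    b64 = line[len(MARKER):].split(b' ', 1)[0]  # drop the rel="signature" attr
    return document[:start], base64.b64decode(b64)

doc = b'/robots.txt rel="robots"\n' + MARKER + base64.b64encode(b'sig') + b' rel="signature"\n'
print(split_signed(doc))
# -> (b'/robots.txt rel="robots"\n', b'sig')

Verification then reduces to an ordinary detached-signature check over the first element; no canonicalization step at all.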

Comments

  1. +1, especially since if sites are going to have a single /site-meta file for all their discovery, then as soon as one person fat-fingers invalid syntax for an entry, it will break discovery for the entire site. XML is very brittle. But with a one-line-per-entry syntax, presumably the damage would be limited and easier to ignore.

  2. Hey John,

    I like the democracy comparison -- well said.

    WRT XML - you're not the first person to raise this, and I agree that it adds some risk.

    The intent, however, is to address as many of these use cases as possible, and I often run across people for whom the extra roundtrip is unacceptable (e.g., in P3P, this concern drove the whole compact policy design, which IMO is pretty broken). That breadth of use cases is why XML was chosen.

    However, if there's emerging consensus that a line-oriented format would be better, I'm all for it; the whole point here is to build momentum behind one approach, and make it successful. So, get people to make some noise!

    P.S. I'm re-submitting this comment, because the first time around the OpenID dialog ate the comment text. Grr...

