2008/11/24

One site-meta to rule them all

Eran & Mark's site-meta proposal is... interesting. I have a gut feeling that it's a bit like democracy: the worst method for whole-site discovery, except for all the others. The killer app for this, IMHO, is the ability to solve things like "look up metadata for user x on domain y" in a general way. An immediate practical application is translating things like email addresses and Jabber IDs, all of the form (user@domain), into something you can perform discovery on.
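
To make the user@domain translation concrete, here is a minimal sketch. The function name, the default `http` scheme, and the handling of `mailto:` / `xmpp:` prefixes and Jabber resource suffixes are my own assumptions for illustration; the only thing taken from the proposal is that the document lives at /site-meta on the domain.

```python
def site_meta_url(address: str, scheme: str = "http") -> str:
    """Map a user@domain identifier to the site-meta URL for its domain.

    Hypothetical helper: the name, the default scheme, and the prefix
    handling are illustrative assumptions, not part of any proposal.
    """
    # Drop an optional URI scheme prefix such as "mailto:" or "xmpp:".
    for prefix in ("mailto:", "xmpp:"):
        if address.lower().startswith(prefix):
            address = address[len(prefix):]
            break
    # A Jabber ID may carry a "/resource" suffix; discovery ignores it here.
    address = address.split("/", 1)[0]
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not of the form user@domain: {address!r}")
    return f"{scheme}://{domain.lower()}/site-meta"


print(site_meta_url("alice@example.org"))
print(site_meta_url("xmpp:bob@Example.NET/laptop"))
```

Looking up metadata for the user would then start by fetching that URL and finding the entry that says where per-user metadata lives.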

Other than the embarrassment of hacking another hard-coded magic name alongside favicon.ico and robots.txt, I really only have one issue with the proposal: It requires a directory lookup via an XML document. I have nothing against XML, but it seems like overkill for this purpose.

An alternative would be to use a very, very simple text-based format that is NOT very extensible. Fortunately, there's already a proposal for this type of format from Mark, for a Link header:

Link: http://example.org/ch rel="previous";
title="previous chapter"
Just for simplicity, we can take the same format and embed it in an application/site-meta document. The sample site-meta XML would then transform into something like this (with newlines delimiting each type of metadata):
/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
We lose the ability to use namespaces and inline (embedded) metadata in this site-meta document. Or alternatively, we gain the ability to ignore namespaces and we don't need to download inline metadata we don't care about.
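
To show how little machinery this needs, here is a rough parser sketch for the sample above. It assumes each entry is a target, whitespace, and a single rel="..." parameter, exactly as in the sample lines; the names `Link` and `parse_site_meta` are made up, and skipping lines that don't match is just one possible policy, not something the format defines.

```python
import re
from typing import List, NamedTuple


class Link(NamedTuple):
    target: str
    rel: str


# One entry per line: a target, whitespace, then rel="...".
# (Assumed grammar, inferred only from the sample lines.)
_ENTRY = re.compile(r'^(?P<target>\S+)\s+rel="(?P<rel>[^"]*)"\s*$')


def parse_site_meta(text: str) -> List[Link]:
    """Parse the hypothetical line-oriented site-meta format."""
    links = []
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:  # blank or malformed lines are skipped, not fatal
            links.append(Link(match["target"], match["rel"]))
    return links


SAMPLE = '''/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
'''

for link in parse_site_meta(SAMPLE):
    print(link.rel, "->", link.target)
```

A client that only cares about one rel value can stop at the first matching line and ignore everything else.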

One closing thought: A persistent issue with XML, unfortunately, is the difficulty of cryptographic signatures, primarily due to the complexity of signing and verifying something whose canonical representation is really an Infoset rather than a text stream. This hypothetical format, or any of a number close to it in configuration space, could solve that problem easily:
/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
data:application/x-pkcs7-signature;base64,iVBORA...rkJggg== rel="signature"
...with some simple rules to determine which octets get signed.
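
As one example of what such a rule could look like (purely an assumption on my part), suppose the signature covers every octet, line terminators included, that precedes the rel="signature" line. The sketch below only splits the document according to that assumed rule; actually verifying the PKCS #7 signature is left out, and the function name is invented.

```python
from typing import Optional, Tuple

SIGNATURE_SUFFIX = b' rel="signature"'


def split_signed_octets(document: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (octets covered by the signature, signature target or None).

    Assumed rule: everything before the rel="signature" line is signed,
    byte for byte, with no canonicalization step.
    """
    lines = document.splitlines(keepends=True)
    for index, line in enumerate(lines):
        entry = line.rstrip(b"\r\n")
        if entry.endswith(SIGNATURE_SUFFIX):
            signed = b"".join(lines[:index])
            target = entry[: -len(SIGNATURE_SUFFIX)]
            return signed, target
    # No signature entry: nothing to verify against.
    return document, None


# The data: URI here is a placeholder, not a real signature.
doc = (
    b'/robots.txt rel="robots"\n'
    b'/p3p.xml rel="privacy"\n'
    b'data:text/plain,placeholder rel="signature"\n'
)
signed, target = split_signed_octets(doc)
print(len(signed), target)
```

Because the signed region is a plain run of octets, a verifier never has to build or canonicalize an Infoset before checking it.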

3 comments:

  1. +1, especially since if sites are going to have a single /site-meta file for all their discovery, then as soon as one person fat-fingers invalid syntax for an entry, it will break discovery for the entire site. XML is very brittle. But with one-line-per-entry syntax, presumably the damage would be limited and easier to ignore.

  2. Hey John,

    I like the democracy comparison -- well said.

    WRT XML - you're not the first person to raise this, and I agree that it adds some risk.

    The intent, however, is to address as many of these use cases as possible, and I often run across people for whom the extra roundtrip is unacceptable (e.g., in P3P, this concern drove the whole compact policy design, which IMO is pretty broken). Since site-meta is aimed at addressing as many of these uses as possible, that's why XML was chosen.

    However, if there's emerging consensus that a line-oriented format would be better, I'm all for it; the whole point here is to build momentum behind one approach, and make it successful. So, get people to make some noise!

    P.S. I'm re-submitting this comment, because the first time the openid dialog ate the comment text. Grr...

  3. This comment has been removed by a blog administrator.
