Abstractioneer by John Panzer: 2008/11

2008/11/24

One site-meta to rule them all

Eran & Mark's site-meta proposal is... interesting. I have a gut feel that it's a bit like democracy: The worst method for whole-site discovery, except for all the others. The killer app for this, IMHO, is the ability to solve things like "look up metadata for user x on domain y" in a general way. An immediate practical application is translating things like email addresses and Jabber IDs, all of the form (user@domain), to something you can perform discovery on.
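As a rough illustration of that translation step, here is a minimal Python sketch. The well-known path "/site-meta", the default "http" scheme, and the function name are assumptions made for the example, not details taken from the proposal.

```python
from urllib.parse import urlunsplit

# Hypothetical well-known path; the proposal's actual location may differ.
SITE_META_PATH = "/site-meta"


def site_meta_url(identifier: str, scheme: str = "http") -> str:
    """Map a user@domain identifier to the URL of that domain's site-meta document."""
    # Split on the last "@" so the domain is whatever follows it.
    user, sep, domain = identifier.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not of the form user@domain: {identifier!r}")
    # Host names are case-insensitive, so normalize the domain to lowercase.
    return urlunsplit((scheme, domain.lower(), SITE_META_PATH, "", ""))
```

The user part is not used to build the URL here; it would only matter once the site-meta document says where per-user metadata lives.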

Other than the embarrassment of hacking another hard-coded magic name alongside favicon.ico and robots.txt, I really only have one issue with the proposal: It requires a directory lookup via an XML document. I have nothing against XML, but it seems like overkill for this purpose.

An alternative would be to use a very, very simple text-based format that is NOT very extensible. Fortunately, there's already a proposal for this type of format from Mark, for a Link header:

Link: http://example.org/ch rel="previous";
       title="previous chapter"

Just for simplicity, we can take the same format and embed it in an application/site-meta document. The sample site-meta XML would then transform into something like this (with newlines delimiting each type of metadata):

/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"

We lose the ability to use namespaces and inline (embedded) metadata in this site-meta document. Or alternatively, we gain the ability to ignore namespaces and we don't need to download inline metadata we don't care about.
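To show how little machinery such a format needs, here is a minimal Python sketch of a parser. The grammar it assumes (one link per line: a URI reference, a space, then key="value" parameters) is inferred from the sample above, and the names Link, parse_site_meta, and find_rel are made up for the example.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    target: str   # URI reference, possibly relative to the site root
    params: dict  # e.g. {"rel": "robots"}


# Matches key="value" pairs that follow the target URI.
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_site_meta(text: str) -> list:
    """Parse one link per line; blank lines are skipped."""
    links = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything up to the first space is the target; the rest is parameters.
        target, _, rest = line.partition(" ")
        links.append(Link(target, dict(_PARAM.findall(rest))))
    return links


def find_rel(links, rel: str) -> list:
    """Return the targets of all links whose rel parameter equals rel."""
    return [link.target for link in links if link.params.get("rel") == rel]
```

A consumer that only cares about one relation type can call find_rel and ignore every other line.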

One closing thought: A persistent problem with XML, unfortunately, is cryptographic signatures, primarily because of the complexity of signing and verifying something whose canonical representation is really an Infoset rather than a text stream. This hypothetical format, or any of a number close to it in configuration space, could solve that problem easily:

/robots.txt rel="robots"
/p3p.xml rel="privacy"
http://other.example.net/example rel="http://example.com/rel"
data:application/x-pkcs7-signature;base64,iVBORA...rkJggg== rel="signature"

...with some simple rules to determine which octets get signed.
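One candidate rule, sketched in Python: sign every octet that precedes the signature line. That rule and the split_signed helper are assumptions made for illustration, not part of any proposal. Checking the PKCS#7 signature itself needs a crypto library and is left out.

```python
import base64

SIG_PREFIX = b"data:application/x-pkcs7-signature;base64,"


def split_signed(document: bytes):
    """Return (signed_octets, signature_bytes).

    Assumed rule: the signed octets are everything before the first line
    that is a data: URI with rel="signature", line endings included.
    """
    offset = 0
    for line in document.splitlines(keepends=True):
        if line.startswith(SIG_PREFIX) and b'rel="signature"' in line:
            # The base64 payload runs from the prefix to the first space.
            encoded = line[len(SIG_PREFIX):].split(b" ", 1)[0]
            return document[:offset], base64.b64decode(encoded)
        offset += len(line)
    raise ValueError("no signature line found")
```

Because the input is treated as a plain octet stream, no canonicalization step is needed before verifying.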

2008/11/11

Discovery Metadata is Just Data

Another comment on Eran's discovery mechanisms list (which is itself great). I'm gradually reaching the conclusion that he's right that content negotiation isn't the best idea, but for the wrong reasons. The question is, are the alternatives any better?

Quoth Eran:

HTTP Content Negotiation - using the 'Accept' request header, the consumer informs the server it is interested in the metadata and not the resource itself, to which the server responds with the metadata document or its location. In XRDS, the consumer sends an HTTP GET (or HEAD) request to the resource URL with an 'Accept' header and content-type 'application/xrds+xml'. This informs the server of the consumer's discovery interest, which in turn may reply with the discovery document itself, redirect to it, or return its location via the 'X-XRDS-Location' (or 'Link') response header.

[-] Resource Declaration - does not address as it focuses on the consumer declaring its intentions.
[+] Direct Metadata Access - provides a simple method for directly requesting the metadata document.
[-] Web Compliant - while some argue that the metadata can be considered another representation of the resource, it is very much external to it. Using the 'Accept' header to request a separate resource (as opposed to a different representation of the same resource) violates the HTTP protocol. It also prevents using the discovery content-type as a valid (self-standing) web resource having its own metadata.
[-] Scale Agnostic - requires access to HTTP request and response headers, as well as the registration of multiple handlers for the same resource URL based on the 'Accept' header. In addition, improper use or implementation of the 'Vary' header in conjunction with the 'Accept' header will cause proxies to serve the metadata document instead of the resource itself - a great concern to large providers with frequently visited front-pages.
[-] Extendable - limited to a single content-type for metadata, and does not allow any existing schemas (with well known content-type).

Minimum roundtrips to retrieve metadata: 1
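From the consumer's side, the exchange Eran describes might look roughly like the Python sketch below. It only classifies a reply that has already been received. The function names and return values are made up for the example, and the 'Link' header case is left out because parsing it takes more code.

```python
from urllib.parse import urljoin

XRDS_TYPE = "application/xrds+xml"


def discovery_request_headers() -> dict:
    """Headers a consumer would send to signal its interest in metadata."""
    return {"Accept": XRDS_TYPE}


def interpret_response(url: str, status: int, headers: dict, body: bytes = b""):
    """Classify a reply to a discovery request.

    Returns ("document", body), ("location", absolute_url), or ("none", None).
    """
    # Header names are case-insensitive, so normalize them first.
    headers = {k.lower(): v for k, v in headers.items()}
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if status == 200 and content_type == XRDS_TYPE:
        return ("document", body)  # the server returned the metadata itself
    if status in (301, 302, 303, 307) and "location" in headers:
        return ("location", urljoin(url, headers["location"]))  # redirect to it
    if "x-xrds-location" in headers:
        return ("location", urljoin(url, headers["x-xrds-location"]))
    return ("none", None)
```

Only the first case finishes in a single roundtrip; the other two cost one more request to fetch the document.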

All of the points above are addressable with minor tweaks, turning this into a usable BigCo-scale solution. Specifically, I'd argue that it's perfectly web compliant to regard a resource's 'metadata' as a variant representation of the resource itself. As an example, consider an image resource that can be requested in several variants: image/gif, image/jpeg, or application/image-meta+xml. The last format gives you the EXIF metadata about the image but in a more convenient XML format. Format A gives you image bits; format B gives you image bits and metadata; format C gives you just the metadata. But what's data and what's metadata just depends on your point of view.

The other argument against this is that buggy proxy caches may cache the wrong representation of, say, http://yahoo.com. This is something of a strawman in that this would need to be a cache between an RP and an OP. In any case, returning an uncacheable redirect (303?) to a metadata resource would avoid problems in practice.
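A minimal server-side sketch of that redirect idea, in Python. The function names, the 'Cache-Control: no-store' header, and the decision to ignore wildcards such as */* are choices made for this example, not anything from Eran's write-up.

```python
METADATA_TYPE = "application/xrds+xml"


def accepts(accept_header: str, media_type: str) -> bool:
    """True if media_type is listed explicitly with a nonzero q-value.

    Wildcards are ignored on purpose, so ordinary browsers never
    trigger the metadata path.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != media_type:
            continue
        q = 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def respond(accept_header: str, metadata_url: str):
    """Return (status, headers) for a request to the front page."""
    if accepts(accept_header, METADATA_TYPE):
        # Redirect instead of serving metadata at the page's own URL,
        # and tell caches not to store the redirect.
        return 303, {
            "Location": metadata_url,
            "Cache-Control": "no-store",
            "Vary": "Accept",
        }
    return 200, {"Content-Type": "text/html", "Vary": "Accept"}
```

With this shape the front page's URL only ever returns HTML or a redirect, so a cache that mishandles 'Vary' has no metadata document to serve in place of the page.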

All of this said, configuring something like http://yahoo.com/ (or, ahem, http://google.com/) to do content negotiation to enable discovery is a tough sell. Whatever technology is used to serve (and cache...) that page needs to be reconfigured to do the Right Thing with regard to content negotiation, with a big downside if something goes wrong and, so far, a small upside if things go well. Not a great sales pitch.

So I think Eran is right in that this isn't a great solution, but not because of web design purity; because of practical deployment issues. If there's a good alternative we should look at it and weigh the pros and cons, which is what I plan to do in the next post.

Discovery Mechanisms and Scale

After Eran's sessions at IIW today on discovery, I have some random thoughts which I figure might as well be captured (inflicted?) as blog posts. One comment I have is that "Scale Agnostic" is a little misleading in his matrix:

[Image: Eran's matrix comparing discovery mechanisms]

None of these solutions, by itself, spans the scale from a hosted site owner with Dreamweaver up to yahoo.com, so this column should really be all red. That's okay, because everyone agrees that at least a couple of solutions are needed to span both ends of this scale -- so the overall solution needs at least one solution at each end of the scale. Proposed: Replace "Scale Agnostic" with two columns, "Long Tail Scale" and "BigCo Scale", and make sure that the overall solution includes green in both columns.

2008/11/08

It's Internet Identity Workshop time again!

I'll be at IIW2008b, albeit intermittently, Mon-Wed next week. I think this is a make-or-break time for some of the identity technologies like OpenID and OAuth that have gotten early adoption -- can they leap the chasm to mass adoption? At least that's the theme I'm seeing so far.

2008/11/05

Thank you, America

As I held my infant son this morning and watched the sunrise together, I realized how proud I am of my country. We're capable of change. We can rise above intolerance, racism, class divisions, propaganda, and fear. I had faith in America, and it was justified last night. Thank you.

Also: The highest turnout since 1908? 136 million voters. 64% of eligible voters. And a majority, not just a plurality, voted for our next President. That's change.

2008/11/03

In Which Larry Lessig Scares the Pants off Me

Don't pay attention to polls or pundits; just vote. And get everyone you know to vote. If you're one of the vast majority of citizens (i.e., you don't live in a swing state) then either run up the score or register a protest vote. Or call someone you know in a swing state and get them to vote if they haven't already, or follow Lessig's advice.

Also, watch FiveThirtyEight.
