Salmon's 'Discoverable URIs' and PKI

One issue that's cropped up in Salmon is how to identify authors.  Mostly, feed syndication protocols hand-wave about this and at most specify a field for an email address.  Atom goes one step further and provides a way to specify a URI for an author, which could be a web site or profile page.

The main things that Salmon needs are (1) a stable, user-controlled identifier that can be used to correlate messages and (2) a way to jump from that identifier to a public key used for signing salmon.  The existing atom:author/uri element works fine for #1.  Webfinger+LRDD discovery can then take over for #2.  What comes out of that process is a public key suitable for verifying the provenance (authorship) of a salmon.

Now for the details.  It turns out that the flow for Webfinger is pretty stable, so any time you have an email like identifier for an author you can just slap "acct:" in front and things will Just Work.  But I don't want to limit authorship only to acct: URIs - I should be able to use any reasonable URI in the author/uri field and have things work.  By this I mean any URI that I can perform the right flavor of XRD discovery on and get out the public key.  It turns out that the existing specs cover 90% of this flow, but some things, like the order in which to look for links, remain unspecified.  If they remain unspecified by the time Salmon gets real deployment, then Salmon will need to fill in the gaps in its specification.  I'm hoping that's not necessary but it won't be a blocker if it is.

Also, there isn't an existing agreed-upon term that covers both Webfinger identifiers and the kind of URIs that you can perform LRDD/XRD discovery on, such as https://myprofile.com/myname.  Personally I would prefer to call them all Webfinger IDs but for now I'm calling them all discoverable URIs.  Which is a terrible name since, obviously, you don't need to discover the URI as that's what you're starting from.  Hopefully that will be motivation to get people to agree on a better name.  Also, this is obviously useful for more than just user account identifiers.  To start with, automatic services and sites will want to be able to participate as well.  For example, did you know that the Blogger service has a  public signing key?  It would be good to have that published somewhere more discoverable than this blog post.


A Web Wide Public Key Infrastructure

In my last post, I talked about Magic Signatures, the evolution of Salmon's Magic Security Pixie Dust into something concrete.  There's an important bit missing, which I'll talk about now:  Public key signatures are fairly useless without a way to discover and use other people's public keys in a secure, reliable way.  We need a distributed, easily deployable public key infrastructure.  Let's build one.

The basic plan for Salmon signature verification is fairly simple:  Use Webfinger discovery on the author of the Salmon to find their Magic Signature public key, if they have one.  To find this, you first get the Webfinger XRD file, then look for a "magickey" link:
<Link rel="http://salmon-protocol.org/ns/magickey" href="https://www.example.org/0B6A58300/publickey" />
Retrieve the magickey resource, parse it into a public key in your favorite library, and use that to verify the signature of the original Salmon you started with.

A few notes:
  • Retrievals of these documents should be done via TLS and certificates checked; if any step in the chain isn't done via checked TLS, the result may be vulnerable to MITM or DNS poisoning attacks.
  • The public key is per-user and is effectively self-signed.  While I anticipate most keys will be generated by large providers, it is possible for users to generate their own public/private keypairs and upload only the public key to the Web.
  • Maintaining data about expired and revoked keys will be needed too, but first things first.

The next question is what format that public key should be.  The obvious choice is an X.509 PKI certificate PEM file, but this turns out to pull in a ton of baggage; you can't even parse one of these using available App Engine libraries at the moment.  This is primarily due to the use of ASN.1 DER encoding, which is usually dealt with via a linked-in C library.  X.509 is also inherently complex and overkill for what we need; it's based on a hierarchical model of CAs which doesn't map well to Salmon's decentralized model.

When you dig down into things, it turns out that an RSA public key itself is fairly simple.  You have two numbers, a modulus (n, for some reason) and an exponent (e).  You need to serialize and deserialize these numbers in a standard way.  Most libraries let you pass in and access these values.  The only catch is that the numbers themselves can be very, very big -- bigger than any native number on any system, so you need something like a bignum library to deal with these numbers.  At the end of the day, though, you're just dealing with a pair of numbers.  You don't need complicated formats for that.

So, here's a very simple way to represent this in a web-safe way:
"RSA." + urlsafe_b64_encode(to_network_bytes(modulus)) + "." + urlsafe_b64_encode(to_network_bytes(exponent))
The output is two base64 encoded strings, separated by a period, and prefixed with the key type:
To parse this, you split on the period, urlsafe_b64_decode each piece into bytes, and then turn the bytes into bignums using your library.  (I'm using the PyCrypto bytes_to_long and long_to_bytes functions to play around with this at the moment.)

(There might possibly also be a need for a Subject attached to the key itself, to prevent Joe from pointing at Bob's key and claiming it as his own.  Joe still couldn't sign things as Bob but he could potentially claim Bob's output as his own.  So, add the base64 encoding of Joe's Webfinger ID to the public key.  It's not clear to me if this is needed quite yet.)

You may also want to store and retrieve private keys; it turns out the private keys just need one more value, d, along with n and e.  Append this to the end as an optional parameter.  Thoughts?  Anything missing?  You can also take a look at the in-progress Python code to deal with this format.  It's pretty trivial to write and parse.  I imagine that dealing with parsing additional algorithms will be slightly more complicated, but nothing compared with the complication of actually implementing said algorithms.

(All of this still of course relies on X.509 certificates underlying the SSL connections used to retrieve the user-specific public keys.  That's fine, because those certificates are hidden underneath existing libraries and don't need to be visible to Salmon code at all.)

This is a general purpose, lightweight discovery mechanism for personal signing keys.  It works well for Salmon; once widely deployed, it would be useful for other purposes as well.

(Edited to update thoughts on Subject, which, I think, is not needed.)


Magic Signatures for Salmon

In writing the spec for Salmon we soon discovered that what we really wanted was S/MIME signatures for the Web.  In other words, given a message, let you sign it with a private key, and let receivers verify the signature using the corresponding public key.  Signing and verifying are pretty well understood, but in practice canonicalizing data and signing is hard to get right.  Making sure that the mechanism adopted is really deployable and interoperable, even in restricted environments, is a top priority for Salmon.

I'm calling this the "Magic Signature" mechanism because it's not really Salmon-specific and you can analyze it without thinking much about Salmon at all.

One of the reasons why this is hard is because of the abstraction layers that we have in place in our software.  For example, encryption algorithms operate on byte sequences, but a given XML document can have many different byte sequence serialized forms.  Even JSON isn't immune to this, though mandating UTF-8 certainly helps.  So, the first thing to make really simple is the serialization format.  Here's the Magic Signature serialization algorithm:
b64_data = urlsafe_b64_enc(utf8text)
In other words, serialize your data however your libraries let you into utf8 text, then base64 encode the resulting bytes, using the url safe variant of base64.  That's the actual string you sign, and it's nearly impossible to mess up that string as it's 7 bit ASCII, uses no characters known to ever be escaped by anything, and is mostly an uninterpreted blob of text as far as your libraries and transport layers are concerned.  The one caveat is that some transports may need to insert linebreaks/whitespace due to line length limits -- this can be solved by squeezing out all whitespace (which is never part of the data) before signing or validating.

Signing is then standard; we'll mandate support for RSA_SHA1, meaning you take the SHA1 hash digest of that base64 data and then sign the hash using an RSA private key:
s = rsa_sign(private_key,sha1_digest(b64_data))
the result is a very big integer, which you convert to network-neutral bytes and then turn into a string with, you guessed it, urlsafe_b64_enc:
sig = urlsafe_b64_enc(to_binary(s))
Now for the ugly bit:  Since the whole premise of this is that the receiver is not going to be able to create exactly the same serialization of utf8text that the sender did, you need to help the receiver out by sending it the exact b64_data used to compute the original signature.  Since it's base64 encoded, it's effectively armored not only against vagaries of transport protocols but also software stacks and frameworks.

Since you're sending the base64 data, and it's trivial to base64-decode it, there's no point in sending the original data as well.  So you just send the content, wrapped in its base64 envelope, plus a signature.  Call this a "Magic Envelope":

<?xml version='1.0' encoding='UTF-8'?>
<me:env xmlns:me='http://salmon-protocol.org/ns/magic-env'>
  <me:data type='application/atom+xml' encoding='base64'>
And on the receiving side, you base64_decode to get the original content, you calculate the sha1_digest on that base64 data, and verify the signature.  If it works out, you use the resulting data, in this case a Salmon that was hidden inside the magic envelope:

<?xml version="1.0" encoding="utf-8"?><entry xmlns="http://www.w3.org/2005/Atom">
  <thr:in-reply-to ref="tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954" xmlns:thr="http://purl.org/syndication/thread/1.0">tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954
  <content>Salmon swim upstream!</content>
  <title>Salmon swim upstream!</title>
<me:provenance xmlns:me="http://salmon-protocol.org/ns/magic-env"><me:data encoding="base64" type="application/atom+xml">PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPSdodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20sMjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbXBsZS5jb208L25hbWU-PHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0aG9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3luZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5OTpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnRhZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2MTY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGU-U2FsbW9uIHN3aW0gdXBzdHJlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRlZD4KPC9lbnRyeT4KICAgIA==</me:data><me:alg>RSA-SHA1</me:alg><me:sig>EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2fWHvLKJzeGLKfyHg8ZomQ==</me:sig></me:provenance></entry>
Note that the signature, and the base64 data, is still carried inside a "provenance" element of the salmon for future verification.

This is all fun to describe, but it's even more fun to play with.  Take a look at http:/salmon-playground.appspot.com/magicsigdemo to see this in action.  When you load it, you'll see that it gives you an error -- it will refuse to sign your salmon until you correct the author URI.  This is a feature; the demo checks that the signed-in user matches one of the authors of the salmon, so you need to edit the author/uri field to read "acct:<your email address here>" to make it work.

Next, you'll see the magic envelope appear.  You can verify the signature, which sends a request back to the server and replies Yes or No.  Or, you can unfold the envelope back into an Atom salmon to read the content.  Of course, if you tamper with the salmon first it will neither verify nor unfold properly.

For Salmon-aware processors, there's little reason to use anything but the application/magic-envelope form.  For syndication in general, though, it may be necessary to wrap the envelope in an Atom or RSS entry.

The source code for all of this is freely available .  If you're interested in all of this, please join the Salmon discussion group.

(Updated 1/19 to include a note about squeezing out whitespace from the b64 encoded data before doing anything important with it, per gffletch's comment.)

Fork in the Road for Salmon

Happy New Year!

Over the past several weeks, I've been doing a Salmon conference roadshow, talking and listening to people about the protocol and getting feedback.

At IIW, we had a "Magic Security Pixie Dust" Salmon session which kicked around the use cases and challenges in detail.  There and elsewhere I got basically two kinds of feedback:  (a) The specification, especially the signatures, was too complicated; and (b) the specification, especially the signatures, was not comprehensive enough.

There was a suggestion to drop signatures entirely and just rely on reputation of the salmon generators, who would be on the hook for vouching for the identity of their users.  This would simplify the protocol but at the cost of giving it effectively the same security characteristics as email.  This was initially attractive and I spent some time playing with what that would look like.  Unfortunately, I am pretty sure that it would just make operation more complex and adoption more problematic, and you'd likely need to pre-federate every salmon generator/acceptor pair to get things off the ground.  

At another meeting I was fortunate enough to get Ben Laurie in the room, and he suggested a simple mechanism for comprehensive signatures.  I've been playing with it and it's looking pretty good.  By that I mean that it works, and I think it's implementable even with stone knives and bearskins minimal library support.

And, in parallel, many people have started talking about additional use cases for Salmon, most of which benefit from a simple signing mechanism:
  • A personal store of comments, posts, and activities you do around the Web, kept in your online storage mechanism of choice, used for backup, archiving, and search purposes ("what was that conversation I had last week?...")
  • Salmon for mentions (@replies to a particular person) which just send a Salmon, not necessarily as a reply to a piece of content, but to a person as a mentionee.
  • Structured data with verifiable provenance for things like bids and asks
  • Building up and using distributed reputation based on analysis of comments, ratings, reviews, etc. published via Salmon.
  • ...etc
Sometimes, the more general problem is easier to solve.  After a lot of thinking, talking, and coding, I've come around to believing in the more comprehensive solution.

This doesn't change the outline of Salmon specified at http://salmon-protocol.org.  It drops the sign-selected-fields mechanism, and lets you sign the entire Salmon, including any extension data.  As an added bonus, it doesn't matter what format you choose, what your character encoding or Infoset serialization is; it'll just work regardless.  

Since it is a public key signature mechanism, it does require a public key infrastructure.  Fortunately, we have the pieces to build one easily and simply at hand.

I'll blog first about the signature mechanism, and second about the public key mechanism, in the next couple of days.