2010/01/12

Magic Signatures for Salmon

In writing the spec for Salmon we soon discovered that what we really wanted was S/MIME signatures for the Web.  In other words: given a message, let the sender sign it with a private key, and let receivers verify the signature using the corresponding public key.  Signing and verifying are pretty well understood, but in practice canonicalizing data and then signing it is hard to get right.  Making sure that the mechanism adopted is really deployable and interoperable, even in restricted environments, is a top priority for Salmon.

I'm calling this the "Magic Signature" mechanism because it's not really Salmon-specific and you can analyze it without thinking much about Salmon at all.

One reason this is hard is the abstraction layers that we have in place in our software.  For example, signature algorithms operate on byte sequences, but a given XML document can be serialized into many different byte sequences.  Even JSON isn't immune to this, though mandating UTF-8 certainly helps.  So, the first thing to make really simple is the serialization format.  Here's the Magic Signature serialization algorithm:
b64_data = urlsafe_b64_enc(utf8text)
In other words, serialize your data however your libraries let you into utf8 text, then base64 encode the resulting bytes, using the url safe variant of base64.  That's the actual string you sign, and it's nearly impossible to mess up that string as it's 7 bit ASCII, uses no characters known to ever be escaped by anything, and is mostly an uninterpreted blob of text as far as your libraries and transport layers are concerned.  The one caveat is that some transports may need to insert linebreaks/whitespace due to line length limits -- this can be solved by squeezing out all whitespace (which is never part of the data) before signing or validating.
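
As a rough illustration, here is a minimal Python sketch of that step using only the standard library. The helper names (magic_encode, squeeze_whitespace, magic_decode) are made up for this example and are not part of any Salmon library.

```python
import base64


def magic_encode(utf8text: str) -> str:
    """Encode text as UTF-8 bytes, then as URL-safe base64 (plain ASCII)."""
    return base64.urlsafe_b64encode(utf8text.encode("utf-8")).decode("ascii")


def squeeze_whitespace(b64_data: str) -> str:
    """Remove any whitespace a transport may have inserted into the base64."""
    return "".join(b64_data.split())


def magic_decode(b64_data: str) -> str:
    """Recover the original text from (possibly line-wrapped) base64 data."""
    return base64.urlsafe_b64decode(squeeze_whitespace(b64_data)).decode("utf-8")


text = "<content>Salmon swim upstream!</content>"
b64_data = magic_encode(text)

# Simulate a transport that wraps lines with CRLF every 16 characters.
wrapped = "\r\n".join(b64_data[i:i + 16] for i in range(0, len(b64_data), 16))

# Squeezing restores exactly the string that would be signed.
assert squeeze_whitespace(wrapped) == b64_data
assert magic_decode(wrapped) == text
```

The point of the last two lines is that whatever line breaks get added in transit, the squeezed string is the same on both ends.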

Signing is then standard; we'll mandate support for RSA-SHA1, meaning you take the SHA1 hash digest of that base64 data and then sign the hash using an RSA private key:
s = rsa_sign(private_key,sha1_digest(b64_data))
The result is a very big integer, which you convert to network-neutral bytes and then turn into a string with, you guessed it, urlsafe_b64_enc:
sig = urlsafe_b64_enc(to_binary(s))
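
To make the arithmetic concrete, here is a toy Python sketch of those two lines. It rests on several assumptions that are not in the description above: it uses unpadded "textbook" RSA (real RSA-SHA1 signatures normally add PKCS#1 v1.5 padding, which this post does not spell out), it reads "network-neutral bytes" as big-endian, and it uses a tiny demo key built from two well-known Mersenne primes. Do not use it for anything real.

```python
import base64
import hashlib

# Toy key for illustration only: far too small and too structured to be secure.
P = 2**127 - 1
Q = 2**107 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def sha1_digest(b64_data: str) -> int:
    """SHA-1 of the ASCII base64 string, interpreted as a big-endian integer."""
    return int.from_bytes(hashlib.sha1(b64_data.encode("ascii")).digest(), "big")


def rsa_sign(d: int, n: int, digest: int) -> int:
    """Textbook RSA signature with no padding (an assumption of this sketch)."""
    return pow(digest, d, n)


def to_binary(s: int, n: int) -> bytes:
    """Big-endian bytes, fixed to the byte length of the modulus."""
    return s.to_bytes((n.bit_length() + 7) // 8, "big")


b64_data = base64.urlsafe_b64encode("Salmon swim upstream!".encode("utf-8")).decode("ascii")
s = rsa_sign(D, N, sha1_digest(b64_data))
sig = base64.urlsafe_b64encode(to_binary(s, N)).decode("ascii")

# Anyone holding the public key (N, E) can check the signature.
assert pow(s, E, N) == sha1_digest(b64_data)
```

In practice you would hand this step to a vetted crypto library rather than calling pow() yourself; the sketch only shows where the digest, the big integer, and the final base64 string come from.
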
Now for the ugly bit:  Since the whole premise of this is that the receiver is not going to be able to create exactly the same serialization of utf8text that the sender did, you need to help the receiver out by sending it the exact b64_data used to compute the original signature.  Since it's base64 encoded, it's effectively armored not only against vagaries of transport protocols but also software stacks and frameworks.

Since you're sending the base64 data, and it's trivial to base64-decode it, there's no point in sending the original data as well.  So you just send the content, wrapped in its base64 envelope, plus a signature.  Call this a "Magic Envelope":

<?xml version='1.0' encoding='UTF-8'?>
<me:env xmlns:me='http://salmon-protocol.org/ns/magic-env'>
  <me:data type='application/atom+xml' encoding='base64'>
PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPSdodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20sMjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbXBsZS5jb208L25hbWU-PHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0aG9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3luZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5OTpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnRhZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2MTY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGU-U2FsbW9uIHN3aW0gdXBzdHJlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRlZD4KPC9lbnRyeT4KICAgIA==</me:data>
  <me:alg>RSA-SHA1</me:alg>
  <me:sig>EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2fWHvLKJzeGLKfyHg8ZomQ==</me:sig>
</me:env>
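
An envelope of this shape can be assembled with Python's standard xml.etree.ElementTree, as in the sketch below. The element names and namespace are taken from the example above; the function name build_envelope and the placeholder values passed to it are invented for illustration (the "c2ln" signature is not a real signature).

```python
import xml.etree.ElementTree as ET

ME_NS = "http://salmon-protocol.org/ns/magic-env"


def build_envelope(b64_data: str, sig: str,
                   mime_type: str = "application/atom+xml") -> str:
    """Wrap already-encoded data and its signature in a me:env element."""
    ET.register_namespace("me", ME_NS)  # serialize with the 'me' prefix
    env = ET.Element(f"{{{ME_NS}}}env")
    data = ET.SubElement(env, f"{{{ME_NS}}}data",
                         {"type": mime_type, "encoding": "base64"})
    data.text = b64_data
    ET.SubElement(env, f"{{{ME_NS}}}alg").text = "RSA-SHA1"
    ET.SubElement(env, f"{{{ME_NS}}}sig").text = sig
    return ET.tostring(env, encoding="unicode")


# Placeholder inputs: base64 of "hello" and a dummy signature string.
envelope_xml = build_envelope("aGVsbG8=", "c2ln")
print(envelope_xml)
```

Because the payload is already plain ASCII, the XML library never needs to escape or re-encode anything inside me:data.
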
And on the receiving side, you calculate the sha1_digest on the base64 data exactly as it arrived (after squeezing out any whitespace) and verify the signature against it.  If it works out, you base64_decode the data to get the original content and use it, in this case a Salmon that was hidden inside the magic envelope:

<?xml version="1.0" encoding="utf-8"?><entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.com,2009:cmt-0.44775718</id>
  <author><name>test@example.com</name><uri>acct:jpanzer@google.com</uri></author>
  <thr:in-reply-to ref="tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954" xmlns:thr="http://purl.org/syndication/thread/1.0">tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954
  </thr:in-reply-to>
  <content>Salmon swim upstream!</content>
  <title>Salmon swim upstream!</title>
  <updated>2009-12-18T20:04:03Z</updated>
<me:provenance xmlns:me="http://salmon-protocol.org/ns/magic-env"><me:data encoding="base64" type="application/atom+xml">PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPSdodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20sMjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbXBsZS5jb208L25hbWU-PHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0aG9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3luZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5OTpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnRhZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2MTY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGU-U2FsbW9uIHN3aW0gdXBzdHJlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRlZD4KPC9lbnRyeT4KICAgIA==</me:data><me:alg>RSA-SHA1</me:alg><me:sig>EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2fWHvLKJzeGLKfyHg8ZomQ==</me:sig></me:provenance></entry>
Note that the signature and the base64 data are still carried inside a "provenance" element of the salmon for future verification.
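
The receiving side can be sketched in Python as follows. As with any toy example, the assumptions matter: this uses unpadded textbook RSA with a tiny demo key (two Mersenne primes), treats the signature bytes as big-endian, and the names open_envelope, squeeze, and digest_int are invented for this sketch. A real verifier would use a proper crypto library and the sender's published key.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

ME = "{http://salmon-protocol.org/ns/magic-env}"


def squeeze(text: str) -> str:
    """Remove all whitespace, which is never part of the base64 data."""
    return "".join(text.split())


def digest_int(b64_data: str) -> int:
    """SHA-1 of the ASCII base64 string as a big-endian integer."""
    return int.from_bytes(hashlib.sha1(b64_data.encode("ascii")).digest(), "big")


def open_envelope(envelope_xml: str, n: int, e: int) -> str:
    """Verify the signature with public key (n, e), then decode the content."""
    env = ET.fromstring(envelope_xml)
    if squeeze(env.findtext(ME + "alg", default="")) != "RSA-SHA1":
        raise ValueError("unsupported algorithm")
    b64_data = squeeze(env.findtext(ME + "data", default=""))
    sig = squeeze(env.findtext(ME + "sig", default=""))
    s = int.from_bytes(base64.urlsafe_b64decode(sig), "big")
    if pow(s, e, n) != digest_int(b64_data):
        raise ValueError("signature does not verify")
    return base64.urlsafe_b64decode(b64_data).decode("utf-8")


# Demo with a toy key (insecure, illustration only; needs Python 3.8+).
P, Q, E = 2**127 - 1, 2**107 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

content = "<content>Salmon swim upstream!</content>"
b64_data = base64.urlsafe_b64encode(content.encode("utf-8")).decode("ascii")
s = pow(digest_int(b64_data), D, N)
sig = base64.urlsafe_b64encode(s.to_bytes((N.bit_length() + 7) // 8, "big")).decode("ascii")

# The data is deliberately line-wrapped, as a transport might do.
envelope = (
    "<me:env xmlns:me='http://salmon-protocol.org/ns/magic-env'>"
    "<me:data type='application/atom+xml' encoding='base64'>\n"
    + b64_data[:20] + "\n" + b64_data[20:]
    + "\n</me:data><me:alg>RSA-SHA1</me:alg><me:sig>" + sig + "</me:sig></me:env>"
)
assert open_envelope(envelope, N, E) == content

# Changing a single character of the data makes verification fail.
other = "A" if b64_data[0] != "A" else "B"
tampered = envelope.replace(b64_data[:20], other + b64_data[1:20])
try:
    open_envelope(tampered, N, E)
except ValueError as err:
    print("rejected:", err)
```

Note that the content is only decoded after the signature check passes, so tampered data never reaches the rest of the application.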

This is all fun to describe, but it's even more fun to play with.  Take a look at http://salmon-playground.appspot.com/magicsigdemo to see this in action.  When you load it, you'll see that it gives you an error -- it will refuse to sign your salmon until you correct the author URI.  This is a feature; the demo checks that the signed-in user matches one of the authors of the salmon, so you need to edit the author/uri field to read "acct:<your email address here>" to make it work.

Next, you'll see the magic envelope appear.  You can verify the signature, which sends a request back to the server and replies Yes or No.  Or, you can unfold the envelope back into an Atom salmon to read the content.  Of course, if you tamper with the salmon first it will neither verify nor unfold properly.

For Salmon-aware processors, there's little reason to use anything but the application/magic-envelope form.  For syndication in general, though, it may be necessary to wrap the envelope in an Atom or RSS entry.

The source code for all of this is freely available.  If you're interested in all of this, please join the Salmon discussion group.

(Updated 1/19 to include a note about squeezing out whitespace from the b64 encoded data before doing anything important with it, per gffletch's comment.)

10 comments:

  1. This is so wonderfully simple that I'm amazed that we haven't been doing this for years. The issue of canonicalization has been one of the major reasons that most past attempts to use signatures have failed. But, Ben's simple hack of signing base64 encodings erases all the issues. Excellent!

    I assume that in order to figure out which key to use to verify the signature that I only need to unwrap the base64 data and then use WebFinger to find the key based on author name. Is that correct?

    It seems to me that conversion to utf8, while reasonable for Salmon, is not actually something that is generally necessary to use such a method.

    Thanks, John, for getting this done!

    bob wyman

  2. Great post John. Glad to see a simple signing algorithm used for this. Here's one thing you might want to investigate further.

    When we implemented SAML Simple Sign (very similar signing method) at AOL, we ran into a problem with the then current spec because it relied on signing the base64 output. The problem we ran into is that the base64 encoder generated output with line breaks every 72 chars. However, since Simple Sign relies on HTTP POST redirects, some browsers changed the CR in the base64 output to CRLF. This broke the signatures.

    If it's possible, I think it's cleaner to sign the utf8text rather than the base64 output. This is what the OASIS SSTC decided to do with the SAML Simple Sign Binding.

    Of course, this issue is only likely to apply if a message path for Salmon is via a browser using POST redirects.

    George

  3. Thanks George; due to other considerations (utf8 being tricky too) I'm going to update this to specify that you have to remove all whitespace from the base64 string before signing it or verifying it. So intermediaries can insert whatever linebreaks, etc. they want without breaking anything.

  4. Sorry, but I am missing some detail in this approach. In order for the recipient to either extract the data or to verify that the signature includes the expected kinds of data, it needs to understand what is encoded into the utf8text. In other words, just sending urlsafe_b64_enc(utf8text) and its signature is not sufficient. Isn't that knowledge some kind of c14n?

  5. Subbu - Yes, the 'type' attribute tells you the MIME type of the content. If you don't understand application/atom+xml (in this example) you're out of luck.

  6. But that's the same XML C14N since the sender and the recipient need to agree on how Atom is serialized into utf8 text.

  7. Subbu - They don't need to agree on the serialization, as the sender just chooses whatever it uses and signs the base64 encoding of it.
    Sender and receiver just need to both end up with approximately the same DOM tree when the receiver parses the serialization. Basically, as long as they can communicate normally via XML, they can communicate via XML-inside-a-magic-envelope, because the signing happens after encoding to base64, and the base64 encoding ensures that the receiver can recover the exact bytes that are signed.

  8. Oops - I see it now. What you are proposing is similar to Content-Encoding (i.e decode and verify before processing the message). Good idea.

    (For some reason your blog does not let me post the comment on first try - a retry is working).

  9. John - I think an extra envelope is not necessary. How about the alternative at - http://www.subbu.org/blog/2010/01/envelope-for-signatures.

  10. Subbu - Yep, you can express the magic envelope in a variety of formats; MIME is a fine format, especially for sending individual messages. I have a JSON serialization written up in comments in the salmon-playground source code too. For the Salmon use case, we're dealing with XML data anyway so the XML tax is already paid, and we want to be able to embed the results in a feed (for batch transmission and syndication via things like PubSubHubbub).

    For a MIME envelope, the Authorization header isn't quite right for this, as the signature is really something that goes along with the entity rather than the request. I think you'd want something more similar to the DKIM-Signature MIME header. So define a Magic-Signature header with the method and you're golden.

