
Magic Signatures for Salmon

In writing the spec for Salmon we soon discovered that what we really wanted was S/MIME signatures for the Web.  In other words, given a message, let you sign it with a private key, and let receivers verify the signature using the corresponding public key.  Signing and verifying are pretty well understood, but in practice canonicalizing data and signing is hard to get right.  Making sure that the mechanism adopted is really deployable and interoperable, even in restricted environments, is a top priority for Salmon.

I'm calling this the "Magic Signature" mechanism because it's not really Salmon-specific and you can analyze it without thinking much about Salmon at all.

One reason this is hard is the abstraction layers that we have in place in our software.  For example, cryptographic algorithms operate on byte sequences, but a given XML document can be serialized into many different byte sequences.  Even JSON isn't immune to this, though mandating UTF-8 certainly helps.  So, the first thing to make really simple is the serialization format.  Here's the Magic Signature serialization algorithm:
b64_data = urlsafe_b64_enc(utf8text)
In other words, serialize your data however your libraries let you into UTF-8 text, then base64 encode the resulting bytes, using the URL-safe variant of base64.  That's the actual string you sign, and it's nearly impossible to mess up that string: it's 7-bit ASCII, uses no characters known to ever be escaped by anything, and is mostly an uninterpreted blob of text as far as your libraries and transport layers are concerned.  The one caveat is that some transports may need to insert linebreaks/whitespace due to line length limits -- this can be solved by squeezing out all whitespace (which is never part of the data) before signing or validating.
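
Here is a minimal Python sketch of that step, using only the standard library.  The helper names `magic_encode` and `squeeze_whitespace` are made up for illustration; they are not part of any Salmon library.

```python
import base64

def magic_encode(utf8text: str) -> str:
    """Serialize text to UTF-8 bytes, then URL-safe base64 encode them."""
    return base64.urlsafe_b64encode(utf8text.encode("utf-8")).decode("ascii")

def squeeze_whitespace(b64_data: str) -> str:
    """Drop any whitespace a transport may have inserted into the blob."""
    return "".join(b64_data.split())

b64_data = magic_encode("<content>Salmon swim upstream!</content>")

# Simulate a transport that wraps lines at 20 characters using CRLF.
wrapped = "\r\n".join(b64_data[i:i + 20] for i in range(0, len(b64_data), 20))

# After squeezing, the receiver has exactly the string the sender signed.
assert squeeze_whitespace(wrapped) == b64_data
```

Because the URL-safe alphabet never contains whitespace, squeezing cannot remove anything that belongs to the data.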

Signing is then standard; we'll mandate support for RSA-SHA1, meaning you take the SHA-1 digest of that base64 data and then sign the digest using an RSA private key:
s = rsa_sign(private_key,sha1_digest(b64_data))
The result is a very big integer, which you convert to network-neutral bytes and then turn into a string with, you guessed it, urlsafe_b64_enc:
sig = urlsafe_b64_enc(to_binary(s))
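
To make the arithmetic concrete, here is a toy Python sketch of the sign and verify steps.  It rests on several assumptions: the key is built from two publicly known Mersenne primes (so it offers no security), it uses textbook RSA with no padding scheme, it reads "network-neutral bytes" as big-endian, and the names `toy_sign` and `toy_verify` are hypothetical.  A real implementation would use a vetted crypto library and a proper RSA key.

```python
import base64
import hashlib

# Toy key from two well-known Mersenne primes. Illustration only:
# these primes are public, so this key protects nothing.
p, q = 2**127 - 1, 2**521 - 1
n = p * q                           # public modulus
e = 65537                           # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

def to_binary(value: int) -> bytes:
    """Convert a non-negative integer to big-endian bytes (an assumption)."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")

def toy_sign(b64_data: str) -> str:
    """s = rsa_sign(private_key, sha1_digest(b64_data)), then base64 it."""
    digest = hashlib.sha1(b64_data.encode("ascii")).digest()
    s = pow(int.from_bytes(digest, "big"), d, n)   # textbook RSA, no padding
    return base64.urlsafe_b64encode(to_binary(s)).decode("ascii")

def toy_verify(b64_data: str, sig: str) -> bool:
    """Recompute the digest and compare it with the one inside the signature."""
    digest = hashlib.sha1(b64_data.encode("ascii")).digest()
    s = int.from_bytes(base64.urlsafe_b64decode(sig), "big")
    return pow(s, e, n) == int.from_bytes(digest, "big")

b64_data = base64.urlsafe_b64encode(b"Salmon swim upstream!").decode("ascii")
sig = toy_sign(b64_data)
```

With this toy key, `toy_verify(b64_data, sig)` succeeds, and changing even one character of `b64_data` makes it fail.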
Now for the ugly bit:  Since the whole premise of this is that the receiver is not going to be able to create exactly the same serialization of utf8text that the sender did, you need to help the receiver out by sending it the exact b64_data used to compute the original signature.  Since it's base64 encoded, it's effectively armored not only against vagaries of transport protocols but also software stacks and frameworks.

Since you're sending the base64 data, and it's trivial to base64-decode it, there's no point in sending the original data as well.  So you just send the content, wrapped in its base64 envelope, plus a signature.  Call this a "Magic Envelope":

<?xml version='1.0' encoding='UTF-8'?>
<me:env xmlns:me='http://salmon-protocol.org/ns/magic-env'>
  <me:data type='application/atom+xml' encoding='base64'>
PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPSdodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20sMjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbXBsZS5jb208L25hbWU-PHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0aG9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3luZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5OTpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnRhZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2MTY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGU-U2FsbW9uIHN3aW0gdXBzdHJlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRlZD4KPC9lbnRyeT4KICAgIA==</me:data>
  <me:alg>RSA-SHA1</me:alg>
  <me:sig>EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2fWHvLKJzeGLKfyHg8ZomQ==</me:sig>
</me:env>
And on the receiving side, you base64_decode to get the original content, you calculate the sha1_digest on that base64 data, and verify the signature.  If it works out, you use the resulting data, in this case a Salmon that was hidden inside the magic envelope:

<?xml version="1.0" encoding="utf-8"?><entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.com,2009:cmt-0.44775718</id>
  <author><name>test@example.com</name><uri>acct:jpanzer@google.com</uri></author>
  <thr:in-reply-to ref="tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954" xmlns:thr="http://purl.org/syndication/thread/1.0">tag:blogger.com,1999:blog-893591374313312737.post-3861663258538857954
  </thr:in-reply-to>
  <content>Salmon swim upstream!</content>
  <title>Salmon swim upstream!</title>
  <updated>2009-12-18T20:04:03Z</updated>
<me:provenance xmlns:me="http://salmon-protocol.org/ns/magic-env"><me:data encoding="base64" type="application/atom+xml">PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPSdodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20sMjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbXBsZS5jb208L25hbWU-PHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0aG9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3luZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5OTpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnRhZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2MTY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGU-U2FsbW9uIHN3aW0gdXBzdHJlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRlZD4KPC9lbnRyeT4KICAgIA==</me:data><me:alg>RSA-SHA1</me:alg><me:sig>EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2fWHvLKJzeGLKfyHg8ZomQ==</me:sig></me:provenance></entry>
Note that the signature and the base64 data are still carried inside a "provenance" element of the salmon for future verification.
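
The unfolding step on the receiving side might look roughly like this in Python.  This is a sketch under assumptions: the `unfold` helper and the tiny sample envelope are invented for illustration, and the signature check is deliberately left out, so a real receiver would verify `sig` against `b64_data` before trusting the payload.

```python
import base64
import xml.etree.ElementTree as ET

ME_NS = "http://salmon-protocol.org/ns/magic-env"

def unfold(envelope_xml: str):
    """Return (payload, alg, sig, b64_data) from a magic envelope string."""
    env = ET.fromstring(envelope_xml)
    # Squeeze out whitespace; it is never part of the signed data.
    b64_data = "".join(env.find(f"{{{ME_NS}}}data").text.split())
    alg = env.find(f"{{{ME_NS}}}alg").text.strip()
    sig = "".join(env.find(f"{{{ME_NS}}}sig").text.split())
    payload = base64.urlsafe_b64decode(b64_data).decode("utf-8")
    return payload, alg, sig, b64_data

# Build a small sample envelope; the signature value is a placeholder.
entry = "<entry xmlns='http://www.w3.org/2005/Atom'><title>Salmon swim upstream!</title></entry>"
b64 = base64.urlsafe_b64encode(entry.encode("utf-8")).decode("ascii")
envelope = (
    f"<me:env xmlns:me='{ME_NS}'>"
    f"<me:data type='application/atom+xml' encoding='base64'>\n{b64}\n</me:data>"
    "<me:alg>RSA-SHA1</me:alg>"
    "<me:sig>placeholder-signature</me:sig>"
    "</me:env>"
)

payload, alg, sig, b64_data = unfold(envelope)
assert payload == entry   # the receiver recovers the sender's exact text
```

Note that the receiver never re-serializes the entry; it only decodes the bytes the sender signed.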

This is all fun to describe, but it's even more fun to play with.  Take a look at http://salmon-playground.appspot.com/magicsigdemo to see this in action.  When you load it, you'll see that it gives you an error -- it will refuse to sign your salmon until you correct the author URI.  This is a feature; the demo checks that the signed-in user matches one of the authors of the salmon, so you need to edit the author/uri field to read "acct:<your email address here>" to make it work.

Next, you'll see the magic envelope appear.  You can verify the signature, which sends a request back to the server and replies Yes or No.  Or, you can unfold the envelope back into an Atom salmon to read the content.  Of course, if you tamper with the salmon first it will neither verify nor unfold properly.

For Salmon-aware processors, there's little reason to use anything but the application/magic-envelope form.  For syndication in general, though, it may be necessary to wrap the envelope in an Atom or RSS entry.

The source code for all of this is freely available.  If you're interested in all of this, please join the Salmon discussion group.

(Updated 1/19 to include a note about squeezing out whitespace from the b64 encoded data before doing anything important with it, per gffletch's comment.)

Comments

  1. This is so wonderfully simple that I'm amazed that we haven't been doing this for years. The issue of canonicalization has been one of the major reasons that most past attempts to use signatures have failed. But, Ben's simple hack of signing base64 encodings erases all the issues. Excellent!

    I assume that in order to figure out which key to use to verify the signature that I only need to unwrap the base64 data and then use WebFinger to find the key based on author name. Is that correct?

    It seems to me that conversion to utf8, while reasonable for Salmon, is not actually something that is generally necessary to use such a method.

    Thanks, John, for getting this done!

    bob wyman

  2. Great post John. Glad to see a simple signing algorithm used for this. Here's one thing you might want to investigate further.

    When we implemented SAML Simple Sign (very similar signing method) at AOL, we ran into a problem with the then current spec because it relied on signing the base64 output. The problem we ran into is that the base64 encoder generated output with line breaks every 72 chars. However, since Simple Sign relies on HTTP POST redirects, some browsers changed the CR in the base64 output to CRLF. This broke the signatures.

    If it's possible, I think it's cleaner to sign the utf8text rather than the base64 output. This is what the OASIS SSTC decided to do with the SAML Simple Sign Binding.

    Of course, this issue is only likely to apply if a message path for Salmon is via a browser using POST redirects.

    George

  3. Thanks George; due to other considerations (utf8 being tricky too) I'm going to update this to specify that you have to remove all whitespace from the base64 string before signing it or verifying it. So intermediaries can insert whatever linebreaks, etc. they want without breaking anything.

  4. Sorry, but I am missing some detail in this approach. In order for the recipient to either extract the data or to verify that the signature includes the expected kinds of data, it needs to understand what is encoded into the utf8text. In other words, just sending urlsafe_b64_enc(utf8text) and its signature is not sufficient. Isn't that knowledge some kind of c14n?

  5. Subbu - Yes, the 'type' attribute tells you the MIME type of the content. If you don't understand application/atom+xml (in this example) you're out of luck.

  6. But that's the same as XML C14N, since the sender and the recipient need to agree on how Atom is serialized into utf8 text.

  7. Subbu - They don't need to agree on the serialization, as the sender just chooses whatever it uses and signs the base64 encoding of it.
    Sender and receiver just need to both end up with approximately the same DOM tree when the receiver parses the serialization. Basically, as long as they can communicate normally via XML, they can communicate via XML-inside-a-magic-envelope, because the signing happens after encoding to base64, and the base64 encoding ensures that the receiver can recover the exact bytes that are signed.

  8. Oops - I see it now. What you are proposing is similar to Content-Encoding (i.e., decode and verify before processing the message). Good idea.

    (For some reason your blog does not let me post the comment on first try - a retry is working).

  9. John - I think an extra envelope is not necessary. How about the alternative at - http://www.subbu.org/blog/2010/01/envelope-for-signatures.

  10. Subbu - Yep, you can express the magic envelope in a variety of formats; MIME is a fine format, especially for sending individual messages. I have a JSON serialization written up in comments in the salmon-playground source code too. For the Salmon use case, we're dealing with XML data anyway so the XML tax is already paid, and we want to be able to embed the results in a feed (for batch transmission and syndication via things like PubSubHubbub).

    For a MIME envelope, the Authorization header isn't quite right for this, as the signature is really something that goes along with the entity rather than the request. I think you'd want something more similar to the DKIM-Signature MIME header. So define a Magic-Signature header with the method and you're golden.

