2010/01/19

A Web Wide Public Key Infrastructure

In my last post, I talked about Magic Signatures, the evolution of Salmon's Magic Security Pixie Dust into something concrete.  There's an important bit missing, which I'll talk about now:  Public key signatures are fairly useless without a way to discover and use other people's public keys in a secure, reliable way.  We need a distributed, easily deployable public key infrastructure.  Let's build one.

The basic plan for Salmon signature verification is fairly simple:  Use Webfinger discovery on the author of the Salmon to find their Magic Signature public key, if they have one.  To find this, you first get the Webfinger XRD file, then look for a "magickey" link:
<Link rel="http://salmon-protocol.org/ns/magickey" href="https://www.example.org/0B6A58300/publickey" />
Retrieve the magickey resource, parse it into a public key in your favorite library, and use that to verify the signature of the original Salmon you started with.

A few notes:
  • Retrievals of these documents should be done via TLS and certificates checked; if any step in the chain isn't done via checked TLS, the result may be vulnerable to MITM or DNS poisoning attacks.
  • The public key is per-user and is effectively self-signed.  While I anticipate most keys will be generated by large providers, it is possible for users to generate their own public/private keypairs and upload only the public key to the Web.
  • Maintaining data about expired and revoked keys will be needed too, but first things first.

The next question is what format that public key should be.  The obvious choice is an X.509 PKI certificate PEM file, but this turns out to pull in a ton of baggage; you can't even parse one of these using available App Engine libraries at the moment.  This is primarily due to the use of ASN.1 DER encoding, which is usually dealt with via a linked-in C library.  X.509 is also inherently complex and overkill for what we need; it's based on a hierarchical model of CAs which doesn't map well to Salmon's decentralized model.

When you dig down into things, it turns out that an RSA public key itself is fairly simple.  You have two numbers, a modulus (n, for some reason) and an exponent (e).  You need to serialize and deserialize these numbers in a standard way.  Most libraries let you pass in and access these values.  The only catch is that the numbers themselves can be very, very big -- bigger than any native number on any system, so you need something like a bignum library to deal with these numbers.  At the end of the day, though, you're just dealing with a pair of numbers.  You don't need complicated formats for that.

So, here's a very simple way to represent this in a web-safe way:
"RSA." + urlsafe_b64_encode(to_network_bytes(modulus)) + "." + urlsafe_b64_encode(to_network_bytes(exponent))
The output is two base64 encoded strings, separated by a period, and prefixed with the key type:
RSA.mVgY8RN6URBTstndvmUUPb4UZTdwvwmddSKE5z_jvKUEK6yk1u3rrC9yN8k6FilGj9K0eeUPe2hf4Pj-5CmHww==.AQAB
To parse this, you split on the period, urlsafe_b64_decode each piece into bytes, and then turn the bytes into bignums using your library.  (I'm using the PyCrypto bytes_to_long and long_to_bytes functions to play around with this at the moment.)

(There might possibly also be a need for a Subject attached to the key itself, to prevent Joe from pointing at Bob's key and claiming it as his own.  Joe still couldn't sign things as Bob but he could potentially claim Bob's output as his own.  So, add the base64 encoding of Joe's Webfinger ID to the public key.  It's not clear to me if this is needed quite yet.)

You may also want to store and retrieve private keys; it turns out the private keys just need one more value, d, along with n and e.  Append this to the end as an optional parameter.  Thoughts?  Anything missing?  You can also take a look at the in-progress Python code to deal with this format.  It's pretty trivial to write and parse.  I imagine that dealing with parsing additional algorithms will be slightly more complicated, but nothing compared with the complication of actually implementing said algorithms.

(All of this still of course relies on X.509 certificates underlying the SSL connections used to retrieve the user-specific public keys.  That's fine, because those certificates are hidden underneath existing libraries and don't need to be visible to Salmon code at all.)

This is a general purpose, lightweight discovery mechanism for personal signing keys.  It works well for Salmon; once widely deployed, it would be useful for other purposes as well.

(Edited to update thoughts on Subject, which, I think, is not needed.)