Skip to main content

A Web Wide Public Key Infrastructure

In my last post, I talked about Magic Signatures, the evolution of Salmon's Magic Security Pixie Dust into something concrete.  There's an important bit missing, which I'll talk about now:  Public key signatures are fairly useless without a way to discover and use other people's public keys in a secure, reliable way.  We need a distributed, easily deployable public key infrastructure.  Let's build one.

The basic plan for Salmon signature verification is fairly simple:  Use Webfinger discovery on the author of the Salmon to find their Magic Signature public key, if they have one.  To find this, you first get the Webfinger XRD file, then look for a "magickey" link:
<Link rel="" href="" />
Retrieve the magickey resource, parse it into a public key in your favorite library, and use that to verify the signature of the original Salmon you started with.

A few notes:
  • Retrievals of these documents should be done via TLS and certificates checked; if any step in the chain isn't done via checked TLS, the result may be vulnerable to MITM or DNS poisoning attacks.
  • The public key is per-user and is effectively self-signed.  While I anticipate most keys will be generated by large providers, it is possible for users to generate their own public/private keypairs and upload only the public key to the Web.
  • Maintaining data about expired and revoked keys will be needed too, but first things first.

The next question is what format that public key should be.  The obvious choice is an X.509 PKI certificate PEM file, but this turns out to pull in a ton of baggage; you can't even parse one of these using available App Engine libraries at the moment.  This is primarily due to the use of ASN.1 DER encoding, which is usually dealt with via a linked-in C library.  X.509 is also inherently complex and overkill for what we need; it's based on a hierarchical model of CAs which doesn't map well to Salmon's decentralized model.

When you dig down into things, it turns out that an RSA public key itself is fairly simple.  You have two numbers, a modulus (n, for some reason) and an exponent (e).  You need to serialize and deserialize these numbers in a standard way.  Most libraries let you pass in and access these values.  The only catch is that the numbers themselves can be very, very big -- bigger than any native number on any system, so you need something like a bignum library to deal with these numbers.  At the end of the day, though, you're just dealing with a pair of numbers.  You don't need complicated formats for that.

So, here's a very simple way to represent this in a web-safe way:
"RSA." + urlsafe_b64_encode(to_network_bytes(modulus)) + "." + urlsafe_b64_encode(to_network_bytes(exponent))
The output is two base64 encoded strings, separated by a period, and prefixed with the key type:
To parse this, you split on the period, urlsafe_b64_decode each piece into bytes, and then turn the bytes into bignums using your library.  (I'm using the PyCrypto bytes_to_long and long_to_bytes functions to play around with this at the moment.)

(There might possibly also be a need for a Subject attached to the key itself, to prevent Joe from pointing at Bob's key and claiming it as his own.  Joe still couldn't sign things as Bob but he could potentially claim Bob's output as his own.  So, add the base64 encoding of Joe's Webfinger ID to the public key.  It's not clear to me if this is needed quite yet.)

You may also want to store and retrieve private keys; it turns out the private keys just need one more value, d, along with n and e.  Append this to the end as an optional parameter.  Thoughts?  Anything missing?  You can also take a look at the in-progress Python code to deal with this format.  It's pretty trivial to write and parse.  I imagine that dealing with parsing additional algorithms will be slightly more complicated, but nothing compared with the complication of actually implementing said algorithms.

(All of this still of course relies on X.509 certificates underlying the SSL connections used to retrieve the user-specific public keys.  That's fine, because those certificates are hidden underneath existing libraries and don't need to be visible to Salmon code at all.)

This is a general purpose, lightweight discovery mechanism for personal signing keys.  It works well for Salmon; once widely deployed, it would be useful for other purposes as well.

(Edited to update thoughts on Subject, which, I think, is not needed.)

Popular posts from this blog

Personal Web Discovery (aka Webfinger)

There's a particular discovery problem for open and distributed protocols such as OpenID, OAuth, Portable Contacts, Activity Streams, and OpenSocial.  It seems like a trivial problem, but it's one of the stumbling blocks that slows mass adoption.  We need to fix it.  So first, I'm going to name it:

The Personal Web Discovery Problem:  Given a person, how do I find out what services that person uses?
This does sound trivial, doesn't it?  And it is easy as long as you're service-centric; if you're building on top of social network X, there is no discovery problem, or at least only a trivial one that can be solved with proprietary APIs.  But what if you want to build on top of X,Y, and Z?  Well, you write code to make the user log in to each one so you can call those proprietary APIs... which means the user has to tell you their identity (and probably password) on each one... and the user has already clicked the Back button because this is complicated and annoying.

XAuth is a Lot Like Democracy

XAuth is a lot like democracy:  The worst form of user identity prefs, except for all those others that have been tried (apologies to Churchill).  I've just read Eran's rather overblown "XAuth - a Terrible, Horrible, No Good, Very Bad Idea", and I see that the same objections are being tossed around; I'm going to rebut them here to save time in the future.

Let's take this from the top.  XAuth is a proposal to let browsers remember that sites have registered themselves as a user's identity provider and let other sites know if the user has a session at that site.  In other words, it has the same information as proprietary solutions that already exist, except that it works across multiple identity providers.  It means that when you go to a new website, it doesn't have to ask you what your preferred services are, it can just look them up.  Note that this only tells the site that you have an account with Google or Yahoo or Facebook or Twitter, not what the…
Twister is interesting.  It's a decentralized "microblogging" system based on putting together existing protocols:  Bitcoin, distributed hash tables, and Bittorrent.  The most interesting part for me is using Bitcoin for user registration and spam control.  Federated systems handle this with federated trust, which is at least conceptually simple.  The Twister/Bitcoin mechanism looks intriguing though I don't know enough about Bitcoin to really comment.  Need to read further.