2007/01/11

Generic Atom-to-JSON Conversion

Yesterday, our feeds infrastructure team released a bunch of new code.  The feed service can actually do a lot; one of the cooler things it now does is convert arbitrary Atom or RSS feeds into JSON data structures that can be retrieved across domains:

curl -v 'http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://journals.aol.com/panzerjohn/abstractioneer/atom.xml&format=atom-json&callback=cb'

The big win is that any web page can retrieve feed data from any feed source without needing to set up a custom proxy.  Obviously, there's still a proxy involved here, and it's one we're running; it handles both feed format normalization and caching, and is highly scalable.  I hope we can turn this into a supported, documented API on dev.aol.com soon.

The output looks, in part, like this:

cb(
{
"feed" :  
  {
  "aj:accessType":
      {"xmlns:aj":"http://journals.aol.com/_atom/aj#","content":"public"},
  "xmlns:sy" : "http://purl.org/rss/1.0/modules/syndication/",
  "aj:blogShortName":
      {"xmlns:aj":"http://journals.aol.com/_atom/aj#","content":"abstractioneer"},
  "title":"Abstractioneer",
...
  "entry": [
    {
    "title":"Why AOL Should Go OpenID",
    "published":"2006-12-15T23:15:48Z",
...

});
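
For anyone consuming this outside a browser, here is a minimal Python sketch of the two steps around the call: building the request URL and unwrapping the callback padding from the response. The helper names build_feed_url and strip_jsonp are mine, not part of the service, and percent-encoding the inner url parameter is my assumption about what the server accepts (the curl example passes it unencoded).

```python
import json
import re
from urllib.parse import urlencode

# Endpoint taken from the curl example; the helpers below are illustrative only.
BASE = "http://headlines.favorites.aol.com/hlserver/api/GetFeed.do"


def build_feed_url(feed_url, callback=None, fmt="atom-json"):
    """Build a GetFeed.do request URL (assumes the server accepts an encoded url=)."""
    params = {"url": feed_url, "format": fmt}
    if callback:
        params["callback"] = callback
    return BASE + "?" + urlencode(params)


def strip_jsonp(body, callback):
    """Remove the `callback( ... );` wrapper and parse the JSON inside it."""
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in " + callback + "(...)")
    return json.loads(match.group(1))


if __name__ == "__main__":
    print(build_feed_url(
        "http://journals.aol.com/panzerjohn/abstractioneer/atom.xml",
        callback="cb"))
    # A trimmed-down stand-in for the real response body.
    sample = 'cb(\n{"feed": {"title": "Abstractioneer"}}\n);'
    print(strip_jsonp(sample, "cb")["feed"]["title"])
```

In a web page none of the unwrapping is needed, since the browser evaluates the response as a script and calls cb itself; the sketch only matters for server-side or command-line consumers.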


There are a few oddities in the output -- atom:author gets mapped to dc:creator, for example -- which I'll find out about tomorrow.
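
Until those oddities are sorted out, a consumer can be defensive about them. The Python sketch below flattens the two value shapes visible in the sample output (plain strings such as "title", and namespaced objects that carry their text under "content") and looks for an author under either name. The helper names are hypothetical, and the exact keys "author" and "dc:creator" are my guess at how the mapping shows up in the JSON.

```python
def text_of(value):
    """Return the text of a feed value, whether a plain string or a namespaced object."""
    if isinstance(value, dict):
        # Namespaced elements appear as {"xmlns:...": "...", "content": "..."}.
        return value.get("content")
    return value


def author_of(entry):
    """Look for an author under either key (both key names are assumptions)."""
    for key in ("author", "dc:creator"):
        if entry.get(key) is not None:
            return text_of(entry[key])
    return None


if __name__ == "__main__":
    feed = {
        "aj:blogShortName": {
            "xmlns:aj": "http://journals.aol.com/_atom/aj#",
            "content": "abstractioneer",
        },
        "title": "Abstractioneer",
    }
    print(text_of(feed["aj:blogShortName"]))
    print(text_of(feed["title"]))
    print(author_of({"dc:creator": "hypothetical name"}))
```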

2 comments:

  1. A few other issues worth investigating:

    look at feed.title, feed.entry[0].author, and feed.entry[0].content in the following:

    http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://tantek.com/updates.atom&format=atom-json

    look at feed.entry[0].summary.type in the following:

    http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://scripting.com/rss.xml&format=atom-json

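    Explanation:

    feed.title: seems to be treated as text/plain, no matter what the declared type is

    feed.entry[0].author: inserted entry authors are null, not inherited from the feed

    feed.entry[0].content: is this how you want to handle xhtml?

    feed.entry[0].summary.type: the type indicated is "HTML"; I expected lowercase "html"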



  2. Yep.  I'm not sure how much of this is due to the underlying ROME library that's used for processing and normalizing feeds, and how much to the server code.

    I'm also not quite sure what the rules for namespaced elements are; certainly the way that custom namespaces are handled seems to incur a fairly big expansion in size compared to XML (hard to believe).

    It'd be good to figure out best practices for these sorts of things.

