Abstractioneer by John Panzer: Generic Atom-to-JSON Conversion

2007/01/11

Generic Atom-to-JSON Conversion

Yesterday, our feeds infrastructure team released a bunch of new code for our feed service. The service can actually do a lot; one of the cooler things it now does is convert arbitrary Atom or RSS feeds into cross-domain-retrievable JSON data structures:

curl -v 'http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://journals.aol.com/panzerjohn/abstractioneer/atom.xml&format=atom-json&callback=cb'

which returns, in part, the data structure below. The big win is that any web page can retrieve feed data from any feed source without needing to set up a custom proxy. Obviously, there's still a proxy involved here, but it's one we're running; it handles both feed format normalization and caching, and it is highly scalable. I hope we can turn this into a supported, documented API on dev.aol.com soon.

The output looks like this:

cb(
{
"feed" :
{
"aj:accessType":
      {"xmlns:aj":"http://journals.aol.com/_atom/aj#","content":"public"},
"xmlns:sy" : "http://purl.org/rss/1.0/modules/syndication/",
"aj:blogShortName":
      {"xmlns:aj":"http://journals.aol.com/_atom/aj#","content":"abstractioneer"},
"title":"Abstractioneer",
...
"entry": [
  {
    "title":"Why AOL Should Go OpenID",
    "published":"2006-12-15T23:15:48Z",
...

});
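For anyone who wants to poke at this outside a browser, here is a minimal sketch of building the request URL and stripping the callback wrapper before parsing. The endpoint and the url, format, and callback parameters come from the curl example above; the function names are made up for illustration, and I'm assuming the service accepts a percent-encoded feed URL (the curl example passes it unencoded).

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the curl example above; it may not be reachable.
API = "http://headlines.favorites.aol.com/hlserver/api/GetFeed.do"


def build_request_url(feed_url, callback="cb"):
    """Return the GetFeed.do URL for a feed, percent-encoding the feed URL."""
    query = urlencode({"url": feed_url, "format": "atom-json", "callback": callback})
    return f"{API}?{query}"


def unwrap_callback(body, callback="cb"):
    """Strip the `cb( ... );` wrapper and parse the JSON payload inside it."""
    text = body.strip()
    prefix = callback + "("
    if not text.startswith(prefix):
        raise ValueError("response is not wrapped in the expected callback")
    text = text[len(prefix):].rstrip()
    if text.endswith(";"):
        text = text[:-1].rstrip()
    if not text.endswith(")"):
        raise ValueError("callback wrapper is not closed")
    return json.loads(text[:-1])


if __name__ == "__main__":
    # A tiny hand-written response in the same shape as the excerpt above.
    sample = 'cb(\n{"feed": {"title": "Abstractioneer"}});'
    print(unwrap_callback(sample)["feed"]["title"])
```

In a web page you would not parse anything yourself: the script tag would simply call whatever function you named in the callback parameter. The unwrapping step only matters for non-browser clients.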

There are a few oddities in the output -- atom:author gets mapped to dc:creator, for example -- which I'll look into tomorrow.
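In the excerpt above, plain elements such as title come back as strings, while namespaced elements come back as objects that carry the namespace declaration plus a "content" field. A client probably wants one helper that reads either form. The sketch below is only a guess at how that might look: the helper names are hypothetical, and the "author" and "dc:creator" key names are assumptions based on the mapping mentioned above, not on documented output.

```python
def text_of(value):
    """Return the text of a value that is either a string or an object with "content"."""
    if isinstance(value, dict):
        return value.get("content")
    return value


def author_of(entry):
    """Look for an author under "author", then "dc:creator" (both key names are assumed)."""
    for key in ("author", "dc:creator"):
        text = text_of(entry.get(key))
        if text:
            return text
    return None


if __name__ == "__main__":
    # Values copied from the output excerpt above.
    feed = {
        "aj:accessType": {"xmlns:aj": "http://journals.aol.com/_atom/aj#", "content": "public"},
        "title": "Abstractioneer",
    }
    print(text_of(feed["aj:accessType"]))  # public
    print(text_of(feed["title"]))          # Abstractioneer
```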

Tags: json, atom, rss, feeds, mashups, apis, web services, infrastructure

2 comments:

sa3ruby, January 12, 2007 at 4:49 AM
A few other issues worth investigating:

Look at feed.title, feed.entry[0].author, and feed.entry[0].content in the following:

http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://tantek.com/updates.atom&format=atom-json

Look at feed.entry[0].summary.type in the following:

http://headlines.favorites.aol.com/hlserver/api/GetFeed.do?url=http://scripting.com/rss.xml&format=atom-json

Explanation:

The title seems to be presumed text/plain, no matter what the declared type is.

Inserted entry authors are null, not inherited.

Is this how you want to handle xhtml?

The summary type indicated is "HTML"; I expected "html".

panzerjohn, January 13, 2007 at 9:23 PM
Yep. I'm not sure how much of this is due to the underlying ROME library that's used for processing and normalizing feeds, and how much to the server code.

I'm also not quite sure what the rules for namespaced elements are; certainly the way that custom namespaces are handled seems to incur a fairly big expansion in size compared to XML (hard to believe).

It'd be good to figure out best practices for these sorts of things.