I spent some time exploring the new Rome feed parser for Java and trying to understand how it works. Along the way, I put together the following class diagram and notes on the parsing process. I provide some pointers into the Rome 0.3 Javadocs, but because this summary is based on the latest Rome codebase from CVS, you will notice that some class and interface names have changed.
You don't need to know this stuff to use Rome, but if you are curious about the internals, you might find it interesting.
Notes on the Rome parsing process
Rome is based around an idealized and abstract model of a Newsfeed or "Syndication Feed." Rome can parse any format of Newsfeed, including RSS variants and Atom, into this model. Rome can also convert from the model representation to any of the same Newsfeed output formats.
Internally, Rome defines intermediate object models for specific Newsfeed formats, or "Wire Feed" formats, including both Atom and all RSS variants. For each format, there is a separate JDOM-based parser class that parses XML into an intermediate model. Rome provides "converters" to convert between the intermediate Wire Feed models and the idealized Syndication Feed model.
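To make the layering concrete, here is a minimal sketch in plain Java. It does not use Rome, and none of the names are Rome's: `FeedModelSketch`, `RssItem`, `AtomEntry`, `SyndItem`, and `Converter` are hypothetical stand-ins for a Wire Feed model, the idealized model, and a converter between them. The real models carry far more fields than the three shown here.

```java
public class FeedModelSketch {

    /** Wire-level model of an RSS-style item (hypothetical, not a Rome class). */
    public static final class RssItem {
        public final String title, link, description;
        public RssItem(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    /** Wire-level model of an Atom-style entry (hypothetical, not a Rome class). */
    public static final class AtomEntry {
        public final String title, alternateLink, content;
        public AtomEntry(String title, String alternateLink, String content) {
            this.title = title;
            this.alternateLink = alternateLink;
            this.content = content;
        }
    }

    /** Idealized, format-neutral item (hypothetical, not a Rome class). */
    public static final class SyndItem {
        public final String title, link, body;
        public SyndItem(String title, String link, String body) {
            this.title = title;
            this.link = link;
            this.body = body;
        }
    }

    /** Converts between one wire model W and the idealized model. */
    public interface Converter<W> {
        SyndItem toSynd(W wire);
        W fromSynd(SyndItem item);
    }

    public static final class RssConverter implements Converter<RssItem> {
        public SyndItem toSynd(RssItem w) {
            return new SyndItem(w.title, w.link, w.description);
        }
        public RssItem fromSynd(SyndItem s) {
            return new RssItem(s.title, s.link, s.body);
        }
    }

    public static final class AtomConverter implements Converter<AtomEntry> {
        public SyndItem toSynd(AtomEntry w) {
            return new SyndItem(w.title, w.alternateLink, w.content);
        }
        public AtomEntry fromSynd(SyndItem s) {
            return new AtomEntry(s.title, s.link, s.body);
        }
    }

    public static void main(String[] args) {
        // RSS-style item in, idealized model in the middle, Atom-style entry out.
        RssItem rss = new RssItem("Hello", "http://example.com/1", "First post");
        SyndItem synd = new RssConverter().toSynd(rss);
        AtomEntry atom = new AtomConverter().fromSynd(synd);
        System.out.println(atom.title + " | " + atom.alternateLink + " | " + atom.content);
    }
}
```

The point of the design is that each format only needs one converter to and from the idealized model, so adding a new format does not require a converter for every pair of formats.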
Rome makes no attempt at Pilgrim-style liberal XML parsing. If a Newsfeed is not valid XML, then Rome will fail. Perhaps, as Kevin Burton suggests, parsing errors in Newsfeeds can and should be corrected. Kevin suggests that, when the parse fails, you can correct the problem and parse again. (BTW, I have some sample code that shows how to do this, but it only works with Xerces - Crimson's SAXParseException does not have reliable error line and column numbers.)
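The "correct the problem and parse again" idea can be sketched with nothing but the JDK's JAXP API. This is not the sample code mentioned above and it does not involve Rome; it is a minimal illustration that assumes the only defect is an unescaped `&`. The names `ParseRetrySketch`, `parse`, `escapeBareAmpersands`, and `parseWithOneRepair` are hypothetical. `SAXParseException.getLineNumber()` and `getColumnNumber()` are the standard accessors, and how trustworthy their values are depends on the underlying parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ParseRetrySketch {

    /** Strict parse: any well-formedness error surfaces as a SAXParseException. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Rethrow errors instead of letting the default handler print them.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Escapes every '&' that does not start an entity or character reference. */
    public static String escapeBareAmpersands(String xml) {
        return xml.replaceAll(
                "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)", "&amp;");
    }

    /** Tries a strict parse; on failure, reports the location, repairs, and retries once. */
    public static Document parseWithOneRepair(String xml) throws Exception {
        try {
            return parse(xml);
        } catch (SAXParseException e) {
            System.err.println("Parse failed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            // A second failure propagates to the caller unchanged.
            return parse(escapeBareAmpersands(xml));
        }
    }

    public static void main(String[] args) throws Exception {
        String bad = "<rss><channel><title>Tom & Jerry</title></channel></rss>";
        Document doc = parseWithOneRepair(bad);
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

A fuller version would use the reported line and column to patch only the offending spot, which is exactly where unreliable location data becomes a problem.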
Here is what happens during Rome Newsfeed parsing:
URL feedUrl = new URL("file:blogging-roller.rss");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new InputStreamReader(feedUrl.openStream()));
Rome supports Newsfeed extension modules for all formats that also support modules: RSS 1.0, RSS 2.0, and Atom. Standard modules such as Dublin Core and Syndication are supported, and you can define your own custom modules too.
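For readers new to modules: an extension module is a set of elements in its own XML namespace mixed into the feed. The sketch below does not use Rome or its module API; it uses the JDK's namespace-aware DOM parser to collect the Dublin Core elements of the first RSS item, which is roughly the kind of data a module parser has to pick up. `DublinCoreSketch` and `readDublinCore` are hypothetical names, and the sample feed is made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DublinCoreSketch {

    /** Namespace URI of the Dublin Core element set. */
    public static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the Dublin Core children of the first <item>, keyed by local name. */
    public static Map<String, String> readDublinCore(String feedXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI/getLocalName
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(feedXml)));

        Map<String, String> fields = new LinkedHashMap<>();
        NodeList items = doc.getElementsByTagName("item");
        if (items.getLength() == 0) {
            return fields;
        }
        NodeList children = items.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep only elements that belong to the Dublin Core namespace.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && DC_NS.equals(child.getNamespaceURI())) {
                fields.put(child.getLocalName(), child.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss version=\"2.0\" xmlns:dc=\"" + DC_NS + "\">"
                + "<channel><title>Example</title><item><title>Post</title>"
                + "<dc:creator>Example Author</dc:creator>"
                + "<dc:date>2004-08-01T12:00:00Z</dc:date>"
                + "</item></channel></rss>";
        System.out.println(readDublinCore(feed));
    }
}
```

Matching on the namespace URI rather than the `dc:` prefix matters because feeds are free to bind the same namespace to any prefix.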
Rome also supports Newsfeed output: for each Newsfeed format, it provides a "generator" class that takes a Syndication Feed model and produces Newsfeed XML from it.
Learning more
I've linked to a number of the Rome 0.3 Tutorials; here is the full list from the Rome Wiki:
Overall, Rome looks really good. It is obvious that a lot of thought has gone into design and a lot of work has been done on implementation (and docs). Rome is well on the way to "ending syndication feed confusion by supporting all of 'em" for us Java heads.
Please leave a comment if I have gotten something wrong.
This work is licensed under a Creative Commons License.
Copyright 2002-2007, David M Johnson (dave.johnson at rollerweblogger.org)
This is a personal weblog, I do not speak for my employer.
