Lattice asked me about resources for representing a mailing list as an RSS feed. Particularly, he was wanting to put together an alternative(post-email) interface to a Sympa mailing list, with an RSS aggregator for reading the list traffic, and the Sympa web interface for posting. I didn’t have any suggestions for Sympa, but noted that the script mmrss turns Mailman lists into RSS feeds.

A Thought

This inspired me to think about tackling the chronic lack of mail list archiving soltuions as multi-step problem. Perhaps if we could get the mailing lists into a suitable inbetween format, with all the assumptions exposed, and codified into XML, then perhaps the archivers simply focus on giving a decent presentation of the complex interactions of a mailing list. And having that format be compatible with the latest crop of desktop clients would allow people to build new and exciting ways of interacting with the lists.

So I compiled an overview of the problems, the challeges, and work that has gone before on building an RSS threading standard.

A Quick Review of the State of the Archiver

MHonarc is something of a standard, but I can’t standard the archives it generates, it should be possible to generate something attractive and usable, but I’m still waiting. Current otherings run from the pedestrian Mail Archive to byzantine Sympa style)

Mailman’s archives ( pipermail) are pleasingly straightforward and clean, however their threading algorithm is a little weak, the archives are fragile with slight changes to the mbox changing URLs, and rebuilding the archives for very active, old lists can be incredibly slow. This and pipermail has no clue what to do with attachments.

Zest is an intriguing alternative I’ve mentioned before, however people I’ve shown it to find it confusing, and while I think they could learn, I hesitate to recommend it as a drop in replacement for MHonarc/Pipermail.

The Problem with MMRSS

Unfortunately mmrss doesn’t solve our problems. It scrapes the existing Mailman archives, and creates very simple RSS 0.91. Because RSS 0.91 isn’t very flexible, there is no way to include most of the interesting information, including the basic meta-data like creator (From:), and date sent (Date:) as well as more email specific info (like attachments, user agent, and message id), and the messages threading information (In-Reply-To:)

This means that while MMRSS could be useful for watching a list (ie. getting notified when it updates) it does not provide a meaningful alternative to reading the list.(this could partially be due to the script started life to generate RSS feeds for FoRK, an email based proto-blog with an interesting role in the history of RSS/Email convergence)

What would we want?

On the <channel> definition we’ll want the usual suspect, including:

  • the name of the mailing list in the <title>
  • the URL of either the lists webpage if it has one (as all Mailman and Sympa lists do), or the link to the web accessible archives (if for some reason you are using something else, and haven’t setup a webpage)
  • the date of the most recent message to the list in <dc:date>
  • and as much other meta-data as possible including description, and language.

And we’ll want: - the list, and list admin addresses, perhaps in <dc:creator> and <dc:publisher>? or would that be an inappropriate re-use?

On a particular <item> we’ll want:

  • the URL to the web archived email
  • the subject of the email in the <title>
  • the contents of the From: field in <dc:creator> (or one of the sha1/foaf based email obscuring technologies being discussed on rss-dev)
  • the date the email was sent (or received by the mailing list software) in <dc:date>
  • the full content of the message in the <description> (if you’re serious about providing an alternative interface to the list)

Ideally we would also have:

  • information about the messages membership in a thread (see below)
  • links to web archives of any attachments that might have been included in the email
  • a link (or mail address) to reply to this message particular message

There might also be reasons to include:

  • Message-ID (perhaps for constructing replies)
  • User-Agent
  • Spam Status (or similar Spam flagging header)
  • CC information

Some of this definitely can’t be shoe horned into the 3 standard RSS 1.0 modules (Dublin Core, Syndication, and Content), and demand extensions, but perhaps a proposed module (or modules) will do. ### Examining the Prior Art in Threading

Not surprisingly, displaying email as RSS, displaying mailing lists as RDF, and building interchange formats for threaded discussion in RSS have all been discussed before. No one change up with exactly the same feature set (surprise!) because everyone had slightly different conceptions of the problem.

The first important insight when considering an implementation (and examining prior art) is to realize that much of what is important to representing a mailing list is important to representing any form of threaded discussion, like the comments from a message board or blog, or the posts from a newsgroup.

There is the proposed RSS threading module, that adds one tag: <child>. Which is nice and simple, but also kind of awkward as email tends to be linked into threads by referring to a parent.(In-Reply-To:)

ThreadML was an interesting initiative of Steve Yost of Quicktopic, partially created in a response to this article on Joho. It was a standard composed of RSS 1.0, modcontent, modthreading to represent parent-to-child, and mod_annotation to represent child-to-parent relations. There was a very active quicktopic discussing ThreadML for a while, but it seems to have gone quiet. An example Quicktopic RSS feed to see how it might have looked.

Discussion of creating an RSS feed for the W3C’s mailing lists, prompted the proposal of a mod_email which might be useful, but doesn’t seem to focus on the commonality between different mediums enough for me.

The PHP mailing lists are available as RSS feeds, unfortunately here again, they aren’t very useful RSS feeds, with the From information stuck into a <mailto:> tag in the <description> tag. Odd.

Yahoogroups (formerly eGroups) provides RSS feeds for the lists (e.g. RSS-DEV[18] they host, see the original announcement on FoRK)

Mail-archive.com makes a simple RSS 0.9 feed available for each list, for example mod_perl’s RSS

Lastly, the Thread Description Language is an interesting attempt to build a rich RDF syntax for talking about all sorts of different threads. Some of its concepts like agreesWith and disagreesWith would be very cool to add to a RSS/RDF feed based on Zest and its inline mark up.

Conclusion, and Concerns

All of that is very interesting, but I don’t feel like any of the above directly maps to the features I mentioned above, it might be possible to assemble something out of the pieces, but a few items (like the url/mailto to respond to a post) is totally missing from any of this.

It might be worth looking at an email<->NTTP gateway to see what tricks they play, and what they consider necessary.

One problem with marking up email in XML is that it makes it very very easy for spammers to identify email addresses. There is a thread on RSS-DEV about ways to combat this. It seems like it should be possible with the collusion of the list archiver to send out “privatized” emails, where contact information is replaced with some sort of smart URI to confuse harvesters without interrupting communication. Yahoogroups kind of does this.

It would cut down on the complexity immensely to just skip the whole threading thing, but I think threading is an important feature for facilitating a culture of discussion, often discourages in spaces that use “linear threading” (like the traditional display of an email client’s inbox) Might be suitable for a version 1.0

Yet another approach would be to get away from the concept of an individual post, and syndicate conversations (threads). This would map very well to Zest’s concept of threads, but would work pretty well for normal archives and threads as well. The one trick would be figuring out distribute threads in such as way as to be useful with the most recent post readily accessible.

Related Posts: