Lattice asked me about resources for representing a mailing list as an RSS
feed. In particular, he wanted to put together an alternative (post-email)
interface to a Sympa mailing list, with an RSS aggregator for reading the list
traffic and the Sympa web interface for posting. I didn’t have any
suggestions for Sympa, but noted that the script mmrss turns Mailman
lists into RSS feeds.
A Thought
This inspired me to think about tackling the chronic lack of mailing list archiving solutions as a multi-step problem. If we could get the mailing lists into a suitable in-between format, with all the assumptions exposed and codified in XML, then the archivers could simply focus on giving a decent presentation of the complex interactions of a mailing list. And having that format be compatible with the latest crop of desktop clients would allow people to build new and exciting ways of interacting with the lists.
So I compiled an overview of the problems, the challenges, and the work that has gone before on building an RSS threading standard.
A Quick Review of the State of the Archiver
MHonarc is something of a standard, but I can’t stand the archives it
generates. It should be possible to generate something attractive and usable,
but I’m still waiting. Current offerings run from the pedestrian
(Mail Archive) to the byzantine (Sympa style).
Mailman’s archives (pipermail) are pleasingly straightforward and clean.
However, their threading algorithm is a little weak, the archives are fragile
(slight changes to the mbox change URLs), and rebuilding the archives for very
active, old lists can be incredibly slow. On top of this, pipermail has no clue
what to do with attachments.
Zest is an intriguing alternative I’ve mentioned before. However, people I’ve
shown it to find it confusing, and while I think they could learn, I hesitate to
recommend it as a drop-in replacement for MHonarc/Pipermail.
The Problem with MMRSS
Unfortunately mmrss doesn’t solve our problems. It scrapes the existing
Mailman archives and creates very simple RSS 0.91. Because RSS 0.91 isn’t
very flexible, there is no way to include most of the interesting information,
including basic meta-data like creator (From:) and date sent (Date:), more
email-specific info (like attachments, user agent, and message id), and the
message’s threading information (In-Reply-To:).
This means that while MMRSS could be useful for watching a list (i.e. getting
notified when it updates), it does not provide a meaningful alternative to
reading the list. (This could partially be because the script started life
generating RSS feeds for FoRK, an email-based proto-blog with an interesting
role in the history of RSS/Email convergence.)
What would we want?
On the <channel> definition we’ll want the usual suspects, including:
- the name of the mailing list in the <title>
- the URL of either the list’s webpage if it has one (as all Mailman and
Sympa lists do), or the link to the web accessible archives (if for some reason
you are using something else, and haven’t set up a webpage)
- the date of the most recent message to the list in <dc:date>
- and as much other meta-data as possible including description, and language.
And we’ll want:
- the list and list admin addresses, perhaps in <dc:creator> and <dc:publisher>?
Or would that be an inappropriate re-use?
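As a rough sketch of the channel-level fields listed above, the following builds a bare <channel> element with Python's standard library. The list name, URL, and dates are made-up placeholders, the function name build_channel is hypothetical, and the RSS 1.0 and RDF wrapper namespaces are left out to keep the example short, so this is not a complete RSS 1.0 document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC)


def build_channel(list_name, list_url, description, language, message_dates):
    """Build a <channel> element carrying the list-level meta-data."""
    channel = ET.Element("channel")
    ET.SubElement(channel, "title").text = list_name
    ET.SubElement(channel, "link").text = list_url
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, f"{{{DC}}}language").text = language
    # dc:date is the date of the most recent message sent to the list.
    ET.SubElement(channel, f"{{{DC}}}date").text = max(message_dates).isoformat()
    return channel


# Placeholder data, not a real list.
channel = build_channel(
    "example-list",
    "http://lists.example.org/example-list/",
    "A made-up list used for illustration",
    "en",
    [
        datetime(2003, 3, 3, 10, 15, tzinfo=timezone.utc),
        datetime(2003, 3, 4, 9, 0, tzinfo=timezone.utc),
    ],
)
channel_xml = ET.tostring(channel, encoding="unicode")
print(channel_xml)
```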
On a particular <item> we’ll want:
- the URL to the web archived email
- the subject of the email in the <title>
- the contents of the From: field in <dc:creator> (or one of the sha1/foaf based
email obscuring technologies being discussed on rss-dev)
- the date the email was sent (or received by the mailing list software) in
<dc:date>
- the full content of the message in the <description> (if you’re serious about
providing an alternative interface to the list)
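To make the item mapping concrete, here is a minimal sketch that turns one raw email into an <item> element. It assumes a single-part plain text message and an archive URL supplied by the caller. The sample message, the URL, and the function name build_item are invented for illustration.

```python
import xml.etree.ElementTree as ET
from email import message_from_string
from email.utils import parsedate_to_datetime

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC)

# Made-up sample message.
RAW = """\
From: Alice Example <alice@example.org>
Date: Mon, 03 Mar 2003 10:15:00 +0000
Subject: Re: Archiving ideas
Message-ID: <msg2@example.org>
In-Reply-To: <msg1@example.org>

Sounds good to me.
"""


def build_item(raw_message, archive_url):
    """Map the headers and body of one email onto an RSS <item>."""
    msg = message_from_string(raw_message)
    item = ET.Element("item")
    ET.SubElement(item, "title").text = msg["Subject"]
    ET.SubElement(item, "link").text = archive_url
    ET.SubElement(item, f"{{{DC}}}creator").text = msg["From"]
    sent = parsedate_to_datetime(msg["Date"])
    ET.SubElement(item, f"{{{DC}}}date").text = sent.isoformat()
    # Assumes a non-multipart message, so the payload is the body text.
    ET.SubElement(item, "description").text = msg.get_payload()
    return item


item = build_item(RAW, "http://lists.example.org/example-list/msg00002.html")
item_xml = ET.tostring(item, encoding="unicode")
print(item_xml)
```

Putting the raw From: value in <dc:creator> exposes the address, which is exactly the harvesting concern raised in the conclusion.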
Ideally we would also have:
- information about the message’s membership in a thread (see below)
- links to web archives of any attachments that might have been included in the
email
- a link (or mail address) to reply to this particular message
There might also be reasons to include:
- Message-ID (perhaps for constructing replies)
- User-Agent
- Spam Status (or similar Spam flagging header)
- CC information
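The optional fields in the last two lists are all recoverable from the message itself. The sketch below pulls them out with Python's email package. The header name X-Spam-Status is an assumption (spam filters vary), the sample message is built in code purely for illustration, and the function name extra_metadata is hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import getaddresses


def extra_metadata(msg):
    """Collect the optional per-message fields a richer feed might carry."""
    return {
        "message_id": msg["Message-ID"],
        "user_agent": msg["User-Agent"],
        # Assumed header name; other filters use different headers.
        "spam_status": msg["X-Spam-Status"],
        "cc": [addr for _, addr in getaddresses(msg.get_all("Cc", []))],
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


# Build a made-up message with one attachment, then round-trip it through
# bytes to mimic reading it back out of a list archive.
original = EmailMessage()
original["From"] = "alice@example.org"
original["Cc"] = "bob@example.org, carol@example.org"
original["Message-ID"] = "<msg2@example.org>"
original["User-Agent"] = "Mutt/1.4"
original["X-Spam-Status"] = "No, score=0.1"
original.set_content("Patch attached.")
original.add_attachment(
    b"--- a\n+++ b\n",
    maintype="application",
    subtype="octet-stream",
    filename="fix.patch",
)

msg = message_from_bytes(original.as_bytes(), policy=policy.default)
meta = extra_metadata(msg)
print(meta)
```

An archiver would still have to store each attachment somewhere web accessible and emit its URL. The filenames here are only the starting point.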
Some of this definitely can’t be shoehorned into the 3 standard RSS 1.0 modules
(Dublin Core, Syndication, and Content) and demands extensions, but perhaps a
proposed module (or modules) will do.
Examining the Prior Art in Threading
Not surprisingly, displaying email as RSS, displaying mailing lists as RDF, and
building interchange formats for threaded discussion in RSS have all been
discussed before. No one came up with exactly the same feature set
(surprise!) because everyone had slightly different conceptions of the problem.
The first important insight when considering an implementation (and examining
prior art) is to realize that much of what is important to representing a
mailing list is important to representing any form of threaded discussion, like
the comments from a message board or blog, or the posts from a newsgroup.
There is the proposed RSS threading module, which adds one tag: <child>.
That is nice and simple, but also kind of awkward, as email tends to be linked
into threads by referring to a parent (In-Reply-To:).
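The mismatch is easy to bridge in code: a feed generator can invert the child-to-parent links that email provides into the parent-to-child lists a <child> tag needs. A minimal sketch, with invented message ids and a hypothetical function name:

```python
from collections import defaultdict


def children_by_parent(messages):
    """Invert In-Reply-To links into parent -> [children] lists.

    messages is an iterable of (message_id, in_reply_to) pairs, where
    in_reply_to is None for a message that starts a thread.
    """
    children = defaultdict(list)
    for message_id, in_reply_to in messages:
        if in_reply_to is not None:
            children[in_reply_to].append(message_id)
    return dict(children)


# Made-up message ids.
tree = children_by_parent([
    ("<a@x>", None),
    ("<b@x>", "<a@x>"),
    ("<c@x>", "<a@x>"),
    ("<d@x>", "<b@x>"),
])
print(tree)
```

The catch is that a parent's list of children changes every time someone replies, so already-published items would have to be updated, whereas a child-to-parent reference never changes once the message is sent.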
ThreadML was an interesting initiative of Steve Yost of Quicktopic,
partially created in response to this article on Joho.
It was a standard composed of RSS 1.0, mod_content, mod_threading to represent
parent-to-child relations, and mod_annotation to represent child-to-parent
relations. There was a very active quicktopic discussing ThreadML for a while,
but it seems to have gone quiet. See an example Quicktopic RSS feed for how it
might have looked.
Discussion of creating an RSS feed for the W3C’s mailing lists prompted the
proposal of a mod_email, which might be useful but doesn’t seem to focus enough
on the commonality between different mediums for me.
The PHP mailing lists are available as RSS feeds. Unfortunately, here again they
aren’t very useful RSS feeds, with the From information stuck into a <mailto:>
tag in the <description> tag. Odd.
Yahoogroups (formerly eGroups) provides RSS feeds for the lists they host (e.g.
RSS-DEV); see the original announcement on FoRK.
Mail-archive.com makes a simple RSS 0.9 feed available for each list, for
example mod_perl’s RSS.
Lastly, the Thread Description Language is an interesting attempt to build a
rich RDF syntax for talking about all sorts of different threads. Some of its
concepts, like agreesWith and disagreesWith, would be very cool to add to an
RSS/RDF feed based on Zest and its inline markup.
Conclusion, and Concerns
All of that is very interesting, but I don’t feel like any of the above directly
maps to the features I mentioned above. It might be possible to assemble
something out of the pieces, but a few items (like the url/mailto to respond to
a post) are totally missing from any of this.
It might be worth looking at an email<->NNTP gateway to see what tricks they
play, and what they consider necessary.
One problem with marking up email in XML is that it makes it very very easy for
spammers to identify email addresses. There is a thread on RSS-DEV about ways
to combat this. It seems like it should be possible with the collusion of the
list archiver to send out “privatized” emails, where contact information is
replaced with some sort of smart URI to confuse harvesters without interrupting
communication. Yahoogroups kind of does this.
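One of the sha1/foaf style approaches can be sketched in a few lines: publish a SHA-1 digest of the mailto: URI (the convention FOAF uses for mbox_sha1sum) instead of the address itself. The reply URL scheme below is purely hypothetical, as are the function names. A real archiver would need to keep its own private mapping from digest back to address in order to relay replies.

```python
import hashlib

# Hypothetical endpoint on the list archiver that relays replies.
REPLY_BASE = "https://lists.example.org/reply/"


def mbox_sha1sum(address):
    """Hex SHA-1 digest of the mailto: URI for an address."""
    uri = "mailto:" + address.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def reply_uri(address):
    """A 'privatized' contact link that does not reveal the address."""
    return REPLY_BASE + mbox_sha1sum(address)


print(reply_uri("alice@example.org"))
```

The digest is one-way, so harvesters get nothing mailable, while anyone who already knows an address can still recognise its owner's posts by computing the same digest.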
It would cut down on the complexity immensely to just skip the whole threading
thing, but I think threading is an important feature for facilitating a culture
of discussion, one often discouraged in spaces that use “linear threading” (like
the traditional display of an email client’s inbox). Skipping it might be
suitable for a version 1.0.
Yet another approach would be to get away from the concept of an individual
post, and syndicate conversations (threads). This would map very well to
Zest’s concept of threads, but would work pretty well for normal archives and
threads as well. The one trick would be figuring out how to distribute threads
in such a way as to be useful, with the most recent post readily accessible.
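One way to approach that trick, sketched under the assumption that each message carries an id, an In-Reply-To value, and a date: group posts under their thread root and order the threads by their latest post, so a thread resurfaces whenever someone replies. The data and function names are invented for illustration.

```python
from datetime import datetime


def find_root(message_id, parents):
    """Follow In-Reply-To links upward until no known parent remains."""
    seen = set()  # guards against an infinite loop on malformed reply chains
    while parents.get(message_id) in parents and message_id not in seen:
        seen.add(message_id)
        message_id = parents[message_id]
    return message_id


def threads_by_recency(messages):
    """Group (message_id, in_reply_to, date) tuples into threads.

    Returns a list of threads, most recently active first.
    """
    parents = {mid: parent for mid, parent, _ in messages}
    threads = {}
    for mid, _, date in messages:
        root = find_root(mid, parents)
        thread = threads.setdefault(
            root, {"root": root, "posts": [], "latest": date}
        )
        thread["posts"].append(mid)
        thread["latest"] = max(thread["latest"], date)
    return sorted(threads.values(), key=lambda t: t["latest"], reverse=True)


# Made-up messages: two threads, the older one revived by a late reply.
feed = threads_by_recency([
    ("<m1@x>", None, datetime(2003, 3, 1)),
    ("<m2@x>", "<m1@x>", datetime(2003, 3, 2)),
    ("<m3@x>", None, datetime(2003, 3, 3)),
    ("<m4@x>", "<m2@x>", datetime(2003, 3, 4)),
])
for thread in feed:
    print(thread["root"], thread["latest"].date(), thread["posts"])
```

A reply whose parent is missing from the archive is treated as the start of its own thread here, which is one of several reasonable choices.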