Blog posts tagged "state.of.the.art"

On Book Listing Services

November 6th, 2005

For years I’ve wanted a decent website where I can manage my relationship with books. (not especially complicated, but voluminous)

For a while there was largely nothing, then there was Allconsuming which was wonderful, but slowly died, and went dark before being re-incarnated in the mold of a 43x tool. And I have this memory of there being a nifty little $14/mo tool, back in the days when I didn’t pay for websites, but I wasn’t able to find it.

Last Fall, I started sketching down notes towards building my own, and in the intervening year it’s become an interestingly crowded space. (Who knew so many other people felt the pull?) Even in the 6 weeks since I first started jotting down sites for this blog post, the space has evolved, with LibraryThing coming out solidly on top as the most active: most actively developed, most actively used, and most actively engaged developer.

That said, in a cursory search (mostly of my del.icio.us links) I turned up 5 other very similar services

Also the Bookshelf example app from 24L, and the interesting related services What Should I Read Next? and Library Elf.

None of them are quite there yet, and I want more, more, more!

Once more into the Breach: Calendars, Events, and RSS

September 22nd, 2003

The launch of Upcoming.org seems to have rekindled some interest in calendaring standards, stoked by a post from Ray Ozzie, and the Calendar Fiasco, by Jon Udell. (also see eric’s collection of upcoming.org links)

RSSifying the Mailing List

November 25th, 2002

Lattice asked me about resources for representing a mailing list as an RSS feed. In particular, he wanted to put together an alternative (post-email) interface to a Sympa mailing list, with an RSS aggregator for reading the list traffic and the Sympa web interface for posting. I didn’t have any suggestions for Sympa, but noted that the script mmrss turns Mailman lists into RSS feeds.

A Thought

This inspired me to think about tackling the chronic lack of mailing list archiving solutions as a multi-step problem. If we could get the mailing lists into a suitable in-between format, with all the assumptions exposed and codified into XML, then the archivers could simply focus on giving a decent presentation of the complex interactions of a mailing list. And having that format be compatible with the latest crop of desktop clients would allow people to build new and exciting ways of interacting with the lists.

So I compiled an overview of the problems, the challenges, and the work that has gone before on building an RSS threading standard.

A Quick Review of the State of the Archiver

MHonArc is something of a standard, but I can’t stand the archives it generates. It should be possible to generate something attractive and usable, but I’m still waiting. Current offerings run from the pedestrian (Mail Archive) to the byzantine (Sympa style).

Mailman’s archives (pipermail) are pleasingly straightforward and clean. However, their threading algorithm is a little weak, the archives are fragile (slight changes to the mbox change URLs), and rebuilding the archives for very active, old lists can be incredibly slow. Also, pipermail has no clue what to do with attachments.

Zest is an intriguing alternative I’ve mentioned before. However, people I’ve shown it to find it confusing, and while I think they could learn, I hesitate to recommend it as a drop-in replacement for MHonArc/Pipermail.

The Problem with MMRSS

Unfortunately mmrss doesn’t solve our problems. It scrapes the existing Mailman archives and creates very simple RSS 0.91. Because RSS 0.91 isn’t very flexible, there is no way to include most of the interesting information: basic meta-data like creator (From:) and date sent (Date:), more email specific info (like attachments, user agent, and message id), and the message’s threading information (In-Reply-To:).

This means that while MMRSS could be useful for watching a list (i.e. getting notified when it updates), it does not provide a meaningful alternative to reading the list. (This could partially be because the script started life generating RSS feeds for FoRK, an email based proto-blog with an interesting role in the history of RSS/Email convergence.)

What would we want?

On the <channel> definition we’ll want the usual suspects, including:

  • the name of the mailing list in the <title>
  • the URL of either the list’s webpage if it has one (as all Mailman and Sympa lists do), or the link to the web accessible archives (if for some reason you are using something else, and haven’t set up a webpage)
  • the date of the most recent message to the list in <dc:date>
  • and as much other meta-data as possible, including description and language.

And we’ll want:

  • the list and list admin addresses, perhaps in <dc:creator> and <dc:publisher>? Or would that be an inappropriate re-use?

On a particular <item> we’ll want:

  • the URL to the web archived email
  • the subject of the email in the <title>
  • the contents of the From: field in <dc:creator> (or one of the sha1/foaf based email obscuring technologies being discussed on rss-dev)
  • the date the email was sent (or received by the mailing list software) in <dc:date>
  • the full content of the message in the <description> (if you’re serious about providing an alternative interface to the list)
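
As a rough illustration of the <item> mapping above, here is a minimal Python sketch using only the standard library. The function name message_to_item, the SAMPLE message, and the archive URL are made up for the example; it also assumes a single-part plain-text message with a valid Date: header, which real list traffic often violates.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Namespace URIs for RSS 1.0, RDF, and the Dublin Core module.
RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical message, used only to exercise the sketch.
SAMPLE = (
    "From: Alice <alice@example.org>\n"
    "Subject: Threading in RSS\n"
    "Date: Mon, 25 Nov 2002 10:30:00 -0500\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Has anyone tried this?\n"
)


def message_to_item(raw_message, archive_url):
    """Build an RSS 1.0 <item> element from one raw email message."""
    msg = message_from_string(raw_message)

    item = ET.Element(f"{{{RSS}}}item", {f"{{{RDF}}}about": archive_url})
    ET.SubElement(item, f"{{{RSS}}}title").text = msg.get("Subject", "(no subject)")
    ET.SubElement(item, f"{{{RSS}}}link").text = archive_url
    ET.SubElement(item, f"{{{DC}}}creator").text = msg.get("From", "")

    # dc:date is usually written as ISO 8601, so convert the RFC 2822 Date: header.
    # Assumption: the header is present and parseable.
    sent = parsedate_to_datetime(msg["Date"])
    ET.SubElement(item, f"{{{DC}}}date").text = sent.isoformat()

    # Full body in <description>. Assumption: single-part, plain-text message.
    ET.SubElement(item, f"{{{RSS}}}description").text = msg.get_payload()
    return item


if __name__ == "__main__":
    element = message_to_item(SAMPLE, "http://lists.example.org/archive/0001.html")
    print(ET.tostring(element, encoding="unicode"))
```

Multipart messages, encoded headers, and missing dates would all need handling before something like this could replace an archiver.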

Ideally we would also have:

  • information about the message’s membership in a thread (see below)
  • links to web archives of any attachments that might have been included in the email
  • a link (or mail address) to reply to this particular message

There might also be reasons to include:

  • Message-ID (perhaps for constructing replies)
  • User-Agent
  • Spam Status (or similar Spam flagging header)
  • CC information

Some of this definitely can’t be shoehorned into the 3 standard RSS 1.0 modules (Dublin Core, Syndication, and Content) and demands extensions, but perhaps a proposed module (or modules) will do.

Examining the Prior Art in Threading

Not surprisingly, displaying email as RSS, displaying mailing lists as RDF, and building interchange formats for threaded discussion in RSS have all been discussed before. No one came up with exactly the same feature set (surprise!) because everyone had slightly different conceptions of the problem.

The first important insight when considering an implementation (and examining prior art) is to realize that much of what is important to representing a mailing list is important to representing any form of threaded discussion, like the comments from a message board or blog, or the posts from a newsgroup.

There is the proposed RSS threading module, which adds one tag: <child>. That is nice and simple, but also kind of awkward, as email tends to be linked into threads by referring to a parent (In-Reply-To:).
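
To make the parent/child mismatch concrete, here is a small Python sketch that inverts In-Reply-To: headers into the parent-to-children index a <child>-style feed would need. The function name child_index is invented for the example, and it assumes every message carries a Message-ID: header.

```python
from collections import defaultdict
from email import message_from_string


def child_index(raw_messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies."""
    children = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        parent = msg.get("In-Reply-To")
        if parent:
            # Email only records the parent, so the child list has to be
            # accumulated by scanning every message in the archive.
            children[parent.strip()].append(msg["Message-ID"].strip())
    return dict(children)
```

The index is only complete once all replies have been seen, so a feed built this way has to revise a parent item each time a new reply arrives. A child-to-parent pointer avoids that.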

ThreadML was an interesting initiative of Steve Yost of Quicktopic, partially created in response to this article on Joho. It was a standard composed of RSS 1.0, mod_content, mod_threading to represent parent-to-child, and mod_annotation to represent child-to-parent relations. There was a very active quicktopic discussing ThreadML for a while, but it seems to have gone quiet. See an example Quicktopic RSS feed for how it might have looked.

Discussion of creating an RSS feed for the W3C’s mailing lists prompted the proposal of a mod_email, which might be useful, but doesn’t seem to focus on the commonality between different mediums enough for me.

The PHP mailing lists are available as RSS feeds. Unfortunately, here again, they aren’t very useful RSS feeds, with the From information stuck into a <mailto:> tag in the <description> tag. Odd.

Yahoogroups (formerly eGroups) provides RSS feeds for the lists they host (e.g. RSS-DEV); see the original announcement on FoRK.

Mail-archive.com makes a simple RSS 0.9 feed available for each list, for example mod_perl’s RSS.

Lastly, the Thread Description Language is an interesting attempt to build a rich RDF syntax for talking about all sorts of different threads. Some of its concepts, like agreesWith and disagreesWith, would be very cool to add to an RSS/RDF feed based on Zest and its inline markup.

Conclusion, and Concerns

All of that is very interesting, but I don’t feel like any of the above directly maps to the features I mentioned earlier. It might be possible to assemble something out of the pieces, but a few items (like the url/mailto to respond to a post) are totally missing from any of this.

It might be worth looking at an email<->NNTP gateway to see what tricks they play, and what they consider necessary.

One problem with marking up email in XML is that it makes it very very easy for spammers to identify email addresses. There is a thread on RSS-DEV about ways to combat this. It seems like it should be possible with the collusion of the list archiver to send out “privatized” emails, where contact information is replaced with some sort of smart URI to confuse harvesters without interrupting communication. Yahoogroups kind of does this.
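
One of the simpler obscuring tricks is to publish a hash of the address instead of the address itself, in the spirit of FOAF’s mbox_sha1sum (a SHA-1 of the mailto: URI). A minimal Python sketch, where the function name obscure_sender is made up and the lowercasing step is my own normalization, not something FOAF requires:

```python
import hashlib
from email.utils import parseaddr


def obscure_sender(from_header):
    """Return (display name, SHA-1 hex digest of the sender's mailto: URI)."""
    display_name, address = parseaddr(from_header)
    # Assumption: lowercase the address so the same sender always hashes
    # to the same value regardless of how their client capitalized it.
    uri = f"mailto:{address.lower()}"
    digest = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    return display_name, digest
```

A hash lets readers group posts by sender without exposing the address, but it cannot be replied to, so it does not replace the "smart URI" idea above. Someone who already holds a list of candidate addresses can also hash each one and compare.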

It would cut down on the complexity immensely to just skip the whole threading thing, but I think threading is an important feature for facilitating a culture of discussion, which is often discouraged in spaces that use “linear threading” (like the traditional display of an email client’s inbox). Skipping it might be suitable for a version 1.0, though.

Yet another approach would be to get away from the concept of an individual post, and syndicate conversations (threads). This would map very well to Zest’s concept of threads, but would work pretty well for normal archives and threads as well. The one trick would be figuring out how to distribute threads in such a way as to be useful, with the most recent post readily accessible.
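
One possible reading of "syndicate threads, most recent post first" is sketched below in Python. The Post fields and the function name threads_by_latest are invented for the example, and it assumes each post records at most one parent id.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    message_id: str
    parent_id: Optional[str]  # None for a post that starts a thread
    sent: datetime
    subject: str


def threads_by_latest(posts):
    """Group posts into threads, newest post first, most recently active thread first."""
    by_id = {p.message_id: p for p in posts}

    def root_of(post):
        seen = set()  # guards against malformed reply loops
        while post.parent_id in by_id and post.message_id not in seen:
            seen.add(post.message_id)
            post = by_id[post.parent_id]
        return post.message_id

    threads = {}
    for post in posts:
        threads.setdefault(root_of(post), []).append(post)

    # Within a thread, put the most recent post first so it is readily accessible.
    for members in threads.values():
        members.sort(key=lambda p: p.sent, reverse=True)

    # Order threads by their most recent activity.
    return sorted(threads.values(), key=lambda members: members[0].sent, reverse=True)
```

Each returned thread could then become one feed item whose date is that of its newest post, so an aggregator would resurface a conversation whenever someone replies.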

Web Development with Perl.

September 10th, 2002

In preparation for doing the Protest.net re-write I’m doing some research on web frameworks. I find myself rewriting the same central dispatcher code, adding refinements like a redirect method (similar to the forward() method in servlets), and in general missing some of the refinements of Java’s web environments (servlets, Struts, bundles for i18n) while refusing to give up Perl, CPAN, and Template Toolkit.
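
For readers who haven’t written one, the central dispatcher pattern looks roughly like the following. This is an illustrative sketch in Python rather than Perl, it is not the Protest.net code, and every name in it (Forward, Dispatcher, the sample handlers) is invented for the example.

```python
class Forward:
    """Returned by a handler to pass the request on to another action."""

    def __init__(self, action):
        self.action = action


class Dispatcher:
    def __init__(self, max_hops=5):
        self.handlers = {}
        self.max_hops = max_hops  # stops handlers forwarding to each other forever

    def register(self, action, handler):
        self.handlers[action] = handler

    def dispatch(self, action, request):
        for _ in range(self.max_hops):
            handler = self.handlers.get(action)
            if handler is None:
                return "404 Not Found"
            result = handler(request)
            if not isinstance(result, Forward):
                return result
            # Internal forward: same request, new handler, no round trip to the browser.
            action = result.action
        raise RuntimeError("too many forwards")


# Sample handlers: saving an event forwards to the page that displays it.
def save_event(request):
    request["saved"] = True
    return Forward("show_event")


def show_event(request):
    return f"event {request['id']} (saved={request.get('saved', False)})"


dispatcher = Dispatcher()
dispatcher.register("save_event", save_event)
dispatcher.register("show_event", show_event)
```

The forward happens inside the dispatch loop, so the second handler sees whatever the first one stashed in the request, much as a servlet forward() does.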

This is my brain dump so far. Lots of questions, a few answers. The next step will be firing up the text editor and looking under the hood.

Steal from the Best

Applications I’m going to examine for ideas to steal, particularly framework ideas (error handling, inner loop, etc.)

  • Movable Type – in my few glances at the code it’s seemed intelligent and well structured. Runs under mod_perl and CGI, handles some complex UI demands like pages that use redirects to update progress bars, and re-entrant forms.
  • Bricolage – based on Mason, I’ve been hearing good things about this CMS, particularly about its clever Burners abstraction, and hope to find other good things under the hood. Don’t really know much about Mason, not sure if that is going to be a barrier.
  • RT – popular, mod_perl and Mason, ticket tracking software. Also supports email, and command line interfaces.
  • Scoop – haven’t looked at Scoop in a while, but I remember it seemed well done (having just come from looking at Slashcode 1.0), I wonder if it will stand the test of time. Not expecting to find anything all that relevant, but maybe. Will probably do a quick skim of the CMS parade: Slash, Everything2, and maybe LiveJournal. Definitely want to look at LiveJournal’s embedding instructions.
  • ?? Do you know of any Perl web applications you would consider examples of best practices ??

Perl Frameworks

  • OpenInteract seems popular, and uses TT2, but leaves me cold, partially because I’m just not crazy about persistence layers. But I should probably spend some time looking at it. ?? Any experiences using it ??
  • Mason natively uses an embedded, page-based execution model, which seems PHP-like to me, but Design Issues with Mason: What are Components For? seems to offer simple guidelines for avoiding the siren calls. There is an O’Reilly book coming out on it, which will be available online, but the author says not to expect it until mid-October.
  • Wombat is billed as an implementation of the Servlet API in Perl, but I can find exactly zero references to anyone using it, never a good sign.
  • AxKit is a web framework built along the lines of Cocoon. I bet there are seriously cool ideas to steal in there, but I’ve been burned pretty badly playing with XSLT and don’t want to go back into that water right now.
  • OpenFrame also seems interesting. It has the request and response objects à la servlets, acme is being paid to work on it, it has a “Slot” concept which seems similar to Brico’s Burner concept, and it supports TT2. However, it bills itself as an “open source application framework for distributed media applications,” which isn’t really what we are building.
  • A few others are listed at Application Servers and Toolkits based on mod_perl. Also, Web Frameworks and their Template Engines is interesting.
  • Last time I was looking at this stuff there was JellyBean, which no longer seems maintained, and Iadio, which no longer exists.
  • I found a simple home-rolled framework that feels similar to the one I did for Rockwood. It was mentioned in this interesting thread.
  • The book, Perl for the Web from New Riders, is freely available online and seems interesting if out of date.

Piecemealing with Modules

UPDATE: After looking at the available options, and at Struts 1.1, I would go with Java if I didn’t know I would have an insurrection on my hands.
