IMC Archives: From Bad to Useless

Contemplating the minor disaster area that is the IMC mail archives from the last few weeks (when we had to turn off SpamAssassin because it was killing the server), I had an idea for two related scripts using Mail::Folder::Mbox.

An Idea

Both scripts revolve around the fragility of pipermail, the Mailman archiver. Mailman stores the archive of a list as an mbox file. Pipermail then crawls that mbox and generates archive URLs based on the position of each message in the mbox. If you want to delete a message, say one containing sensitive information, you have to be very careful to leave a dummy message in its place, or you break all the URLs pointing to the archive.
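
To make the failure mode concrete, here is a minimal sketch of what a naive delete does to message positions. It uses Python's standard `mailbox` module rather than Mail::Folder::Mbox, and it is a toy model, not pipermail's actual numbering code: it simply assumes, as described above, that a message's archive URL follows from its position in the mbox. The helper names (`make_message`, `positions`, `demo_position_shift`) and the example address are made up for illustration.

```python
import mailbox
import os
import tempfile


def make_message(subject):
    """Build a tiny test message (address and text are invented)."""
    msg = mailbox.mboxMessage()
    msg["From"] = "someone@example.org"
    msg["Subject"] = subject
    msg.set_payload("body of " + subject + "\n")
    return msg


def positions(path):
    """Toy stand-in for the archiver: map mbox position -> subject."""
    box = mailbox.mbox(path, create=False)
    try:
        return {pos: box[key]["Subject"] for pos, key in enumerate(box.keys())}
    finally:
        box.close()


def demo_position_shift():
    """Delete the middle message outright and report positions before/after."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "list.mbox")
        box = mailbox.mbox(path)
        for subject in ("first", "secret", "third"):
            box.add(make_message(subject))
        box.close()  # close() flushes pending writes to disk
        before = positions(path)

        box = mailbox.mbox(path, create=False)
        box.remove(box.keys()[1])  # naive delete of the sensitive message
        box.close()
        after = positions(path)
    return before, after
```

In this model, "third" sits at position 2 before the delete and at position 1 afterwards, so any link built from the old position would now point at the wrong message or at nothing.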

The Scripts

  • message_delete – Instead of editing the mboxes by hand, pass in a message position or message ID, and this replaces the targeted message with a dummy message. I don’t know why I didn’t write this years ago.
  • spamcrawl – Crawls through an mbox, piping messages to SpamAssassin and replacing any message flagged as spam with a dummy message. This doesn’t entirely solve the signal-to-noise problem, as one doesn’t dare actually delete all those spam messages, but if every piece of spam had the same quiet subject line in the archive, one’s brain could fairly quickly learn to skim over them. It might also be possible to play tricks with dates, moving all the spam messages to the beginning or end of a month, but that makes me nervous.
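
A rough sketch of how both scripts might look, written with Python's standard `mailbox` module as a stand-in for Mail::Folder::Mbox. Everything named here is an assumption for illustration: the placeholder subject and address, the function signatures, and the decision to carry over the original `Date` and `Message-ID` headers so that date ordering and threading stay put. The SpamAssassin check assumes the `spamassassin` command-line tool is installed and relies on its `-e` option, which makes it exit non-zero when the message is judged to be spam; the classifier is passed in as a parameter so a different check can be swapped in.

```python
import mailbox
import subprocess

DUMMY_SUBJECT = "[message removed]"  # hypothetical placeholder wording


def make_dummy(original):
    """Build a placeholder that reuses the original's envelope line and date."""
    dummy = mailbox.mboxMessage()
    dummy.set_from(original.get_from())  # keep the mbox "From " separator line
    dummy["From"] = "archive-admin@example.org"  # made-up address
    dummy["Subject"] = DUMMY_SUBJECT
    for header in ("Date", "Message-ID"):  # assumption: worth preserving
        if original.get(header):
            dummy[header] = original[header]
    dummy.set_payload("This message was removed from the archive.\n")
    return dummy


def message_delete(path, position=None, message_id=None):
    """Replace one message, chosen by mbox position or by Message-ID."""
    box = mailbox.mbox(path, create=False)
    box.lock()
    try:
        keys = box.keys()
        if position is not None:
            key = keys[position]
        else:
            # Raises StopIteration if no message carries that Message-ID.
            key = next(k for k in keys if box[k].get("Message-ID") == message_id)
        box[key] = make_dummy(box[key])  # in-place swap keeps the message count
        box.flush()
    finally:
        box.unlock()
        box.close()


def spamassassin_says_spam(raw_bytes):
    """Pipe one raw message to `spamassassin -e`; non-zero exit is read as spam.

    A non-zero exit could also mean the tool failed, so treat this as a sketch.
    """
    result = subprocess.run(
        ["spamassassin", "-e"], input=raw_bytes, stdout=subprocess.DEVNULL
    )
    return result.returncode != 0


def spamcrawl(path, is_spam=spamassassin_says_spam):
    """Replace every message flagged by is_spam; return the keys replaced."""
    box = mailbox.mbox(path, create=False)
    replaced = []
    box.lock()
    try:
        for key in box.keys():
            msg = box[key]
            if msg["Subject"] == DUMMY_SUBJECT:
                continue  # already a placeholder from an earlier run
            if is_spam(box.get_bytes(key)):
                box[key] = make_dummy(msg)
                replaced.append(key)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return replaced
```

Calling `message_delete("list.mbox", position=41)` or `spamcrawl("list.mbox")` (the file name is hypothetical) rewrites the mbox with the same number of messages in the same order, which is the property the archive URLs depend on. The archive pages would presumably still need regenerating from the edited mbox for the placeholders to show up.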

Largely a note to self, as now I need to go to sleep.