Blog posts tagged "design"

RSS-Data, it Keeps Going and Going

October 7th, 2003

There are two reasons to use RSS 2.0: one is that you really like Dave Winer and think it is important to stroke his ego; the other is that you’re scared of RDF. So when Marc says:

Yes – we know that RDF can do many of the things RSS-Data was designed for. But (believe it or not) it really has nothing to do with RSS 1.0 at all. RSS-Data is about extending RSS 2.0. OK? Not RSS 1.0.
I’m kind of surprised.

We obviously can’t be wanting RSS-Data to do just the same thing as RDF; otherwise, presumably, one would just use RSS 1.0. After all, no one seems to be arguing that RSS-Data does a better job than RDF at the arbitrary structuring of data game. But at some fundamental level (and having watched the RSS scene for a number of years now I should expect it), I’ve just never really accepted the idea that there are really that many people who care about Dave Winer’s ego. It just makes no sense.

Now don’t get me wrong, I’m not the world’s biggest fan of RDF, I don’t think it magically solves all problems. Partially because I’m not sold on the idea of arbitrary data structures, an approach that seems to me to constantly defer the hard work in favor of the quick prototype. (a point Les explores in his “I got my schema from Amazon, where did your’s comes from?” post) But if you’re going to do it, then at least do it right.

Or as Tom put it so well:

Those Who Don’t Learn RDF Will Be Forced to Reimplement It


RSS-Data: Perhaps Not the Best Idea Ever

October 3rd, 2003

Jeremy Allaire has proposed RSS-Data, an attempt to wedge XML-RPC serialization into RSS, an idea which is so phenomenally bad, and wrong-headed, I’m momentarily speechless. (At least of anything I would say in public.)

The joy, however, of being behind the curve is someone else has already said it for you.

What’s all that noise?

Les has put together examples of RSS with namespaces vs. RSS-Data; examples that I think speak for themselves (the RSS-Data example is so spaghetti and loose, it gives me the willies just looking at it [or maybe that is my OPML “we’ll just cram everything in all which way, like, in poorly defined attributes that we make up as we go along” flashback kicking in]).

Been Done, Been Done Better, Already Part of RSS

But if they aren’t speaking to you as clearly (and I’ll admit the voices in my head are particularly loud and clear tonight), you might see if Danny’s Um, we’ve had that for 3 years it’s called RDF, and it works much better, makes it any clearer.

Ah, Running Code

So technologically it sucks, but what about this supposed ease of use?
See parsing RSS-Data vs. parsing RSS namespaces, and compare these 2 samples for readability.

The Macromedia Way?

Allaire speaks about a new crop of aggregators that could be extended to solve domain specific problems, and new formats of RSS-Data. All I can say is, namespaces, RDF, and XML…Hello?

He also mentions how wonderful it would be to get all this stuff into Flash, and Java, and that Macromedia Central might be a great container for this stuff, to which I can only say, “What can you expect from the man who gave us ColdFusion!” (literally, this is the same spirit that I think ColdFusion embodies in all its hackish, kludgy, kind of uncomfortable with intermediate concepts in computer science, proprietary way)

And I challenge someone to explain to me why adding 3-4 layers of nested XML, to get a poorly specified date format is an improvement over just using W3CDTF?
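
For comparison, a W3CDTF timestamp is a single string that PHP can produce and parse directly. This is only a minimal sketch: it assumes a PHP recent enough to have the DATE_W3C format constant (which postdates this post), and the sample timestamp is made up for illustration.

```php
<?php
// Format a Unix timestamp as W3CDTF (the ISO 8601 profile used by dc:date).
// 1065139200 is an illustrative value: 2003-10-03 00:00:00 UTC.
$stamp = gmdate(DATE_W3C, 1065139200);

// Going the other way needs no nested structure to walk, just one call.
$parsed = strtotime($stamp);

echo $stamp, "\n";
```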


A Few Tips for Writing Useful Libraries in PHP

August 5th, 2003

(A month or more ago I started writing a blog entry, which became so long I decided to turn it into an article. However life has gotten away from me, and I don’t know when I’ll get around to doing the clean up to write that article, so here it is.)

Zend has an article Writing Libraries in PHP (you’ll have to trust me on the title, as their CMS seems to be broken). It is a good article as far as it goes, but I think the title oversells the article. Which is a shame, because of all of PHP’s many faults and quirks, perhaps its most telling (and most crippling) is a cultural one: PHP programmers write applications, not libraries. I don’t consider myself a PHP guru, being more comfortable in Perl and Java, but I do consider myself a good software developer, and so I’ll try to capture a few of the lessons I learned when developing MagpieRSS, my PHP library for parsing RSS.

First, let’s identify the problem.

PHP, it’s a cultural thang

Python positions itself as distinct from Perl with the slogan “batteries included.” Whether you feel that is accurate is irrelevant, because PHP is the real “batteries included” language. PHP bends so far in this direction, you’ll find functions like pfpro_process for talking to the “Verisign Payflow Pro” service in the core language. Besides the patent absurdity of this, it’s also created a culture that sees only 2 forms of PHP: core language extensions, and applications.

Why is this a problem?

So what’s wrong with that? Code written for a particular application is often very difficult to reuse. Code reuse is one of the holy grails of open source; it’s how you leverage all the vaunted benefits of “lots of eyes make bugs shallow”, and patches, and shared development. Code reuse also reduces development time, and bugs in applications that reuse the code. Unless, of course, to reuse your code I’ve had to dig into it, hacking it to suit my purposes, in which case I’ve probably spent more time, and introduced bugs into code I only half understand. I’ll be tempted to throw it away and start over, splitting time and development energy over multiple solutions rather than improving just one.

So what is the difference between an application and a library?

A little over a year ago, I was wanting to syndicate the events from Protest.net to a website published with PHP. As we generate RSS feeds for our calendars I thought this would be easy. I went looking for a tool to recommend for the website to use, and I came up short. I found many, many, many PHP applications that took an RSS file, and generated HTML; these were applications for displaying RSS as HTML, but they weren’t libraries. They tried to do it all, and therefore couldn’t integrate with this website, which had its own way of wanting to use RSS and PHP. A key characteristic of an application versus a library is how many problems it tries to solve; solve too many problems in a single layer and you lose flexibility.

Tips for writing PHP libraries

A few of these tips are theoretical, some are very concrete. Some I’m not thrilled with but they are the best solution I have to date. If you disagree, or have suggestions, please add them.

  1. Do one thing, do it well

    You aren’t building a CMS, you aren’t building an interface, you’re trying to support other people in those tasks.

  2. Don’t echo or print content, return it!

    This is one of the key problems you see in much PHP code. If you echo out the results of some function directly to the web page, when I’m trying to use your code to write the output to a file, or run it in a testing environment I’m going to be frustrated. What if I’m trying to build an internationalized app, and you’re echo’ing out English? Don’t assume you know in what context your code will be run, return objects, or strings. Don’t print.
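
    A minimal sketch of the difference, using a hypothetical headline_summary() function (not part of any real library). Because it returns its result, the caller is free to print it, log it, or check it in a test:

    ```php
    <?php
    // Hypothetical library function: builds a summary and hands it back
    // instead of echoing it.
    function headline_summary(array $titles) {
        return count($titles) . ' headlines: ' . implode(', ', $titles);
    }

    // The caller decides where the result goes: the page, a file, or a test.
    $summary = headline_summary(array('RSS-Data', 'mod_weather'));
    echo $summary, "\n";
    ```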

  3. Return data, strings, or objects, not HTML.

    The corollary to the don’t print rule, and abused nearly as often, if not more so, is don’t return formatted HTML. (unless you are writing an HTML widget, in which case that is all you should be doing) If you return the results of your function as a formatted table, it might be pretty and easy to use, but I can’t pass those results to another function to sort them, or integrate it with my pure CSS layout. It is really common to see code like echo "<font color='red'>$error_msg</font>";. Don’t do it.
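
    To illustrate, here is a rough sketch with a made-up upcoming_events() function and made-up event data. Since the library hands back plain arrays, the application can sort them first and choose its own markup afterwards:

    ```php
    <?php
    // Hypothetical library function: returns plain arrays, no markup.
    function upcoming_events() {
        return array(
            array('title' => 'Teach-in', 'date' => '2003-08-12'),
            array('title' => 'March',    'date' => '2003-08-09'),
        );
    }

    // Because the result is data, the application can sort it first...
    $events = upcoming_events();
    usort($events, function ($a, $b) {
        return strcmp($a['date'], $b['date']);
    });

    // ...and only then render it in whatever markup suits its layout.
    $html = '';
    foreach ($events as $event) {
        $html .= '<li>' . htmlspecialchars($event['title']) . '</li>';
    }
    ```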

  4. Allow intelligent error handling

    One of the most common reasons a library will print content directly to the web page is on encountering an error. This makes it very difficult for the application using your library to figure out what happened and respond accordingly. Don’t assume your code is the most important part of someone’s application, maybe they don’t care if you failed? Or maybe they care a whole bunch, and want to totally change what they were doing?

  5. Use an error() function

    I’m still struggling to come up with the best way to do error handling as a library in PHP. Once PHP5 arrives and we have real exceptions, much of this will be irrelevant. In the meantime.

    What I do with Magpie is provide an error() method for each part of the application.

    error() takes a message, and an optional error level, appends php_errormsg if track_errors is on, prepends a string identifying the library that is throwing the error, sets up a package/class variable with the resulting error, and, if debugging is on, triggers an error message.

    Why is this good? Code using your library has an easy way to check for error conditions (if ($lib->error) { ... do error handling .. } ), error messages are very complete and consistently formatted, when someone is developing with your app they can easily find out what went wrong based on their php.ini settings, and lastly, if someone does need to hack or override your chosen behaviour, they only have to do it once.
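
    A rough sketch of this pattern follows. The FeedReader class and its method names are hypothetical, not Magpie’s actual code, and it leaves out the php_errormsg step, which modern PHP no longer offers:

    ```php
    <?php
    // Hypothetical library class; names are illustrative only.
    class FeedReader {
        public $error = '';     // last error message, empty when all is well
        public $debug = false;  // when true, errors are also raised via trigger_error()

        function error($msg, $level = E_USER_WARNING) {
            // Prefix every message so the user can tell which library complained.
            $this->error = 'FeedReader: ' . $msg;
            if ($this->debug) {
                trigger_error($this->error, $level);
            }
        }

        function parse($source) {
            $this->error = '';
            if (trim($source) === '') {
                $this->error('nothing to parse');
                return false;
            }
            return array('raw' => $source);
        }
    }

    $reader = new FeedReader();
    $result = $reader->parse('');
    if ($reader->error) {
        // The application chooses the response: log it, retry, or carry on.
        $result = array();
    }
    ```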

  6. Allow simple configuration

    If you don’t provide a way for people to change the behaviour of your library, then you force them to hack on it. If someone has had to go into your code and hack on it, then they’ll be resistant to upgrading as new versions become available, and any changes they make that you want to roll back into the core will be more difficult to apply.

    Configuration can be parameters passed to a class’s constructor, set/get methods called later, or constants defined at runtime. (more on this later)
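
    As a sketch of the first two options, here is a hypothetical FeedCache class whose option names and default values are invented for illustration. Options passed to the constructor are merged over defaults, and set/get methods allow later changes:

    ```php
    <?php
    // Hypothetical class showing constructor options merged over defaults.
    class FeedCache {
        private $options;

        function __construct(array $options = array()) {
            $defaults = array('dir' => './cache', 'max_age' => 3600);
            // Caller-supplied values win; anything omitted keeps its default.
            $this->options = array_merge($defaults, $options);
        }

        function get($name) {
            return $this->options[$name];
        }

        function set($name, $value) {
            $this->options[$name] = $value;
        }
    }

    $cache = new FeedCache(array('max_age' => 600));
    $cache->set('dir', '/tmp/feeds');
    ```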

  7. Choose intelligent defaults

    This is true with any application, or library in any environment. It is particularly true with PHP, where a great number of your users aren’t programmers by profession or choice, but just trying to get something working to support their real work.

  8. Break your library into multiple files

    One way to simplify your code, and encourage encapsulation and reuse, is to split your library into logical sub-pieces, and move these pieces into their own files.

    Something I didn’t do with Magpie, but wish I had was store all the files in a lib/ directory. Having all your files in a single directory makes it much easier for people to install your code. (When it came time to bundle an external library, a modified version of Snoopy, I had learned, and put it in extlib/ )

    This tip is really an excuse for the next tip.

  9. Don’t assume everyone’s PHP looks like yours.

    PHP has a lot of configuration options, runs on dozens of different platforms, and is used in all sorts of different ways. Keep that in mind when writing a library.

  10. If you have multiple files, allow your user to define a base dir. (aka don’t make assumptions #1)

    This is a trick from Smarty that was pointed out to me.

    If you have a core library file (e.g. class.inc) that will be including support files don’t make assumptions about the PHP include path on the various machines where your library will be installed.

    For example, assume there is a constant MAGPIE_DIR defined; then you can include the support libraries with:

    require_once( MAGPIE_DIR . 'rss_parse.inc' );
    require_once( MAGPIE_DIR . 'rss_cache.inc' );
    

    This allows code that uses your library to inform your code about the local environment, rather than forcing your client code into contortions to match your expectation of how a PHP install should work.

    MAGPIE_DIR (or YOUR_LIBRARY_DIR) might be set up with code like:

    if (!defined('DIR_SEP')) {
        define('DIR_SEP', DIRECTORY_SEPARATOR);
    }

    if (!defined('MAGPIE_DIR')) {
        define('MAGPIE_DIR', dirname(__FILE__) . DIR_SEP);
    }

    Which will set MAGPIE_DIR to the current directory (useful as ‘.’ isn’t always in the include path) unless you override it with a statement like (for example):

    define('MAGPIE_DIR', '../../magpiefiles/');  // trailing slash, as file names are appended directly
    
    (More on using constants for configuration later.)

    If you’re using a semi-obscure PHP extension, test that it has been compiled in. (aka don’t make assumptions #2)

    This bit me hard (http://laughingmeme.org/archives/000811.html) when developing MagpieRSS. I added support for HTTP gzip encoding, and suddenly, for a small number of users, Magpie started failing. This was a surprisingly difficult bug to track down; I recommend avoiding it altogether. This is what PHP’s function_exists() function is for.

    In code that uses gzinflate I might add a conditional like

    if ( function_exists('gzinflate') ) { .... }

    Or at the beginning of the Magpie RSS parser I check to make sure PHP has been built with XML support with

    if (!function_exists('xml_parser_create')) {
        ... trigger error ...
    }
    
  11. Don’t pollute the global namespace

    All functions share a namespace in PHP. What that means is, if I have a parse_file() function in my library, and you have a parse_file() function in your library PHP has no way of telling them apart, and we’ve got a serious problem. Classes help with this.

    Another option is to prefix the functions in your library with a common string. Steve does this with Feed on Feeds, prefixing all his functions with fof_, e.g. parse_file() becomes fof_parse_file(). Cuts down on conflicts, and increases readability.
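
    A small sketch of the class approach, with two made-up libraries that both want a function named parse_file(). The class names and return values are purely illustrative:

    ```php
    <?php
    // Two hypothetical libraries that both want a parse_file() function.
    // Declared as plain functions, the second would be a fatal
    // "cannot redeclare" error, so each keeps its version inside its own class.
    class FeedLib {
        static function parse_file($path) {
            return 'feed:' . basename($path);
        }
    }

    class CalendarLib {
        static function parse_file($path) {
            return 'calendar:' . basename($path);
        }
    }

    $feed     = FeedLib::parse_file('/data/index.rss');
    $calendar = CalendarLib::parse_file('/data/events.ics');
    ```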

  12. If you use database tables allow a table prefix. (aka don’t pollute the other global namespace)

    Most libraries aren’t going to work directly with a database, that is the province of applications usually, but if you are, consider allowing the user to configure a table prefix, much like the function prefix from the previous tip. Many users are on low-end hosting platforms with only a single MySQL database; this creates another global namespace. If my library has a user table, and your library has a user table, and we have different schemas (almost a guarantee) then we’ve got a problem.
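
    One possible shape for this, sketched with a hypothetical prefixed_table() helper and an invented magpie_ prefix. Every table name passes through one function, so a single setting keeps a library’s tables apart from everyone else’s:

    ```php
    <?php
    // Hypothetical helper: all table names are built in one place.
    function prefixed_table($name, $prefix = '') {
        // Only allow characters that are safe in an SQL identifier.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $prefix . $name)) {
            return false;
        }
        return $prefix . $name;
    }

    $table = prefixed_table('user', 'magpie_');
    $sql   = 'SELECT id, name FROM ' . $table . ' WHERE id = ?';
    ```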

  13. Provide a well designed, object oriented interface to your library, that follows the above rules.

    This is outside the scope of these tips, but see the following corollary.

  14. Provide a functional, PHP-like interface that builds on your OO interface.

    PHP is not a language for building airy, abstract object hierarchies. It is a quick and dirty language for throwing together webpages with a minimum of fuss. While it’s important to provide the more elegant interface for your advanced users, and to encourage proper design, the majority of your users are going to want something simpler.

    With Magpie I provide an object oriented RSS parsing class. I also provide a simple, one function front-end, rss_fetch(), that is designed to be used directly from within a PHP page.

    My design considerations for rss_fetch were that fetching remote files is time consuming, and parsing an XML file can be resource intensive. In most languages/environments you would set up a cron script to run in the background handling these tasks, and spitting out HTML fragments. This is not very PHP-like, and often beyond the technical ability of many PHP users (or beyond what is offered in their hosting environment). So rss_fetch transparently uses PHP’s serialize and unserialize to cache the results of the time/resource intensive calls, and serve subsequent calls quickly.

    This makes it very easy for the end users to do the right thing, while writing simple, idiomatic PHP. I think this is very important for writing useful PHP libraries, and is one of the challenges that comes with the territory.
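
    The caching idea can be sketched roughly as follows. This is not Magpie’s code: cached_fetch(), the cache file naming, and the $fetch callback (standing in for the slow fetch-and-parse step so the sketch runs without a network) are all assumptions made for illustration.

    ```php
    <?php
    // Hypothetical one-function front-end with a transparent file cache.
    function cached_fetch($url, $fetch, $max_age = 3600) {
        $file = sys_get_temp_dir() . '/feedcache_' . md5($url);

        // Serve from the cache when a fresh enough copy exists.
        if (is_file($file) && (time() - filemtime($file)) < $max_age) {
            return unserialize(file_get_contents($file));
        }

        // Otherwise do the slow work once and remember the result.
        $data = call_user_func($fetch, $url);
        file_put_contents($file, serialize($data));
        return $data;
    }

    $calls = 0;
    $slow = function ($url) use (&$calls) {
        $calls++;
        return array('url' => $url, 'items' => array('first', 'second'));
    };

    $url    = 'http://example.org/index.rss?' . uniqid();  // unique, so each run starts cold
    $first  = cached_fetch($url, $slow);
    $second = cached_fetch($url, $slow);  // answered from the cache file
    ```

    The page author only ever calls the one function; whether the answer came from the network or the cache file is invisible to them.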

  15. Configuring the functional interface using defines.

    (aside: by functional we mean as opposed to object-oriented, not as opposed to non-functional or dysfunctional)

    Choosing intelligent defaults means our functional interface should “just work” for many people. But when it doesn’t, it should be equally easy to configure without cluttering up the API.

    What I’ve done with Magpie, and it’s worked well, is extend the technique used in “setting a base dir”.

    Conditionalize your library behaviour on a set of constants (e.g. MAGPIE_USE_GZIP, and MAGPIE_CACHE_ON). Document these constants. People can now change the behaviour of your library with some simple statements at the top of their PHP, like:

    define('MAGPIE_CACHE_ON', false);  // turn off caching

    Then the first thing your function should do is call an init() function which sets up those intelligent defaults we talked about:

    function init () {
        if ( defined('MAGPIE_INITALIZED') ) {
            return;
        }

        if ( !defined('MAGPIE_CACHE_ON') ) {
            define('MAGPIE_CACHE_ON', true);
        }
        ... other constants ....
        define('MAGPIE_INITALIZED', true);
    }

  16. Provide examples

    Always a good idea, but particularly important when distributing PHP libraries. Don’t assume people will read the documentation, or if they do, that your programmer-esque understanding of your tool will be meaningful to them. What will make sense is examples.

  17. Provide examples, carefully

    Be careful what your examples look like because whatever you do in your example is what 90% of your users will do in their scripts.

    Test your examples or your support queue will fill up with people who cut and pasted your code and it isn’t working for them.

    Make your examples as attractive as you can while still being simple, otherwise you’ll be forced to look at your ugly HTML all over the web.

    Show examples that show proper use of your library, including best practices like error handling. If your examples show how to use a feature it will be used; if they don’t, then most people won’t use it.

  18. Document

    Document document document. Provide inline documentation. Provide a README. Provide a FAQ. Provide a website. Provide hints of where to go looking for more info. Hints like “this class is used by rss_fetch() you can stop reading now if you just want to use the simple interface” can also be useful. (of course don’t sound too smug about it, as chances are people are there trying to find one of your bugs)

    Running code is good documentation.

    If you’re going to provide code snippets make them longer than seems necessary, as people will often have different instincts about what should come before and after the line in question. This is again the joy and trouble with providing a PHP library.

    I personally like “cookbooks”, a hybrid of a FAQ and running code samples.

  19. Use it Yourself, Use Consistent Names, Plan to Expand

    And just to re-iterate a few of Moran’s suggestions

    Until you really use your library you’ll never know how well you’ve succeeded. When you write a library you shouldn’t conceive of yourself as the only user, but certainly one of the users. Also developing good relationships with people who use your library will cause it to improve dramatically.

    If you name half your methods getRSSFile() and the other half get_cache_dir() your users (and yourself in a few months) will be confused, and your code will look messy.

    Moran says, “Never return a single value when you can return an array. Never return an array when you can return an object.”

    I think that is overstating the case; often you’ll want to return a single value for simplicity’s sake, but take into account when you might not want to. Often returning an object will make the most sense, as long as you clearly document how to use the object.

PHP needs libraries

PHP can be a frustrating language to work in, but it’s also very rewarding. One of the things I find most rewarding about it is the chance to make an impact, and the easiest way to do that is to release well written, well architected libraries. The community is starting to pull together and provide some of this in the form of PEAR and PECL, but those projects have a long way to go before they are well documented, easy to use and easy to install. In the meantime you can fill a critical need.

Resisting Temptation, Educating, and Refactoring

Because this lack of libraries is a deep-set cultural problem you’ll often find users writing you asking you to add feature X, or feature Y to your library to make it work more like they want their application to work. Remember you’re doing one thing, and doing it well. This is a chance for education. Explain that you’re building a library and the features they’ve asked for are more suited to an application. If you’re feeling like you have the time, or are particularly generous, or if a request comes up over and over, consider adding a code sample to your examples to show how to fulfill that particular feature request. Among other things you might find it’s harder than it should be, and that might prompt a change in your library.

Thanks to Steve, Martin, Scott, and Evan for their feedback and suggestions.

RSS mod_weather

July 2nd, 2003

While waiting for the interminable parade of yuppies to collect their stream of wet/dry/half-caf/whatever espresso drinks from the coffee shop (come on people! life is too short to order anything other than “coffee, black”), I had an idea, and scribbled a few notes down on a napkin. (I think that’s the first time I’ve ever done that napkin thingy.)

It’s not really a very exciting idea, but what I was thinking about was an RSS namespace for describing weather syndication. Not much use with the current crop of aggregators, but the next gen like Newsmonster, and Shrook give you much greater access to items outside the core. (and then of course there is the radical idea of using RSS for inter-website syndication!)

As I started thinking about it, I decided that weather doesn’t really fit neatly into one namespace, but wants 3 namespaces: weather, forecast, and storm.

I know that seems kind of excessive, and it’s possible they could be rolled into a single namespace, but I think a nice hybrid spec, that clearly laid out the relationship between the namespaces, and defined some common practice for temperature and such (more on that later) wouldn’t really be too complicated.

And looking at the data weather reports really come in 3 separate types: current conditions, forecasts, and hazardous weather/storm warnings.

weather: Current Conditions

weather: might include:
  • sky – a prose description of current conditions
  • temp – the current temperature
  • humidity – the percent humidity
  • windspeed – wind speed
  • dewpoint – another temperature
  • heatindex – relative heat, another temp
  • windchill – relative cold, another temp.
  • visibility
(Did I forget any?)

It’s Raining Furlongs

One of the first things you notice with weather is people (ahem, the U.S.) like to use their annoying, region specific measurements. Are temperatures in Fahrenheit or Celsius? Is windspeed miles per hour? kilometers per hour? knots?

Sometimes visibility is noted as “10 miles”, other times as “very good”.

To paraphrase Rich Bowen, “The person who came up with [this system] needs to be taken out and beaten with a yardstick”.

There are a number of potentially complex solutions we could come up with, involving sub-elements, or attributes, or what not, but I thought the easiest would be to require measurements of temperature and distance to be marked unambiguously. So valid temps are 32F or 5C, and a valid windspeed is 13MPH.

One nice thing is none of these scales are all that hard to convert between, but if for example you’re going to calculate windchill, you’ll need to make sure you know if you’re working in Fahrenheit and miles, or Celsius and kilometers.
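
As a rough sketch of how a consumer might handle such marked values, the hypothetical parse_measure() and to_celsius() functions below split a value like 32F or 13MPH into number and unit, and normalize temperatures to one scale. The function names and the exact pattern accepted are assumptions, not part of any spec.

```php
<?php
// Split a marked value such as "32F" or "13MPH" into number and unit.
function parse_measure($text) {
    if (!preg_match('/^(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)$/', trim($text), $m)) {
        return false;
    }
    return array('value' => (float) $m[1], 'unit' => strtoupper($m[2]));
}

// Normalize a temperature to Celsius, whichever scale it arrived in.
function to_celsius($text) {
    $m = parse_measure($text);
    if ($m === false) {
        return false;
    }
    if ($m['unit'] === 'C') {
        return $m['value'];
    }
    if ($m['unit'] === 'F') {
        return ($m['value'] - 32) * 5 / 9;
    }
    return false;  // not a temperature unit
}

$freezing = to_celsius('32F');
$mild     = to_celsius('5C');
$wind     = parse_measure('13MPH');
```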

forecast: Is it going to rain on Tuesday?

Forecast will generally be simpler as there is rarely much info available; still, any element from current should be valid in forecast. The idea behind having forecast in a separate namespace is two-fold: forecast has a slightly different set of data, and it provides a simple way to determine whether one is talking about now or the future.

Forecast adds to the elements defined in weather

  • period – prose description, “Today”, “Thursday”, “Tuesday Night”
  • date – the day the forecast is for (should one be able to represent a range of dates here? often you’ll see something like “Monday Night – Friday, Partly Cloudy”)
  • hi – forecasted high/max temperature for a period
  • lo – forecasted low/min temp for a period

Storm

I haven’t really thought about the Storm namespace yet. The data here is most radically different than the other two, and I haven’t spent enough time looking at it to determine if there is an underlying set of structured data we can extract or not. But I think something good can be done.

Good idea?

What do you think, seem like a good idea? Sound interesting? Did I miss something obvious?

update [2003/7/2]: Phil pointed out a prototype XML weather service for Medford Country, OR. Very high quality of data. Example output. Too bad this isn’t more widely available.

update [2003/7/3]: Quicktopic discussion on mod_weather


RSS 1.0 is Hard?

June 29th, 2003

You know, the one persistent meme in all these RSS debates is that RSS 1.0 is somehow a “RDF technology” and therefore complicated and hard. I never understood that. I’ll be the first to admit I don’t entirely grok the RDF/Semantic Web big picture, and even less the processing model for working with RDF, but so what. RSS 1.0 is just XML; how it’s hard to parse, or hard to produce, I can’t imagine. At one point in the debate RSS 1.0 was supposedly difficult because of the namespaces, and I grant you that supporting namespaces well is harder than not supporting them. But if someone could explain to me what makes RSS 1.0 harder than RSS 2.0 I would dearly love to know.


Weather, RSS, and Thunderstorms

June 22nd, 2003

Tim Bray touched lightly on an idea for a business model I had a while back; that of leveraging RSS’s popularity as a format beyond web syndication. Information like bank services, sales tracking, traffic alerts, and weather.

It’s an idea that occurred to me when I first started playing around with delivering events via RSS, and realized that RSS wasn’t an XML file for headlines, but a webservice pipeline I could shove all types of data through.
Sourceforge figured out the same thing, and Technorati has a nice little service (don’t know how “successful” in the business sense it’s been) selling a personalized RSS feed that can watch Google queries, or keep track of who is linking to you.

Tim mentions a few of the same types of feeds I was thinking of, misses a couple, and mentions one I never would have thought of, traffic reports. (being a non-driver and all) Unfortunately/fortunately most of my entrepreneurial flair was burned out of me during the brief years I was running my own little dotcom, and so like a handful of other business models, it sits collecting dust.

Weather

Or at least sort of. I did have a brief go at producing an RSS feed of the weather, and last night, as lightning struck all around, and great thunder claps pealed and rumbled on and on like bombs detonating, their sound waves rattling my windows, and setting off all the car alarms of the apartment building next door, I revived the project.



RSSifying the Mailing List, an update

February 24th, 2003

Dan Brickley just mentioned a patch to Mailman for producing RSS feeds for a list. While not the ideal feed described in my extended rant on the subject, it’s an incremental improvement, and much welcome.


Computing OCS and mod_syndication update times

February 11th, 2003

(originally sent as a personal email, but I’m going to post it here as well)

update: got a response. basically i was attempting to make things too complicated :(

I was playing with the idea of writing an article about RSS modules, which got me thinking about the syndication module, and its relative obscurity.

It’s not entirely surprising that it’s obscure; weblogs are the RSS playground right now, and in the realm of personal, whimsical publishing Conditional GETs are probably a much better solution for discovering new content.

However, I also think mod_syndication labours under relative obscurity because it confuses people. And I thought I might do something about that, but now I’ve got a couple of questions about the spec as well.

update: now with answers!

Optionality?

In the OCS format all the fields are optional. In mod_syndication it is ambiguous whether updateBase is optional. As updateBase seems to be the most abused field, and adds considerable complexity to calculating update times, it would seem like a good idea to explicitly state that it is optional, as well as give examples of mod_syndication being used without it. (or perhaps it isn’t optional?)

answer: updateBase is optional, but highly recommended for accuracy

updateBase?

What is the recommended best practice with updateBase? Do I stick it at some arbitrary point in the past to calculate against? Or should it be set to the most recent publish?

answer: this should stay fixed unless you have an erratic publishing schedule

updateFrequency?

answer (translated): there are no discrete units, everything can be decomposed to seconds

If my updatePeriod is weekly, and my updateFrequency is 2, then does that mean I’m publishing every 3.5 days? (84 hours?)
yes

If for example, I had an updateBase of 2003-02-01T08:00Z, an updatePeriod of Weekly, and an updateFrequency of 2, would this mean, that I’m claiming to publish every Saturday at 8am, and every Tuesday at 8pm?
yes

It probably makes sense to calculate the day change, and then ignore the time change. But it should be defined.
no, days aren’t discrete

For that matter, is twice a month defined at 15 days? or DAY_IN_MONTH/2? On Feb 28th, should I be checking back in 14 days? on 15.5?
DAY_IN_MONTH/2, months are merely composed of days (which are composed of seconds)

It would seem that best practice would dictate using an updateBase for long update frequencies like monthly, or yearly.
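
Given the answers above, the arithmetic can be sketched in a few lines. The function names are made up, and the sketch only covers hourly, daily and weekly periods, since months and years vary in length and would need the real calendar.

```php
<?php
// Seconds between updates for a given updatePeriod and updateFrequency.
function update_interval($period, $frequency = 1) {
    $seconds = array(
        'hourly' => 3600,
        'daily'  => 86400,
        'weekly' => 604800,
    );
    return $seconds[strtolower($period)] / $frequency;
}

// First scheduled update after $now, counting forward from updateBase.
function next_update($base, $period, $frequency, $now) {
    $interval = update_interval($period, $frequency);
    $start = strtotime($base);
    if ($now < $start) {
        return $start;
    }
    $elapsed = floor(($now - $start) / $interval) + 1;
    return (int) ($start + $elapsed * $interval);
}

// Weekly with a frequency of 2: every 3.5 days, i.e. every 84 hours.
$interval = update_interval('weekly', 2);

// The example updateBase from above, with seconds spelled out for strtotime().
$next = next_update('2003-02-01T08:00:00Z', 'weekly', 2, strtotime('2003-02-02T00:00:00Z'));
$when = gmdate('Y-m-d H:i', $next);
```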

What’s a week?

Ignoring all the previous weirdness, say someone has the very simple and reasonable updatePeriod:weekly, with no updateBase, and the assumed updateFrequency of 1. Now is that a week? Or 7 days?

answer: A week is 7 days which is 604800 seconds

For example, it’s Saturday. Do I come back next Saturday? or do I come back on Monday? or on Sunday?
Saturday

In iCalendar if I ask for a weekly repeat, without specifying a BYDAY, it’s going to repeat on Monday, and therefore I should come back 2 days from now.
(note to self, you are a calendar geek)

Either way works, it just needs to be defined.

Code

Do you know of any existing implementations that calculate OCS, or mod_syndication offsets? Code is an excellent way to codify assumptions both explicit and implicit.

update: code is supereasy given the above answers


Service enriched RSS feeds

January 24th, 2003

Dave’s evangelism reminded me of an idea that’s been tickling at the back of my brain for a while:

RSS feeds should come packed with a soup of meta-data, and some of that meta-data should be services.

Comment Service

A comment service is the most obvious service one could provide, and most applicable to the majority of deployed RSS feeds. As a bonus we’ve already got a widely deployed API for publishinhg a comment serve: a TrackBack URL.

TrackBack provides a simple, RESTful interface for commenting:

  • GET a TrackBack and you have a list of comments,
  • POST to a TrackBack and you’re part of the flow.
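
The POST half of that interface can be sketched in a few lines of Python. This is only an illustration: the form fields (url, title, excerpt, blog_name) and the <error> element in the reply follow the TrackBack spec as I understand it, the replying post's URL is hypothetical, and the request is built but never sent. The GET half varies between implementations, so it is left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_ping(trackback_url, url, title="", excerpt="", blog_name=""):
    """Build (but do not send) the POST request for a TrackBack ping."""
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    # Drop empty optional fields; only `url` is required.
    body = urllib.parse.urlencode({k: v for k, v in fields.items() if v})
    return urllib.request.Request(
        trackback_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def ping_succeeded(response_xml):
    """A TrackBack reply carries <error>0</error> on success."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="1").strip() == "0"


request = build_ping(
    "http://laughingmeme.org/mt-tb.cgi/292",
    url="http://example.org/my-reply",  # hypothetical replying post
    title="A reply",
)
# urllib.request.urlopen(request) would actually send the ping.
```
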
This is not the only service, just as weblogs are not RSS’s only domain, but let’s look at how we might include this comment service in an RSS feed.

Deployment Vector?

So how do we describe such a service? I suppose the <comments> tag works for RSS 2.0, but I would hope for something a little more structured, clever, and forward looking for RSS 1.0.

mod_trackback

A proposed mod_trackback exists (in a limbo state of not having hit the proposed modules list) and provides both ping, which is the comment service discussed above, and about, which functions more like annotation. This works (in fact Ben used to support it) and is very simple, but it isn’t very future proof.
  • What do we do when we want to advertise more than one service?
  • Or a service that works differently than TrackBack?

mod_link

Kevin Burton’s mod_link seems like it has a lot of promise with its ability to provide richly described, arbitrary linking. It comes with 6 standard types of links, and is extensible through providing new relationship URIs (like namespaces). More good info in his design decisions. Maybe a comment service could be published with relationship “http://purl.org/rss/1.0/modules/link/#comment”, e.g.

<l:link l:rel="http://purl.org/rss/1.0/modules/link/#comment" l:type="?????" l:title="urn:trackback-comment" l:lang="en" rdf:resource="http://laughingmeme.org/mt-tb.cgi/292"/>

As you can see, I’m not clear on what should go in the type attribute.

Extensible

I think this is a cool option, and very extensible. It also isn’t limited to REST based services (though I think REST is a more natural fit for the resources already available to an RSS aggregator), as mod_link ships with a standard “service” relationship for pointing at WSDL descriptors (and presumably RSD descriptors as well).

Problem with mod_link

Unfortunately it would require a rather major (though doable) upgrade of the existing RSS parsers, at least the ones I work on. XML::RSS will now (0.984) look in rdf:resource attributes if it is explicitly told to, but discards the rest of the attributes. MagpieRSS ignores attributes entirely. Mark Nottingham’s RSS.py has limited attribute support (and like XML::RSS should be hard to add more support). I haven’t played with Mark Pilgrim’s rssparser, but I’m betting against attribute support. What are the other standard RSS parsers?
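
To make concrete what "attribute support" would mean here, a minimal sketch using Python’s standard ElementTree, which keeps namespaced attributes around. The mod_link namespace URI is my assumption, inferred from the relationship URI above; the type attribute is omitted since I don’t know what belongs there; and the services function is made up for illustration, not part of any of the parsers mentioned.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the mod_link one is assumed from the relationship URI above.
LINK_NS = "http://purl.org/rss/1.0/modules/link/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
COMMENT_REL = LINK_NS + "#comment"

ITEM = """
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:l="http://purl.org/rss/1.0/modules/link/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <title>Service enriched RSS feeds</title>
  <l:link l:rel="http://purl.org/rss/1.0/modules/link/#comment"
          l:title="urn:trackback-comment" l:lang="en"
          rdf:resource="http://laughingmeme.org/mt-tb.cgi/292"/>
</item>
"""


def services(item_xml, rel):
    """Return the rdf:resource of every l:link whose l:rel matches `rel`."""
    item = ET.fromstring(item_xml)
    found = []
    for link in item.iter("{%s}link" % LINK_NS):
        # Namespaced attributes are keyed as "{namespace-uri}localname".
        if link.get("{%s}rel" % LINK_NS) == rel:
            found.append(link.get("{%s}resource" % RDF_NS))
    return found


print(services(ITEM, COMMENT_REL))
```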

A services module

A last option is to write a new module proposal specifically for carrying service information. It could be simpler, and not make extensive use of attributes. It would probably be easier to parse with current toolkits, and more likely to be adopted into RSS 2.0. However, I’m having trouble getting motivated to design something new in the face of the elegant work Kevin has already done. Maybe inspiration will hit soon.

Tagged: Uncategorized

RSSifying the Mailing List

November 25th, 2002

Lattice asked me about resources for representing a mailing list as an RSS feed. In particular, he wanted to put together an alternative (post-email) interface to a Sympa mailing list, with an RSS aggregator for reading the list traffic, and the Sympa web interface for posting. I didn’t have any suggestions for Sympa, but noted that the script mmrss turns Mailman lists into RSS feeds.

A Thought

This inspired me to think about tackling the chronic lack of mailing list archiving solutions as a multi-step problem. If we could get the mailing lists into a suitable in-between format, with all the assumptions exposed and codified into XML, then perhaps the archivers could simply focus on giving a decent presentation of the complex interactions of a mailing list. And having that format be compatible with the latest crop of desktop clients would allow people to build new and exciting ways of interacting with the lists.

So I compiled an overview of the problems, the challenges, and the work that has gone before on building an RSS threading standard.

A Quick Review of the State of the Archiver

MHonarc is something of a standard, but I can’t stand the archives it generates. It should be possible to generate something attractive and usable, but I’m still waiting. (Current offerings run from the pedestrian Mail Archive to the byzantine Sympa style.)

Mailman’s archives (pipermail) are pleasingly straightforward and clean. However, their threading algorithm is a little weak, the archives are fragile, with slight changes to the mbox changing URLs, and rebuilding the archives for very active, old lists can be incredibly slow. On top of this, pipermail has no clue what to do with attachments.

Zest is an intriguing alternative I’ve mentioned before. However, people I’ve shown it to find it confusing, and while I think they could learn, I hesitate to recommend it as a drop-in replacement for MHonarc/Pipermail.

The Problem with MMRSS

Unfortunately mmrss doesn’t solve our problems. It scrapes the existing Mailman archives, and creates very simple RSS 0.91. Because RSS 0.91 isn’t very flexible, there is no way to include most of the interesting information, including basic meta-data like creator (From:) and date sent (Date:), more email specific info (like attachments, user agent, and message id), and the message’s threading information (In-Reply-To:).

This means that while MMRSS could be useful for watching a list (i.e. getting notified when it updates), it does not provide a meaningful alternative to reading the list. (This could partially be because the script started life generating RSS feeds for FoRK, an email based proto-blog with an interesting role in the history of RSS/Email convergence.)

What would we want?

On the <channel> definition we’ll want the usual suspects, including:

  • the name of the mailing list in the <title>
  • the URL of either the list’s webpage if it has one (as all Mailman and Sympa lists do), or the link to the web accessible archives (if for some reason you are using something else, and haven’t set up a webpage)
  • the date of the most recent message to the list in <dc:date>
  • and as much other meta-data as possible including description, and language.
And we’ll want:
  • the list, and list admin addresses, perhaps in <dc:creator> and <dc:publisher>? or would that be an inappropriate re-use?

On a particular <item> we’ll want:

  • the URL to the web archived email
  • the subject of the email in the <title>
  • the contents of the From: field in <dc:creator> (or one of the sha1/foaf based email obscuring technologies being discussed on rss-dev)
  • the date the email was sent (or received by the mailing list software) in <dc:date>
  • the full content of the message in the <description> (if you’re serious about providing an alternative interface to the list)

Ideally we would also have:

  • information about the message’s membership in a thread (see below)
  • links to web archives of any attachments that might have been included in the email
  • a link (or mail address) to reply to this particular message

There might also be reasons to include:

  • Message-ID (perhaps for constructing replies)
  • User-Agent
  • Spam Status (or similar Spam flagging header)
  • CC information
Some of this definitely can’t be shoehorned into the 3 standard RSS 1.0 modules (Dublin Core, Syndication, and Content), and demands extensions, but perhaps a proposed module (or modules) will do.
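
As a rough illustration of the mapping in the lists above, here is a Python sketch that turns one raw email into the fields an <item> would carry. It is only a sketch: the sample message and archive URL are invented, the function name is mine, and the last two keys are placeholders for whatever a proposed module ends up calling them.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def message_to_item(raw_message, archive_url):
    """Map one email onto the <item> fields listed above."""
    msg = message_from_string(raw_message)
    return {
        "link": archive_url,
        "title": msg.get("Subject", ""),
        "dc:creator": msg.get("From", ""),
        "dc:date": parsedate_to_datetime(msg["Date"]).isoformat(),
        # Only sensible for single-part text messages; attachments need more work.
        "description": msg.get_payload().strip(),
        # Fields with no home in the standard modules (key names are placeholders).
        "message-id": msg.get("Message-ID"),
        "in-reply-to": msg.get("In-Reply-To"),
    }


RAW = """\
From: A. Poster <poster@example.org>
Subject: Re: RSS for Sympa?
Date: Mon, 25 Nov 2002 10:00:00 -0500
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>

Have you looked at mmrss?
"""

item = message_to_item(RAW, "http://example.org/archive/msg00002.html")
print(item["title"], item["dc:date"])
```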

Examining the Prior Art in Threading

Not surprisingly, displaying email as RSS, displaying mailing lists as RDF, and building interchange formats for threaded discussion in RSS have all been discussed before. No one came up with exactly the same feature set (surprise!) because everyone had slightly different conceptions of the problem.

The first important insight when considering an implementation (and examining prior art) is to realize that much of what is important to representing a mailing list is important to representing any form of threaded discussion, like the comments from a message board or blog, or the posts from a newsgroup.

There is the proposed RSS threading module, which adds one tag: <child>. That is nice and simple, but also kind of awkward, as email tends to be linked into threads by referring to a parent (In-Reply-To:).
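
The awkwardness is mostly bookkeeping: an archiver has to invert the child-to-parent pointers that email gives it before it can emit parent-to-child links. A minimal Python sketch, with invented Message-IDs and a function name of my own:

```python
from collections import defaultdict


def children_by_parent(parents):
    """Invert child -> parent (In-Reply-To) links into parent -> [children]."""
    children = defaultdict(list)
    for message_id, parent_id in parents.items():
        if parent_id is not None:
            children[parent_id].append(message_id)
    return dict(children)


# Hypothetical Message-IDs, each mapped to the ID named in its In-Reply-To header.
parents = {
    "<1@example.org>": None,
    "<2@example.org>": "<1@example.org>",
    "<3@example.org>": "<1@example.org>",
    "<4@example.org>": "<2@example.org>",
}
print(children_by_parent(parents))
```

Note that this needs the whole thread in hand, whereas a parent reference can be written the moment a message arrives.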

ThreadML was an interesting initiative of Steve Yost of Quicktopic, partially created in response to this article on Joho. It was a standard composed of RSS 1.0, mod_content, mod_threading to represent parent-to-child relations, and mod_annotation to represent child-to-parent relations. There was a very active quicktopic discussing ThreadML for a while, but it seems to have gone quiet. See an example Quicktopic RSS feed for how it might have looked.

Discussion of creating an RSS feed for the W3C’s mailing lists prompted the proposal of a mod_email, which might be useful, but doesn’t seem to focus on the commonality between different mediums enough for me.

The PHP mailing lists are available as RSS feeds. Unfortunately, here again, they aren’t very useful RSS feeds, with the From information stuck into a <mailto:> tag in the <description> tag. Odd.

Yahoogroups (formerly eGroups) provides RSS feeds for the lists they host (e.g. RSS-DEV); see the original announcement on FoRK.

Mail-archive.com makes a simple RSS 0.9 feed available for each list, for example mod_perl’s RSS.

Lastly, the Thread Description Language is an interesting attempt to build a rich RDF syntax for talking about all sorts of different threads. Some of its concepts, like agreesWith and disagreesWith, would be very cool to add to an RSS/RDF feed based on Zest and its inline markup.

Conclusion, and Concerns

All of that is very interesting, but I don’t feel like any of the above directly maps to the features I mentioned earlier. It might be possible to assemble something out of the pieces, but a few items (like the url/mailto to respond to a post) are totally missing from any of this.

It might be worth looking at an email<->NNTP gateway to see what tricks they play, and what they consider necessary.

One problem with marking up email in XML is that it makes it very very easy for spammers to identify email addresses. There is a thread on RSS-DEV about ways to combat this. It seems like it should be possible with the collusion of the list archiver to send out “privatized” emails, where contact information is replaced with some sort of smart URI to confuse harvesters without interrupting communication. Yahoogroups kind of does this.
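
The sha1/foaf approach mentioned earlier is the simplest of these to show. A minimal Python sketch, assuming the FOAF convention of hashing the full mailto: URI; the address is hypothetical and the function name simply mirrors the FOAF property:

```python
import hashlib


def mbox_sha1sum(address):
    """Hash a mailbox the way foaf:mbox_sha1sum does: SHA-1 of the mailto: URI."""
    uri = "mailto:" + address.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


digest = mbox_sha1sum("poster@example.org")  # hypothetical address
print(digest)
```

A hash like this lets a reader recognize the same poster across messages without exposing the address, but you can’t reply to it, so it only handles the harvesting half of the problem.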

It would cut down on the complexity immensely to just skip the whole threading thing, but I think threading is an important feature for facilitating a culture of discussion, which is often discouraged in spaces that use “linear threading” (like the traditional display of an email client’s inbox). Skipping it might be suitable for a version 1.0.

Yet another approach would be to get away from the concept of an individual post, and syndicate conversations (threads). This would map very well to Zest’s concept of threads, but would work pretty well for normal archives and threads as well. The one trick would be figuring out how to distribute threads in such a way as to be useful, with the most recent post readily accessible.

Trying to stay out of the RSS wars

September 17th, 2002

In the immortal words of Mark Pilgrim, “He is, of course, kidding. At least I hope he’s kidding.”

He was referring to Aaron’s RSS 3.0 spec, but I’m referring to Mark’s <blink> suggestion. Small, domain-specific changes are why we like namespaces. Have you considered perhaps using a <link> element with DCMI Qualifier, Type=”Quirky”? (Type, used to categorize the nature or genre of the content of the resource.) And how will you write documentation for those of us born after the golden age of vinyl?

And do you pronounce it “blngk” or “b-lngk”?

I can’t understand how some people (this is not directed at Mark) can on one hand argue for the simplification and clarification of RSS, and on the other hand support a format where tags get added willy-nilly. But if you are going to support such a format, can it at least include the stipulation, “If people want to add an element to RSS, then just send it to [Aaron] and [he’ll] add it to [the] list of all elements in use.”

….but if I was going to get involved, I might point out that Morbus needs to work on his sense of humor (I’ve thought this ever since he joked about shooting his girlfriend), while Dave has serious anger issues, but Bill has always been great the few times I’ve corresponded with him.

Tagged: Uncategorized

Fighting with Gallery

May 13th, 2002

Install PHP CGI, or Turning Mole Hills into Mountains

I’ve always used PHP compiled into Apache, so PHP CGI was new to me. And for the life of me, I couldn’t figure out how the hell to begin. I searched the documentation, typed ./configure --help dozens of times, finally opening up the configure script and going through it line by line. I simply could not find a parameter, argument, option, environment variable, or config file that said, “Build PHP as a CGI”. Which brings us to the problem of, “How does one document a negative?”

PHP apparently expects to be compiled as a CGI. The fact that no one does this, hasn’t changed that basic assumption. So just compile the damn thing sans --with-apache, and you’ve got a PHP CGI executable ready to go. Hmmm, we’re already 30 minutes over budget for installing Gallery, and we haven’t touched it.

Compiling, or Mop the Augean Stables

Well, we’ve got configure in the bag, and the freshly downloaded 4.2.0 is happily compiling, but wait! Now make is having problems. Something about EX_OK being used before it’s defined. Screw it! We’ll switch back to 4.1.1, which is kicking around the src directory, and try to build that instead. Ok, that worked.

Putter around trying to figure out how to enable PHP-CGI.

  • #!/usr/local/bin/php? No that doesn’t work.
  • AddHandler cgi-script .php? Hmmm, nothing.
  • AddHandler application/cgi-php, Action application/cgi-php /usr/local/bin/php? 404, /usr/local/bin/php/~kellan/test.php not found.
Eventually settled on:
AddHandler application/cgi-php
Action application/cgi-php /cgi-bin/php.cgi

If that was somewhere in the PHP docs (or in the Programming PHP book, which I grabbed from Safari), I couldn’t find it.

Finally, we’re getting somewhere! Way, way, way past our 20 minutes of allotted time, but at this point, it’s Gallery or me.

And Gallery is still winning. A 505, and a quick check of error_log, and for some reason PHP is segfaulting. I don’t know why, something about the alignment of the moons. I futz with it, and decide that maybe I need to go back and check out 4.2.0.

More futzing, some cursing, a thrown pillow or two, and we’ve got the latest version of libc6-dev, and the sysexits.h contained therein, and one brand spanking new php.cgi, version 4.2.0.

Debugging Gallery’s config wizard, or The Evil that is PHP.

All of Gallery’s wonderful configuration wizards are great…until they break. Kind of like Windows, but at least I had the source. So let’s look behind the curtain….

The first time I saw it, I was blown away by Gallery’s wizard. This time I realized that PHP never stops being PHP. Maybe it seems elegant, simple and intuitive, but under the covers it’s still an ugly hack.

  • Gallery was checking my .htaccess file by trying to fiddle a php.ini value, to prepend a file containing a global variable. Ingenious! I didn’t think you could test the .htaccess stuff without deep hooks into Apache’s core. I went looking for some mechanism for querying Apache, or parsing config files, or something. It took me forever until I realized that that peculiar little auto_prepend_file php_value_ok.php was performing the magic.

    But what if you’re running in CGI mode, and php_value isn’t supported? (If you’re me, you comment out that check, and move on.)

  • Gallery was checking to see if mod_rewrite was working, very logically, by rewriting a URL and seeing if it worked. In and of itself, not a bad idea (now that we had determined it didn’t have a magic way to ask Apache about this stuff).

    But when Gallery started turning up false negatives, claiming my mod_rewrite didn’t work, I was stumped. Much, much, too much later, and after much pain and gnashing of teeth, I realised that Gallery was testing for the existence of a variable “init_mod_rewrite” that was never set, and therefore must be expecting PHP to magic it into existence. (And here we note the difference between C, which refused to compile without knowing what EX_OK was, and PHP, which sailed along blindly without a care in the world.)

    Well, it is a reasonable assumption in PHP for variables to magically appear, so it took a trip to php.ini to turn on register_globals, an option that magicks into existence variables containing anything passed in the query string (say, if your script uses mod_rewrite to tack init_mod_rewrite=1 on to the end of your URL).

    • Gallery’s FAQ does actually have the question, “The setup page tells me that mod_rewrite is not installed. Since mod_rewrite is optional, how do I configure this option?” But the answer is less than helpful.
    • A Google search shows that the term register_globals is only used once on the Gallery website, in the notes for the pre-release version of Gallery 1.3.

Tagged: Uncategorized