Blog posts tagged "syndication"

Yelp, User Contributed Content and Feed Design

December 23rd, 2005

Yelp gets feeds just right. I’m not sure I’ve ever said that before about anybody.

They’ve got:

  • a feed of my reviews, which lets me re-purpose the content I create (No RSS, No Content Creation)
  • a feed of my (as yet non-existent) network’s reviews
  • feeds that contain the full content of my reviews (again: my content, I created it, give it to me)
  • rich feeds: they use the geo namespace to embed lat/long
  • both RSS and Atom 1.0 feeds
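The geo namespace mentioned in that list is the W3C Basic Geo vocabulary. As a rough sketch of what a consumer does with it, here is a Python function that pulls coordinates out of a feed item with ElementTree, assuming the usual geo:lat and geo:long elements. The sample item, its title, and its coordinates are made up for illustration and are not taken from Yelp's feeds.

```python
import xml.etree.ElementTree as ET

# W3C Basic Geo vocabulary namespace (conventionally bound to the "geo:" prefix).
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical review item; title, link, and coordinates are invented.
SAMPLE_ITEM = """
<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <title>Example Cafe: 4 stars</title>
  <link>http://example.com/biz/example-cafe</link>
  <geo:lat>37.7749</geo:lat>
  <geo:long>-122.4194</geo:long>
</item>
"""

def item_location(item):
    """Return (lat, long) as floats, or None if either element is missing."""
    lat = item.findtext(f"{{{GEO_NS}}}lat")
    lon = item.findtext(f"{{{GEO_NS}}}long")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)

item = ET.fromstring(SAMPLE_ITEM)
print(item_location(item))  # (37.7749, -122.4194)
```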

Very nice.

A minor nit. These are feeds of reviews, not feeds of places, so it makes sense that the rating is included in the title of the entry, but I’d still like to see the rating and the location’s name presented in separate structured elements as well (say, for example, I want to syndicate my reviews locally and display a graphic of the stars).

Per-category feeds might also be useful.

Atom and Wiki Driven Testing

December 20th, 2005

It’s been a long-standing to-do to port Mark’s FeedParser tests to work against Magpie, possibly with an intermediate representation to allow cross-language testing. (Has any work been done on capturing unit tests/acceptance tests in XML?) Sam’s approach highlights Ruby-the-language’s awesome flexibility (I’d been playing with something similar for the parser we wrote for Odeo), but doesn’t map to PHP/Magpie very well.

Phil kicked off a new round of testing for Atom 1.0, the results of which are now captured in the Atom wiki. (not to mention a few gentle nudges on Magpie’s lack of 1.0 compliance.)

All of which got me thinking: it would be exceptionally cool if someone made the FeedParser’s tests available on the Atom wiki using Ward’s FIT concept in a documented, reportable fashion.

Any takers?
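To make the idea concrete, here is a minimal Python sketch of FIT-style, table-driven feed tests. Each row is plain data (an input feed, the element to look up, the expected value), so in principle the same table could live on a wiki and be replayed against Magpie, FeedParser, or anything else. The rows are invented, and ElementTree stands in for the parser under test; this is not how FIT or FeedParser's own suite is actually structured.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical rows: (feed document, element name, expected text or None).
TEST_TABLE = [
    ("<feed xmlns='http://www.w3.org/2005/Atom'><title>Hi</title></feed>", "title", "Hi"),
    ("<feed xmlns='http://www.w3.org/2005/Atom'><id>urn:x</id></feed>", "id", "urn:x"),
    ("<feed xmlns='http://www.w3.org/2005/Atom'></feed>", "subtitle", None),
]

def elementtree_parse(feed_xml, name):
    """Stand-in parser under test: text of a top-level Atom element, or None."""
    return ET.fromstring(feed_xml).findtext(f"{{{ATOM}}}{name}")

def run_table(rows, parse):
    """Replay every row against `parse`; report (row number, passed, actual)."""
    report = []
    for number, (feed_xml, name, expected) in enumerate(rows, start=1):
        actual = parse(feed_xml, name)
        report.append((number, actual == expected, actual))
    return report

for row in run_table(TEST_TABLE, elementtree_parse):
    print(row)
```

Swapping in a different `parse` function is the whole point: the table stays the same, only the adapter changes.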

Tagged: Uncategorized

Excerpted from Developing Feeds with RSS and Atom

June 5th, 2005

Ben Hammersley on Magpie

The most popular parser in PHP, and arguably the most popular in use on the Web right now, is Kellan Elliott-McCrea’s MagpieRSS. As I write this, it stands at version 0.7, a low number indicative of modesty rather than product immaturity. MagpieRSS is a very refined product indeed.

Thanks Ben!

Tagged: Uncategorized

Private Feeds, and Atom as Open Pipe

January 25th, 2005

Tim Bray has a short entry on Private Syndication this morning, which I by and large agree with. Personal feeds make sense; they make sense from the perspective of business workflow, content model, and scalability.

In order to make it happen we really need an updated list of which aggregators support which key features. HTTP authentication (at least Basic, if not Digest) and SSL are the fundamental building blocks of private feeds, with the addition that the major aggregator services need to be aware that content could be negotiated at auth time. The only list I know of is from July 2003.
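For the HTTP Basic half of that, the client side is small. The sketch below uses only Python's standard library to build the Authorization header by hand (the scheme is defined today in RFC 7617) and attach it to a request without sending it; the URL and credentials are placeholders. Basic auth merely base64-encodes the password, which is why it belongs behind SSL.

```python
import base64
import urllib.request

def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def private_feed_request(url, user, password):
    """Prepare (but do not send) a request for a password-protected feed."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req

# Placeholder URL and credentials; use https:// so the password is not sent in the clear.
req = private_feed_request("https://example.com/private.rss", "user", "pass")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz
```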

I was puzzled and pleased to see his closing line:

One detail: I think that for this kind of content-critical, all-business feed, Atom is a more attractive choice than any of the RSS flavors.

Which is odd, because all of the time I’ve spent with the Atom community (which was admittedly still called Pie/Echo at the time) was focused on blogs to the exclusion of all else, and all arguments I made about the potential of pushing other forms of data over this new format were ignored/squelched.

For example, an Atom feed requires every entry to have an author element, which is defined as a Person construct. Who is the “person” in an Atom feed generated by your “bank account, credit card, or stock portfolio”?

Additionally, perhaps the language of the spec needs to be updated with some namespace best practices, and some non-blog examples?

Tagged: Uncategorized

RSS 1.1

January 19th, 2005

Looks like a couple of folks active in the rss-dev working group have taken Ian’s excellent RSS issues document to create RSS 1.1. This is a great idea, and one a long time coming.

The execution leaves me kind of cold. A list of changes from RSS 1.0 is given, however it’s written in some dialect (maybe high semweb geek?) that I don’t speak, so the disambiguations are ambiguous.
The major changes I can see are:

  • rdf:Seq has been removed in exchange for a more liberal sprinkling of opaque RDF attributes. I’ve never figured out why rdf:Seq bothers people.
  • channel is now Channel
  • Channel is now the root element. Ick, that doesn’t match my internal modeling of what a channel is. But I’ll admit that is a personal thing.
  • throughout the document the use of the rdf:about attribute on items is repeatedly discouraged, which is unfortunate, as this value acts as the de facto guid in RSS 1.0 feeds. Brent agrees. (Already changed, apparently.)
  • the only allowed charsets are UTF-8, UTF-16, and UTF-32. Umm, I have better tools for dealing with Shift_JIS than UTF-32, not to mention the relative dominance of ISO-8859-1.

What didn’t change

  • Doesn’t address one of the single ugliest outstanding issues in all RSS versions, Markup In Core Elements, which is one of Atom’s clear advantages.
  • language is still confusing and inaccessible to non-RDF hackers
  • no standard for providing a unique identifier (like RSS 2.0’s guid and Atom’s id elements), and a distinct discouragement of the RSS 1.0 semi-equivalent.
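Because RSS 1.0 (and 1.1) lack a dedicated identifier element, consumers end up guessing. Here is a hedged Python sketch of one reasonable fallback order: Atom's id, then RSS 2.0's guid, then RSS 1.0's rdf:about, then the link. The order is my assumption and is not prescribed by any of these specs; the sample items are invented.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # Atom 1.0 namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"

def item_identifier(item):
    """Best-guess unique ID for an <item>/<entry> element, or None."""
    candidates = [
        item.findtext(f"{{{ATOM}}}id"),    # Atom <id>
        item.findtext("guid"),             # RSS 2.0 <guid> (no namespace)
        item.get(f"{{{RDF}}}about"),       # RSS 1.0 rdf:about, the de facto guid
        item.findtext("link"),             # last resort: RSS 0.9x/2.0 <link>
        item.findtext(f"{{{RSS1}}}link"),  # last resort: RSS 1.0 <link>
    ]
    return next((c.strip() for c in candidates if c and c.strip()), None)

rss1_item = ET.fromstring(
    '<item xmlns="http://purl.org/rss/1.0/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'rdf:about="http://example.com/a"><link>http://example.com/a</link></item>'
)
rss2_item = ET.fromstring("<item><guid>tag:example.com,2005:1</guid></item>")
print(item_identifier(rss1_item), item_identifier(rss2_item))
```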

I’m torn whether these changes are too drastic to justify a single sub-version increment (I’d expect all tools that work with RSS 1.0 to work with RSS 1.1, which won’t be true), or whether they’re too minor to justify adding yet another version of RSS into the already crowded, and confusing space.

I think Sean and Chris are doing good work; a cleanup of RSS 1.0 has been a long time coming. But my (very) quick glance at what they have says they haven’t nailed it here. Good discussion is going on on rss-dev.

Tagged: Uncategorized

A Few More Thoughts on Netflix Friends

January 15th, 2005

No RSS, No Content Creation

The “Add your $0.02” feature is interesting. More micro-content creation. If I’m going to be blogging, even if it’s just loose change, I want an RSS feed of my, and my friends’, 2-cent comments. Something that I can stitch together with the rest of my micro-content, from del.icio.us commentary, to 43 Things entries, Amazon book reviews, and my coffee shop reviews from WifiMug.

Seeing Stars

I never know how to use a 5-star rating system, but the friends system exposes Netflix’s underlying assumptions about what the stars mean: 3 stars is “I like it”, 2 stars and down is a negative rating, and 3 stars and up is a positive rating.
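Spelled out as code, that reading of the stars is a single threshold. A trivial Python sketch; the function name and the 1 to 5 range check are my own framing, not anything Netflix publishes.

```python
def star_sentiment(stars):
    """Positive for 3 stars ("I like it") and up, negative for 2 and below."""
    if not 1 <= stars <= 5:
        raise ValueError("expected a rating from 1 to 5 stars")
    return "positive" if stars >= 3 else "negative"

print([star_sentiment(s) for s in range(1, 6)])
# ['negative', 'negative', 'positive', 'positive', 'positive']
```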

WordPress RSS Aggregator

July 18th, 2004

I’ve been playing with the idea of a devblog for a while: a place which collates the various feeds associated with a project into a single location for easy viewing and commenting. Fisheye and CIA are full-blown attempts at solving this very problem for large open source projects. This morning, unable to get back to sleep, it seemed simpler and faster to write something new than to finish reading the install instructions.

I present my quick and dirty RSS-to-WordPress aggregator. There are a number of tools that allow you to display an RSS feed on your WordPress blog, and several that allow you to import an RSS feed into your blog, but I didn’t find any that worked as aggregators, periodically polling a feed and creating new posts from the items found within. So that is what I built.

Some features:

  • dc:date becomes the post_date, dc:creator (or dc:contributor) becomes the post_author, title maps to title, description to post_content, dc:subject to category.
  • a postmeta variable is used to track which RSS items have been seen before, and only insert them once.
  • you can assign 1 or more categories to be automatically attached to each post per feed. (e.g. a ‘CVS Commits’ category)
  • new authors, and new categories are auto-vivified if they don’t exist.
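The mapping and the seen-before check in that list can be sketched in a few lines. This is Python rather than the script's PHP: items are plain dicts shaped the way Magpie exposes them (namespaced fields under item['dc']), and a set stands in for the postmeta bookkeeping. The post_* keys mirror WordPress's column names, but nothing here talks to WordPress, and the sample item and author are invented.

```python
def item_to_post(item, feed_categories=()):
    """Map a Magpie-style RSS 1.0 item dict onto WordPress-like post fields."""
    dc = item.get("dc", {})
    categories = list(feed_categories)  # per-feed categories, e.g. 'CVS Commits'
    if dc.get("subject"):
        categories.append(dc["subject"])  # dc:subject -> category
    return {
        "post_date": dc.get("date"),
        "post_author": dc.get("creator") or dc.get("contributor"),
        "post_title": item.get("title"),
        "post_content": item.get("description"),
        "categories": categories,
    }

def aggregate(items, seen, feed_categories=()):
    """Return posts for items not seen before; `seen` stands in for postmeta."""
    posts = []
    for item in items:
        key = item.get("link")  # links double as unique identifiers
        if key in seen:
            continue  # inserts only, never updates
        seen.add(key)
        posts.append(item_to_post(item, feed_categories))
    return posts

items = [{
    "title": "Fix the build",
    "link": "http://cvs.example.com/commit/1",
    "description": "Typo in Makefile.",
    "dc": {"date": "2004-07-18T09:00:00Z", "creator": "alice", "subject": "build"},
}]
seen = set()
print(len(aggregate(items, seen, ["CVS Commits"])))  # 1
print(len(aggregate(items, seen, ["CVS Commits"])))  # 0, already seen
```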

Limitations:

  • I wrote this to parse our internal cvs2rss feeds, and our Twiki feed. Those both produce RSS 1.0 feeds that conform to my particular RSS aesthetics. As such, the script doesn’t try too hard to support other versions of RSS.
  • Links are treated as perma-links, and unique identifiers.
  • Ignores content:encoded
  • Only handles inserts, not updates
  • I wrote this with nightly build 2004-7-14; your mileage may vary.

None of these would be hard to fix, but this was good enough for my needs this morning, in the short period of time I wanted to spend on it.

Todo

Some other features that would be nice to have:
  • Store config in the database and add WP UI for managing aggregated feeds (should be doable with option groups?)
  • Support adding categories by name instead of id. (and auto-vivify categories)
Uses Magpie (surprise!), so you get to leverage its support for fetching private RSS feeds (which I’d recommend for serving up internal RSS feeds). Expanding on the devblog theme, you might want to include the RSS from your project management tool and bug tracker. (We aren’t currently using RSS-enabled tools for this, but if, for example, you were using TasksProp or Basecamp, those would be good feeds to include.)

A Few Observations on WordPress

  • Doesn’t support PHP5 yet. I’m not sure how pervasive the problem is, but it uses a modified version of ezSQL for its DB abstraction layer, which isn’t PHP5 compatible. Too bad they aren’t building on top of PEAR DB.
  • Excellent for whipping up an attractive, feature rich blog.
  • Faster than expected, really zippy in fact, at least without load.
  • Code is kind of a mess (or at least old-school PHP). Very little OO; SQL and HTML are scattered throughout; the core relies on global variables.
  • I’m not a fan of the “PHP is already a template” argument, but I understand why some people are.
  • Doesn’t feel as polished as MT, but is certainly more hackable. Reminds me of the Kwiki “every installation is a snowflake” goal. It will be interesting to see if this creates a surge of creativity, or just balkanization.
  • Option groups, and the postmeta table make it incredibly simple to add new features.
  • PHP5+SQLite support, and a one-click install could make WP the Kwiki of blogging tools.

update [2004/10/28]: In Boston, but on Seattle time, so filling in some extra details. This quick hack is growing FAQs, and might need to sprout a page of its own pretty soon. In the meantime there is a new version which uses a simple config file and has expanded del.icio.us support.

FAQ

  1. How do I print the link of the original RSS item?
    
    <?php echo get_post_meta($id, 'wpaggrss_id', true) ?>
    
  2. Dates aren’t working, all my posts are from 1969.
    
    Currently wp-rss-agg only supports dates in the dc:date field; however, there is a feature in magpie-cvs that should make it simple to provide Atom and RSS 2.0 date compatibility as well.
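Posts dated 1969 are what you get when a date fails to parse and falls back to Unix time 0 (midnight UTC on 1 January 1970, which is still 31 December 1969 in US time zones). A hedged Python sketch of broader date support: try dc:date, then Atom 0.3's issued/modified, then RSS 2.0's pubDate, and return None rather than a bogus epoch. The dict keys, including the lowercase pubdate, assume Magpie-style items; this is not the magpie-cvs feature itself.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

def item_date(item):
    """Best-effort publication date for a Magpie-style item dict, or None."""
    # dc:date and Atom 0.3 issued/modified use W3CDTF, a profile of ISO 8601.
    for value in (item.get("dc", {}).get("date"), item.get("issued"), item.get("modified")):
        if value:
            try:
                return datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                pass
    # RSS 2.0 pubDate uses RFC 822 dates.
    if item.get("pubdate"):
        try:
            return parsedate_to_datetime(item["pubdate"])
        except (TypeError, ValueError):  # TypeError on older Pythons
            pass
    return None

print(item_date({"dc": {"date": "2004-07-18T09:00:00Z"}}))
print(item_date({"pubdate": "Sun, 18 Jul 2004 09:00:00 GMT"}))
```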

update: FeedWordPress is an actively developed and maintained version of this script. Charles has taken it beyond my simple proof of concept, and it is almost certainly what you’re looking for.

Tagged: Uncategorized

New Atom Link Types?

May 27th, 2004

Arbitrary links which are self-describing as to their intent are incredibly cool, and I’m happy Atom offers them using the link construct plus the rel attribute. (I was also a fan of RSS 1.0 mod_link, but apparently the only one.) However, Atom has some potentially problematic, or at least confusing, limitations.

Last week, when I was thinking about Google leveraging the link construct to syndicate Usenet threading, I was rebuffed by the Atom spec which claims:

The “rel” attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification http://bitworking.org/projects/atom/draft-gregorio-09.html.

Now Mark is using the ‘via’ type and pointing to LinkTagMeaning as the definitive list of rel types.

The Question

So my question is, LinkTagMeaning is a wiki page, does this mean that the rel vocabulary for Atom is open for growth as long as it is documented on this page?
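Whatever the answer, a consumer that wants to stay open to new rel values mostly has to avoid throwing unknown ones away. A small Python sketch that groups an Atom 0.3 entry's links by rel, keeping values like 'via'. Treating a missing rel as 'alternate' is borrowed from the later Atom 1.0 rule (RFC 4287) and is an assumption here, since 0.3 required the attribute; the sample entry is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ATOM03 = "http://purl.org/atom/ns#"

ENTRY = """
<entry xmlns="http://purl.org/atom/ns#">
  <link rel="alternate" type="text/html" href="http://example.com/post"/>
  <link rel="via" type="text/html" href="http://example.org/source"/>
</entry>
"""

def links_by_rel(entry, ns=ATOM03):
    """Group an entry's link hrefs by rel value, keeping rels we don't recognize."""
    links = defaultdict(list)
    for link in entry.findall(f"{{{ns}}}link"):
        links[link.get("rel", "alternate")].append(link.get("href"))
    return dict(links)

print(links_by_rel(ET.fromstring(ENTRY)))
# {'alternate': ['http://example.com/post'], 'via': ['http://example.org/source']}
```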

(An aside: the problem with the theory behind trackbacks – that the web is just one large conversation and therefore we don’t need to enable comments – is that I could have written this question as two sentences in the context of Mark’s blog, but it’s 3 paragraphs over here.)

Tagged: Uncategorized

Atom Feeds for Usenet

May 14th, 2004

Google Groups 2 (nee DejaNews) is in beta, and available for play.

Among the new features is the revival of the option for users to create new groups, putting them squarely in competition with YahooGroups. (BTW, if you control both the mailing list and the mail account, this opens up some cool possibilities, the most boring of which is keeping only a single copy of a message instead of one per subscriber [stuff we chatted about in the context of Riseup].)

Now with Atom

However, I think the coolest feature is the per-group Atom feeds. Just a short 9 years after I stopped compulsively reading rec.arts.books, rab has 2 shiny new Atom feeds: recent posts and recent threads.

An aside: I’ve always thought arguments of the form “there are more RSS 2.0 feeds than any other format, so…” were specious, but some people are fond of them. Well, with Usenet+Blogger my gut says the total number of Atom feeds is on track to pass RSS 2.0.

Whither the Atom Namespaces?

My critique of Atom is, was, and has always been that it was invented as a weblog syndication format. I brought this up during the initial design process, but the idea failed to gain traction. It’s very cool to see Usenet in feeds, but it’s a shame that this distinct content, with its own unique metadata, is getting shoehorned into looking like blog posts.

Where are the custom Atom namespaces? (modules in RSS 1.0 parlance) I’ve noticed that as new Atom sources come online they seem to be shy about extending the core, and so some things which should not be forgotten, are lost.

Give me Threads

Google Groups, as the definitive source of Usenet over Atom, is in a position to do some good for the world and create a de facto standard for this space. As they move into competition with YahooGroups (a service which, while popular, hasn’t changed much in the last 3-4 years, except to add the interstitials, and whose own syndicated offerings suck) they’re going to be one of the largest providers of mailing lists. At the very least, threading information would be nice. And if Atom can get it together and offer a decent, cross-application threading module, I’ll take back all the nasty things I’ve said about it (which really haven’t been that nasty). We tried with RSS 1.0 and the ThreadML initiative, but it kept bogging down.

I tried to get this stuff into Atom from the start, but didn’t have the time/clout/cabal status to influence. If I was modelling it today I would have used the link element, with a new rel type. Also might be worth checking out the prior art in the space.
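Here is roughly what consuming threading expressed through link elements could look like. The rel value in-reply-to and the sample feed are hypothetical, invented for this sketch; the Atom Threading Extensions (RFC 4685) that came later use a thr:in-reply-to element instead, so treat this as an illustration of the link-based idea, not of any spec.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
REPLY_REL = "in-reply-to"  # hypothetical rel value, not from any spec

FEED = """
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:msg:1</id><title>Question</title></entry>
  <entry><id>urn:msg:2</id><title>Answer</title>
    <link rel="in-reply-to" href="urn:msg:1"/></entry>
  <entry><id>urn:msg:3</id><title>Follow-up</title>
    <link rel="in-reply-to" href="urn:msg:2"/></entry>
</feed>
"""

def build_threads(feed):
    """Return (roots, children): entries without a parent, and parent id -> reply ids."""
    roots, children = [], {}
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM}}}id")
        parent = None
        for link in entry.findall(f"{{{ATOM}}}link"):
            if link.get("rel") == REPLY_REL:
                parent = link.get("href")
        if parent is None:
            roots.append(entry_id)
        else:
            children.setdefault(parent, []).append(entry_id)
    return roots, children

def print_thread(node, children, depth=0):
    """Print a thread as an indented tree, depth-first."""
    print("  " * depth + node)
    for child in children.get(node, []):
        print_thread(child, children, depth + 1)

roots, children = build_threads(ET.fromstring(FEED))
for root in roots:
    print_thread(root, children)
```

One limitation worth noting: a reply whose parent is not in the feed is recorded under that parent's id but never printed, since only parentless entries are treated as roots.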

Also, does Gmane provide RSS feeds? A quick search this morning didn’t turn anything up. (Talk about a candidate for Google acquisition; they wouldn’t even have to change the name.)

(Mark agrees that YahooGroups has failed to innovate, and he should know.)

update: Proposal for threading in Atom

it is not catastrophic

February 11th, 2004

Adding Atom support to RSS-consuming applications should be a matter of hours, maybe a day or two of developer time. This whole thing may be annoying, but it is not catastrophic. – Tom
I think it took about 2-3 hours to code and test Magpie’s (admittedly incomplete) Atom support, plus at least another hour to blog about it. So call it 4 hours total. (from 11pm to 3am)

As for the rest of it, we can only hope.

On a side note, Atom is clearly a failure as Ken MacLeod still doesn’t have a feed I can subscribe to.

Tagged: Uncategorized

Experimental Magpie Support for Atom

January 24th, 2004

You are officially entering wet cat territory.

Inspired by Scott’s patch, the million or so sites that only produce Atom, and a couple of requests, I hacked experimental support for parsing Atom into Magpie.

Taking a page from Mark’s Feed Parser, it should be relatively transparent to move between parsing an RSS feed and parsing an Atom feed. Specifically:

  • Atom feed elements and RSS channel elements are both accessible via $feed->channel[$elementname].
  • Atom link elements that point to an alternate HTML version (i.e. those with the attribute rel="alternate") are treated as equivalent to RSS’s link elements and are accessible via $feed->channel['link'] and $item['link']
  • channel/description is mapped to channel/tagline and channel/tagline is mapped to channel/description
  • item/description is mapped to item/summary and item/summary is mapped to item/description
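The mirroring in that list is simple enough to sketch. This Python version works on plain dicts rather than Magpie's internals; the atom_links key (a list of (rel, href) pairs) is a hypothetical representation chosen for the example, and the sample channel is invented.

```python
CHANNEL_PAIRS = [("description", "tagline")]
ITEM_PAIRS = [("description", "summary")]

def mirror(fields, pairs):
    """Copy each field to its counterpart when only one of the two is present."""
    for a, b in pairs:
        if a in fields and b not in fields:
            fields[b] = fields[a]
        elif b in fields and a not in fields:
            fields[a] = fields[b]
    return fields

def pick_link(fields):
    """Expose the first rel="alternate" Atom link as the RSS-style 'link' field."""
    if "link" not in fields:
        for rel, href in fields.get("atom_links", []):
            if rel == "alternate":
                fields["link"] = href
                break
    return fields

atom_channel = {"tagline": "Notes on syndication",
                "atom_links": [("alternate", "http://example.com/")]}
channel = pick_link(mirror(atom_channel, CHANNEL_PAIRS))
print(channel["description"], channel["link"])
```

Mirroring in both directions means code written against either vocabulary keeps working, at the cost of storing each value twice.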

Namespaces and Atom’s item/content field

Magpie handles namespaces by adding an array to an item using the namespace prefix as the key. For example, an item’s <dc:subject> (aka item/dc/subject) field is available at $item['dc']['subject']. This has never been ideal, but it is simple, from both the parser’s and the user’s perspective. It does cause a small conflict between RSS’s item/content/encoded field and Atom’s item/content field. I’ve chosen to make Atom’s item/content field available at $item['atom_content']. If the content field is of type xml, I flatten it to a string instead of making the parse tree available. (I don’t think anyone using Magpie wants the parse tree.) Like I said, wet cat country. Also, item/content/encoded and item/atom_content are mapped to each other.
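A Python sketch of that storage rule: prefixed elements nest under their prefix, Atom's bare content goes to atom_content, and content:encoded and atom_content are mirrored. It mimics the behaviour from the outside with dicts; the function names are hypothetical and this is not Magpie's code.

```python
def add_field(item, name, value):
    """Store an element Magpie-style: 'dc:subject' lands in item['dc']['subject']."""
    if ":" in name:
        prefix, local = name.split(":", 1)
        item.setdefault(prefix, {})[local] = value
    elif name == "content":
        item["atom_content"] = value  # keep clear of RSS's content:encoded
    else:
        item[name] = value
    return item

def mirror_content(item):
    """Make content:encoded and atom_content available under both names."""
    encoded = item.get("content", {}).get("encoded")
    if encoded is not None and "atom_content" not in item:
        item["atom_content"] = encoded
    elif encoded is None and "atom_content" in item:
        item.setdefault("content", {})["encoded"] = item["atom_content"]
    return item

rss_item = add_field({}, "dc:subject", "syndication")
mirror_content(add_field(rss_item, "content:encoded", "<p>Hello</p>"))
atom_item = mirror_content(add_field({}, "content", "<p>Hi</p>"))
print(rss_item["dc"]["subject"], rss_item["atom_content"], atom_item["content"]["encoded"])
```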

Nested Elements

Magpie has never handled elements nested more than one level deep. While this could potentially have been a problem while parsing RSS, no one has mentioned it yet. However, Atom, even at its simplest, has a number of nested elements, so just ignoring them wasn’t going to work. Here is what I do. This:
  <author>
    <name>Mark Pilgrim</name>
    <url>http://diveintomark.org/</url>
    <email>f8dy@diveintomark.org</email>
  </author>

Becomes:
[author_name] => Mark Pilgrim
[author_url] => http://diveintomark.org/
[author_email] => f8dy@diveintomark.org
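The flattening rule can be sketched with ElementTree: when a child has sub-elements, each grandchild's name is joined onto the parent's name with an underscore. This is a Python illustration of the behaviour described, not Magpie's PHP, and anything nested deeper than that is not handled.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop ElementTree's '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]

def flatten(element):
    """Flatten one extra level: <author><name> becomes the key 'author_name'."""
    fields = {}
    for child in element:
        name = local_name(child.tag)
        if len(child):  # child has sub-elements, e.g. <author>
            for grandchild in child:
                key = f"{name}_{local_name(grandchild.tag)}"
                fields[key] = (grandchild.text or "").strip()
        else:
            fields[name] = (child.text or "").strip()
    return fields

entry = ET.fromstring("""
<entry xmlns="http://purl.org/atom/ns#">
  <author>
    <name>Mark Pilgrim</name>
    <url>http://diveintomark.org/</url>
    <email>f8dy@diveintomark.org</email>
  </author>
</entry>
""")
for key, value in flatten(entry).items():
    print(f"[{key}] => {value}")
```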

Lastly, there are two new methods, $feed->is_rss() and $feed->is_atom(), which return false if the feed isn’t of that type, and the version number of the feed if it is (e.g. for Atom they will likely return ‘0.3’; for RSS they could return ‘1.0’, ‘2.0’, ‘0.91’, ‘0.93b71’, or a variety of other values).
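A Python analogue of those two checks, working on a parsed root element rather than a Magpie feed object. It recognizes RSS 0.9x/2.0 by the rss root's version attribute, RSS 1.0 by an rdf:RDF root containing an RSS 1.0 channel, and Atom 0.3/1.0 by namespace; RSS 0.90 and other oddities are deliberately left out of this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"
ATOM03 = "http://purl.org/atom/ns#"
ATOM10 = "http://www.w3.org/2005/Atom"

def is_rss(root):
    """Return the RSS version string, or False if this is not an RSS feed."""
    if root.tag == "rss":
        return root.get("version", False)
    if root.tag == f"{{{RDF}}}RDF" and root.find(f"{{{RSS1}}}channel") is not None:
        return "1.0"
    return False

def is_atom(root):
    """Return the Atom version string, or False if this is not an Atom feed."""
    if root.tag == f"{{{ATOM03}}}feed":
        return root.get("version", "0.3")
    if root.tag == f"{{{ATOM10}}}feed":
        return "1.0"
    return False

print(is_rss(ET.fromstring('<rss version="0.91"><channel/></rss>')))  # 0.91
```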

Getting Started

I think that is everything you need to know to get started playing. I’ll do a release complete with tarball once SourceForge’s CVS servers are back online; in the meantime you can download rssparse.inc.with.atom, rename it rssparser.inc, and it should be a drop-in replacement for your current rss_parser.inc. The documentation at the beginning of the file is all out of date, but the inline comments have been updated, and you have this blog entry. (As an alternative, you might want to look at using Aaron’s Atom to RSS stylesheets.)

Caveat

I tested against only two Atom feeds: Steve’s, which I took to be representative of Blogger’s output, and Mark’s, which I assume is an example of best practices per Atom 0.3. There was enough variation between them that I don’t feel it was a horrible sampling. Also, I only tested against an RSS 1.0 feed to make sure that RSS parsing hadn’t broken, but again, I’m feeling pretty good about it.

Next Steps

The code is still kind of hoary, and in need of a major refactoring. Also, I’m not sure how happy I am with this whole solution; it is partially a proof of concept. So if you’re interested in parsing Atom with PHP, or have thoughts on Magpie and Atom, take it for a spin, give me some feedback, and we’ll see where it goes.

Thoughts on Atom

I’m still not as excited about Atom as I am about RSS. It feels like a dead-end format designed for one thing, and one thing only: blogs. I guess it’s a good idea to do one thing and do it well, but I’m not sure I would have chosen blogs as my one thing to do well in life. Also, little things like the channel’s summary field being called tagline are just annoying, and reminiscent of some of RSS’s worst decisions. The various modes and types of fields make it hard to write a parser which is “correct” (as opposed to writing RSS parsers).

update: Magpierss-0.6a (alpha) is available for download. This release adds the above support for Atom, as well as the patch supporting broken webservers. This is not the fabled 0.6 release that was going to be a total rewrite of the parser for better namespace support; that is still vapor.

update: MagpieRSS 0.61 (not alpha) is out with Atom support.

Tagged: Uncategorized

Not Validating RSS 1.0

January 19th, 2004

Was reminded tonight that the Feed Validator is not in the business of validating RSS 1.0, something I had commented on before but had forgotten. The reminder came when I noticed that the otherwise excellent RSS feeds from del.icio.us declare the same resource multiple times per feed (i.e. multiple items have the same rdf:resource/rdf:about, aka item_uri).

This should cause trouble for practically no one (XML::RSS and Magpie are both fine with it, to name just two data points), and it is understandable to produce such a feed, as many toolkits (earlier versions of XML::RSS, and the current version of RSSWriter, for example) don’t provide a syntax for giving an item a distinct link and item_uri. But the Feed Validator should at least mention it.
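The check the validator could add is cheap. A hedged Python sketch that flags rdf:about values shared by more than one item in an RSS 1.0 document; the sample feed is invented, and this is not the Feed Validator's code.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"

def duplicate_item_uris(rdf_root):
    """Return the sorted rdf:about values that appear on more than one <item>."""
    counts = Counter(
        item.get(f"{{{RDF}}}about") for item in rdf_root.findall(f"{{{RSS1}}}item")
    )
    return sorted(uri for uri, n in counts.items() if uri and n > 1)

FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/"><title>Example</title></channel>
  <item rdf:about="http://example.com/a"><title>One</title></item>
  <item rdf:about="http://example.com/a"><title>Two</title></item>
  <item rdf:about="http://example.com/b"><title>Three</title></item>
</rdf:RDF>"""

print(duplicate_item_uris(ET.fromstring(FEED)))  # ['http://example.com/a']
```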

Tagged: Uncategorized

New Additions to Your Personal Universe

December 12th, 2003

My light cone contains 27 stars. Pi-3 Orionis will be reached in 4 months. – Matt Webb explores the outer limits of causality.
Can I get an RSS feed of celestial objects in my light cone please?

update: Yes, yes I can. Thank you Matt!

He has also made the source available, but I’m hesitant to peek, as it blows off all the fairy dust. (one of the undiscussed downsides of open source, takes the magic out of computers)

Tagged: Uncategorized

Reviews, Dates, and Microcontent

November 6th, 2003

Slowly digging myself out of the backlog from a week offline (expect lots of MLPs).

One thing that happened last week was the release of two very rich new modules for RSS (2.0) that are being batted around the Net this week (both via les).

The Reviews (RVW) namespace “is intended to allow machine-readable reviews to be integrated into an RSS feed, thus allowing reviews to be automatically compiled from distributed sources”.

Over at Deanspace (which is doing a surprising amount of interesting hacking) the Datebook Schema Embedded in RSS (DBRSS) is embedding a vocabulary for discussing very complex information about events, including recurring events, location info, event planning (tracks and costs), scheduling (rsvp) requirements, and convener info.

It is awesome to see people really pushing the boundaries of syndication, thinking about new and creative applications that can be enabled by sharing more data in more structured ways. Which is why I felt a little odd at the very similar disquiet both these specs caused in me. As I was struggling to articulate it, I found Bitsko’s Is a Feed the right place for your Data?, which sums up the unease quite nicely.

Review data has permanence, it has linkability, it has searchability, it has reusability — why is it locked in a syndication feed for use pretty much only by syndication clients?
(this is less true of events, and perhaps DBRSS is less covered by this critique)

It’s a HyperTextual World

Bitsko proposes “freeing” the review information by giving it its own url, and syndicating a link to it. I think this is brilliant, the information at the end of the URLs is a real untapped source of descriptive power, which is why I loved Kevin’s proposed mod_link. (though no one else seemed to) Bitsko demonstrates how if you moved the structured data (e.g. a machine readable review) to its own URL, you could link to it transparently from an HTML document, or any of the various syndication formats, well worth a read.

Don’t Forget the People

A while back I started writing an article I hoped to pitch to XML.com on designing RSS modules. I never finished it (and XML.com published so many RSS articles, the market seemed played out) but the central idea of the article (in retrospect) was about striking an aesthetic balance in namespaces between readability, and structure. A good rule of thumb is:
Include just enough information in a feed so that an item could be displayed in a meaningful way without having to fetch the remote resource.

There are a few reasons for this:

  • RSS has proven that human readable formats get faster uptake; design for the “View Source” style of learning.
  • Fetching and parsing a remote resource is hard for beginners to do well, like all the neophyte PHP hackers in the world who are just wanting to do something quick.
  • Strikes a balance between current, and future usage patterns

Externalizing Reviews

Taking the Reviews namespace as an example, as I imagine myself trying to use it, I think I would want to at least know the title of the book as well as the title of the review. This changes the RSS item to read “I am syndicating a review of this book, more information at this URL” instead of just “I am syndicating a review at this URL”.

I support Bitsko’s idea of giving microcontent a home of its own, but let’s not sap all of the semantic meaning out of the feeds while we are at it.

Externalizing DateBook RSS

Similarly, the bulk of the DateBook schema could be moved to an external resource, and the feed could syndicate a link to this resource along with the most basic event information (which is the idea behind mod_event). DBRSS even has the advantage that, unlike reviews, there are already a number of calendaring/scheduling formats available, so there is no need to invent a new one. (I’m assuming the DateBook schema is something new; the name makes me think of an attempt to XMLize the core Palm calendar, but the fields don’t match at all.)

Depending on your persuasion, you have a choice between iCalendar, RDF Calendar, or xCal.

Transient Metadata

You’ll notice (or at least, I notice) that this is a different approach than the one I took with my rough sketch of mod_weather (an admittedly much simpler namespace), where I packed all of the information into the feed.

The difference is current weather conditions (and even forecasts) are about the most transient information imaginable. They are also laden with some of the worst, most obscure formats to ever reach wide circulation. There is no added benefit to giving the current weather conditions for this instant in time a home of its own.

More on DBRSS

I think the story on DBRSS is less cut and dried than RVW; I’ve certainly felt the tug of a richer event syndication format myself, perhaps one less encumbered by Calsch’s years of work. A couple of quick thoughts came up looking at it:
  • Durations instead of endtimes is a seductive choice, but I’ve found that if you’re storing your events in a SQL datastore, endtimes are much more useful.
  • Tracks I don’t understand, and seem a little off to me. It seems like an attempt to cram a calendar into an event.
  • Wouldn’t it be better to use a geo vocabulary to describe the location, rather than larding it into your calendaring one?
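On durations versus end times: a feed can still carry durations if the consumer converts them at ingest time and stores the resulting end time in its SQL column. A hedged Python sketch of that conversion, supporting only the fixed-length day/hour/minute/second subset of ISO 8601 durations; it is not part of DBRSS, and the sample dates are invented.

```python
import re
from datetime import datetime, timedelta

# Only the fixed-length subset of ISO 8601 durations (days, hours, minutes,
# seconds) is accepted; years, months, and weeks are rejected on purpose.
DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def end_time(start, duration):
    """Turn a start time plus a duration into an explicit end time."""
    match = DURATION.match(duration)
    if not match or duration.endswith(("P", "T")):  # reject "P", "PT", "P1DT"
        raise ValueError(f"unsupported duration: {duration!r}")
    parts = {unit: int(n) for unit, n in match.groupdict().items() if n}
    return start + timedelta(**parts)

start = datetime(2003, 11, 6, 19, 0)
print(end_time(start, "PT1H30M"))  # 2003-11-06 20:30:00
print(end_time(start, "P1DT2H"))   # 2003-11-07 21:00:00
```

Converting once on the way in keeps queries like "what is happening at 8pm" to a plain range comparison on two columns.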

Tagged: Uncategorized

Once more into the Breach: Calendars, Events, and RSS

September 22nd, 2003

The launch of Upcoming.org seems to have rekindled some interest in calendaring standards, stoked by a post from Ray Ozzie, and the Calendar Fiasco, by Jon Udell. (also see eric’s collection of upcoming.org links)
